[Bugs] [Bug 1807384] New: [AFR] Heal not happening after disk cleanup and self-heal of files/dirs is done to simulate disk replacement
bugzilla at redhat.com
Wed Feb 26 09:06:38 UTC 2020
https://bugzilla.redhat.com/show_bug.cgi?id=1807384
Bug ID: 1807384
Summary: [AFR] Heal not happening after disk cleanup and
self-heal of files/dirs is done to simulate disk
replacement
Product: GlusterFS
Version: mainline
Hardware: x86_64
OS: Linux
Status: NEW
Component: replicate
Keywords: Regression
Severity: medium
Assignee: bugs at gluster.org
Reporter: kiyer at redhat.com
CC: bugs at gluster.org
Target Milestone: ---
Classification: Community
Description of problem:
While running patch [1], which performs the steps listed in the sections
below, it was observed that the arequal checksums were different, as shown
below:
################################################################################
Checksum of the brick on which the data was removed
################################################################################
arequal-checksum -p /mnt/vol0/testvol_replicated_brick2 -i .glusterfs -i
.landfill -i .trashcan
Entry counts
Regular files : 14
Directories : 3
Symbolic links : 0
Other : 0
Total : 17
Metadata checksums
Regular files : 3e9
Directories : 24d74c
Symbolic links : 3e9
Other : 3e9
Checksums
Regular files : c4a0e0fd92dba41dc446cc3b33287983
Directories : 300002e01
Symbolic links : 0
Other : 0
Total : e62cc5a1f3f39f
################################################################################
Checksum of the brick where data wasn't removed
################################################################################
arequal-checksum -p /mnt/vol0/testvol_replicated_brick1 -i .glusterfs -i
.landfill -i .trashcan
Entry counts
Regular files : 16500
Directories : 11
Symbolic links : 0
Other : 0
Total : 16511
Metadata checksums
Regular files : 3e9
Directories : 24d74c
Symbolic links : 3e9
Other : 3e9
Checksums
Regular files : 6b72772e37d757ad53453c4aafed344c
Directories : 301002f01
Symbolic links : 0
Other : 0
Total : 38374b67993a4ce0
################################################################################
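For reference, a comparison like the one above can be scripted. The sketch
below is illustrative only: the hostnames and brick paths are taken from the
logs in this report, it assumes arequal-checksum is installed on the brick
nodes and passwordless root ssh to them, and it simply compares the final
"Total" line printed for each brick.

#!/usr/bin/env python3
# Illustrative sketch: compare the final "Total" checksum reported by
# arequal-checksum on two bricks of the same replica. Hostnames and brick
# paths are taken from the logs in this report; assumes arequal-checksum is
# installed on the brick nodes and passwordless root ssh is available.
import subprocess

BRICKS = [
    ("172.19.2.153", "/mnt/vol0/testvol_replicated_brick1"),
    ("172.19.2.164", "/mnt/vol0/testvol_replicated_brick2"),
]

def brick_total(host, path):
    cmd = ("arequal-checksum -p {0} -i .glusterfs -i .landfill "
           "-i .trashcan".format(path))
    out = subprocess.run(["ssh", "root@" + host, cmd],
                         capture_output=True, text=True, check=True).stdout
    # arequal-checksum prints two "Total" lines; the last one is the
    # combined checksum of the whole tree.
    totals = [line.split(":", 1)[1].strip()
              for line in out.splitlines()
              if line.strip().startswith("Total")]
    return totals[-1]

totals = {path: brick_total(host, path) for host, path in BRICKS}
print(totals)
if len(set(totals.values())) != 1:
    print("Arequal mismatch: the replica bricks are not in sync")
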
This means that heal wasn't completing on the brick from which the data was
removed. However, when heal status was checked before comparing the
checksums, it reported no entries to be healed on any of the bricks:
################################################################################
2020-02-25 12:15:34,416 INFO (run) root at 172.19.2.161 (cp): gluster volume heal
testvol_replicated info --xml
2020-02-25 12:15:34,416 DEBUG (_get_ssh_connection) Retrieved connection from
cache: root at 172.19.2.161
2020-02-25 12:15:34,618 INFO (_log_results) RETCODE (root at 172.19.2.161): 0
2020-02-25 12:15:34,619 DEBUG (_log_results) STDOUT (root at 172.19.2.161)...
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<cliOutput>
<healInfo>
<bricks>
<brick hostUuid="112835ce-16ed-43e1-a758-c104c78ff782">
<name>172.19.2.161:/mnt/vol0/testvol_replicated_brick0</name>
<status>Connected</status>
<numberOfEntries>0</numberOfEntries>
</brick>
<brick hostUuid="3fdae765-7a1f-4ae5-99c1-ea7b24768554">
<name>172.19.2.153:/mnt/vol0/testvol_replicated_brick1</name>
<status>Connected</status>
<numberOfEntries>0</numberOfEntries>
</brick>
<brick hostUuid="a3877a65-2963-423c-8e9f-95ceb07f907d">
<name>172.19.2.164:/mnt/vol0/testvol_replicated_brick2</name>
<status>Connected</status>
<numberOfEntries>0</numberOfEntries>
</brick>
</bricks>
</healInfo>
<opRet>0</opRet>
<opErrno>0</opErrno>
<opErrstr/>
</cliOutput>
################################################################################
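The heal check is done against this XML. A minimal sketch of that kind of
check is shown below (the volume name is taken from this report; the gluster
CLI is assumed to be available on the node running it).

#!/usr/bin/env python3
# Illustrative sketch: verify that every brick in the heal-info XML above
# reports zero pending entries. Volume name is taken from this report;
# assumes the gluster CLI is available on the node running the check.
import subprocess
import xml.etree.ElementTree as ET

VOLNAME = "testvol_replicated"

xml_out = subprocess.run(
    ["gluster", "volume", "heal", VOLNAME, "info", "--xml"],
    capture_output=True, text=True, check=True).stdout

pending = {
    brick.findtext("name"): int(brick.findtext("numberOfEntries"))
    for brick in ET.fromstring(xml_out).iter("brick")
}
print(pending)
if any(pending.values()):
    print("Heal still pending on:",
          [name for name, count in pending.items() if count])

In the failing runs this check reports 0 entries on every brick even though
the arequal comparison above shows the bricks diverging, which is the crux of
this bug.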
Version-Release number of selected component (if applicable):
glusterfs 20200220.a0e0890
How reproducible:
2/2
Steps to Reproduce:
- Create a volume of type replicate or distributed-replicate
- Create a directory on the mount point and write files/dirs into it
- Create another set of files (1K files)
- While the creation of files/dirs is in progress, kill one brick
- Remove the contents of the killed brick (simulating disk replacement)
- While the I/O is still in progress, restart glusterd on the nodes
  where disk replacement was simulated, to bring the bricks back online
- Start volume heal
- Wait for the I/O to complete
- Verify whether the files are self-healed
- Calculate arequals of the mount point and all the bricks
  (a rough sketch of these steps follows this list)
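The sketch below covers only the disk-replacement portion of these steps and
is not what patch [1] does internally: the volume name, hostname and brick
path are placeholders taken from the logs above, the brick pid is read from
"gluster volume status ... --xml" (assuming the status XML exposes the pid
under <node>/<pid>), and glusterd is restarted with systemctl. Client I/O and
the heal-info/arequal checks (sketched earlier) are left out.

#!/usr/bin/env python3
# Rough sketch of simulating a disk replacement on one brick. Assumes it
# runs on a cluster node with the gluster CLI and passwordless root ssh to
# the brick node; names below are placeholders from the logs in this report.
import subprocess
import xml.etree.ElementTree as ET

VOLNAME = "testvol_replicated"
BRICK_HOST = "172.19.2.164"
BRICK_PATH = "/mnt/vol0/testvol_replicated_brick2"
BRICK = "{}:{}".format(BRICK_HOST, BRICK_PATH)

def ssh(host, command):
    return subprocess.run(["ssh", "root@" + host, command],
                          capture_output=True, text=True)

# 1. While client I/O is running, find the brick process and kill it.
status = subprocess.run(
    ["gluster", "volume", "status", VOLNAME, BRICK, "--xml"],
    capture_output=True, text=True, check=True).stdout
pid = ET.fromstring(status).find(".//node/pid").text
ssh(BRICK_HOST, "kill -9 {}".format(pid))

# 2. Wipe the brick contents (including .glusterfs) to simulate a new disk.
ssh(BRICK_HOST, "rm -rf {0}/* {0}/.glusterfs".format(BRICK_PATH))

# 3. Restart glusterd on that node so the brick process is respawned.
ssh(BRICK_HOST, "systemctl restart glusterd")

# 4. Trigger heal; afterwards wait for heal-info to drain and compare the
#    arequals of the mount point and the bricks (see the sketches above).
subprocess.run(["gluster", "volume", "heal", VOLNAME], check=True)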
Actual results:
Arequals are different for replicate volumes and aren't consistent for
distributed-replicate volumes.
Expected results:
Arequals should be the same for replicate volumes and should be consistent
for distributed-replicate volumes.
Additional info:
This issue wasn't observed in gluster 6.0 builds.
Reference links:
[1] https://review.gluster.org/#/c/glusto-tests/+/20378/
[2] https://ci.centos.org/job/gluster_glusto-patch-check/2053/artifact/glustomain.log