[Bugs] [Bug 1236050] Disperse volume: fuse mount hung after self healing

bugzilla at redhat.com bugzilla at redhat.com
Wed Aug 5 08:09:17 UTC 2015


https://bugzilla.redhat.com/show_bug.cgi?id=1236050



--- Comment #2 from Backer <mdfakkeer at gmail.com> ---
I have tested both the 3.7.3 and the 3.7.2 nightly build
(glusterfs-3.7.2-20150726.b639cb9.tar.gz) for the I/O error and hang issue. I
found that 3.7.3 has a data corruption issue which is not present in the 3.7.2
nightly build (glusterfs-3.7.2-20150707.36f24f5.tar.gz). Data gets corrupted
after replacing a failed drive and running self heal. We also see data
corruption after recovering from a node failure, once the unavailable data
chunks have been copied by the proactive self-heal daemon. You can reproduce
the bug with the following steps (a command-level sketch follows the list).

Steps to reproduce:
1. Create a 3 x (4+2) disperse volume across the nodes.
2. FUSE-mount the volume on a client and start creating files/directories with
mkdir and rsync/dd.
3. Now bring down 2 of the nodes (nodes 5 & 6).
4. Write some files (e.g. filenew1, filenew2). The files will be available only
on 4 nodes (nodes 1, 2, 3 & 4).
5. Calculate the md5sum of filenew1 and filenew2.
6. Now bring up the 2 failed/down nodes (nodes 5 & 6).
7. Proactive self healing will create the unavailable data chunks on the 2
nodes (nodes 5 & 6).
8. Once self healing finishes, bring down another two nodes (nodes 1 & 2).
9. Now recompute the md5sum of the same recovered files; the md5sum values will
not match.
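
For reference, this is roughly the command sequence I use. The volume name
(dispvol), host names (node1..node6), brick paths and file sizes are only
examples for illustration; adjust them for your own setup:

    # 1. Create a 3 x (4+2) distributed-disperse volume across 6 nodes
    #    (3 bricks per node; each disperse set gets one brick per node)
    gluster volume create dispvol disperse-data 4 redundancy 2 \
        node{1..6}:/bricks/b1 node{1..6}:/bricks/b2 node{1..6}:/bricks/b3
    gluster volume start dispvol

    # 2. FUSE mount on the client and start creating files/directories
    mount -t glusterfs node1:/dispvol /mnt/dispvol
    mkdir /mnt/dispvol/dir1
    rsync -a /some/source/ /mnt/dispvol/dir1/

    # 3-5. With node5 and node6 down, write filenew1/filenew2 and record
    #      their checksums
    dd if=/dev/urandom of=/mnt/dispvol/filenew1 bs=1M count=100
    dd if=/dev/urandom of=/mnt/dispvol/filenew2 bs=1M count=100
    md5sum /mnt/dispvol/filenew1 /mnt/dispvol/filenew2

    # 6-7. Bring node5 and node6 back and wait until nothing is pending heal
    gluster volume heal dispvol info

    # 8-9. Bring down node1 and node2, then re-read the files and compare
    md5sum /mnt/dispvol/filenew1 /mnt/dispvol/filenew2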

This bug is not present in the 3.7.2 nightly build
(glusterfs-3.7.2-20150707.36f24f5.tar.gz).

I would also like to know why proactive self healing does not happen after
replacing the failed drives. I have to run the volume heal command manually to
heal the unavailable files; the commands I use are sketched below.
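
For completeness, this is what I run by hand after a drive replacement (the
volume name and brick paths are examples only):

    # Replace the failed brick with the new one (paths are illustrative)
    gluster volume replace-brick dispvol node5:/bricks/b1 node5:/bricks/b1-new \
        commit force

    # Trigger a full heal instead of waiting for the self-heal daemon
    gluster volume heal dispvol full

    # Check which entries are still pending heal
    gluster volume heal dispvol info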
