[Bugs] [Bug 1293265] New: md5sum of files mismatch after the self-heal is complete on the file

bugzilla at redhat.com bugzilla at redhat.com
Mon Dec 21 09:25:22 UTC 2015


https://bugzilla.redhat.com/show_bug.cgi?id=1293265

            Bug ID: 1293265
           Summary: md5sum of files mismatch after the self-heal is
                    complete on the file
           Product: GlusterFS
           Version: 3.7.6
         Component: replicate
          Keywords: Triaged
          Severity: high
          Assignee: bugs at gluster.org
          Reporter: kdhananj at redhat.com
                CC: bugs at gluster.org, gluster-bugs at redhat.com,
                    mzywusko at redhat.com, nsathyan at redhat.com,
                    ravishankar at redhat.com, spandura at redhat.com
        Depends On: 1292379, 1293240, 1141750
            Blocks: 1154491, 1293239, 1139599



+++ This bug was initially created as a clone of Bug #1292379 +++

+++ This bug was initially created as a clone of Bug #1141750 +++

Description of problem:
==========================
In a 2 x 2 distribute-replicate volume, a file was created and written to from
multiple mounts. The same brick of the sub-volume holding the file was bounced
multiple times. After the writes completed and self-heal finished successfully,
the md5sum of the file was checked on both the source and the sink bricks:
the two md5sums do not match.


How reproducible:
==================
Tried once

Steps to Reproduce:
=======================
1. Create a 2 x 2 distribute-replicate volume and start it (the gluster CLI
side of steps 1-7 is sketched after step 18).

2. From 2 client machines, create 2 fuse mounts and 2 nfs mounts.

3. From all the mount points start IO on the file:
"dd if=/dev/urandom of=./testfile1 bs=1M count=20480"

4. While IO is in progress, bring down one brick of the sub-volume which
contains the file.

5. Perform "replace-brick commit force" on the brick that was brought down
(self-heal starts on the replaced brick).

6. add 2 new bricks to the volume.

7. start rebalance. 

8. While rebalance is in progress, rename the file from one of the mount
points:
"for i in `seq 1 100`; do mv /mnt/1/testfile${i} /mnt/1/testfile`expr $i + 1` ;
done" 

(rename successful without any failures)

9. add 2 new bricks to the volume.

10. start rebalance. 

11. While rebalance is in progress, rename the file from one of the mount
points:

"for i in `seq 101 200`; do mv /mnt/1/testfile${i} /mnt/1/testfile`expr $i + 1`
; done" 

(rename successful without any failures)

12. add 2 new bricks to the volume.

13. start rebalance. 

14. While rebalance is in progress, rename the file from one of the mount
points:

"for i in `seq 201 400`; do mv /mnt/1/testfile${i} /mnt/1/testfile`expr $i + 1`
; done"

15. Bring down the replaced brick where self-heal was in progress.

16. Rename the file from one of the mount points.

for i in `seq 401 500`; do mv /mnt/1/testfile${i} /mnt/1/testfile`expr $i + 1`
; done

17. Bring the brick back online.

18. Stop dd from all the mounts. 
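
A hedged sketch of the gluster CLI side of steps 1-7 above; the host names
(host1..host6), brick paths (/bricks/*) and the second mount point are
placeholders, not the ones used in this report:

gluster volume create vol1 replica 2 \
    host1:/bricks/b1 host2:/bricks/b2 host3:/bricks/b3 host4:/bricks/b4  # step 1: 2 x 2 volume
gluster volume start vol1
mount -t glusterfs host1:/vol1 /mnt/1              # step 2: fuse mount (repeat on 2nd client)
mount -t nfs -o vers=3 host1:/vol1 /mnt/2          #         nfs mount  (repeat on 2nd client)
dd if=/dev/urandom of=/mnt/1/testfile1 bs=1M count=20480 &   # step 3: IO from every mount
# step 4: bring down one brick of the sub-volume holding the file, e.g. by
#         killing its glusterfsd process (pid shown by `gluster volume status vol1`)
gluster volume replace-brick vol1 host2:/bricks/b2 \
    host2:/bricks/b2_replaced commit force         # step 5: replace the downed brick
gluster volume add-brick vol1 host5:/bricks/b5 host6:/bricks/b6   # step 6: add 2 bricks
gluster volume rebalance vol1 start                # step 7: rebalance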

Actual results:
=================
After some time, self-heal completed successfully.

[2014-09-15 11:11:43.364608] I
[afr-self-heal-common.c:2869:afr_log_self_heal_completion_status]
0-vol1-replicate-1:  foreground data self heal  is successfully completed, 
data self heal from vol1-client-3  to sinks  vol1-client-2, with 12232425472
bytes on vol1-client-2, 12934053888 bytes on vol1-client-3,  data - Pending
matrix:  [ [ 522 522 ] [ 45970 419 ] ]  on
<gfid:a0f9dddf-9927-423f-b334-8d7dcd8ecc22>


Check the md5sum of the file on both the bricks
===============================================

Source extended attributes when dd was stopped
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
root@rhs-client12 [Sep-15-2014-11:09:08] >getfattr -d -e hex -m . /rhs/device1/b4/testfile501
getfattr: Removing leading '/' from absolute path names
# file: rhs/device1/b4/testfile501
trusted.afr.vol1-client-2=0x0000b39b0000000000000000
trusted.afr.vol1-client-3=0x000001a30000000000000000
trusted.gfid=0xa0f9dddf9927423fb3348d7dcd8ecc22

Sink extended attributes when dd was stopped
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
root@rhs-client11 [Sep-15-2014-11:09:08] >getfattr -d -e hex -m . /rhs/device1/b3_replaced/testfile501
getfattr: Removing leading '/' from absolute path names
# file: rhs/device1/b3_replaced/testfile501
trusted.afr.vol1-client-2=0x000000000000000000000000
trusted.afr.vol1-client-3=0x000000000000000000000000
trusted.gfid=0xa0f9dddf9927423fb3348d7dcd8ecc22
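
The trusted.afr.* values above can be read by hand. A minimal helper for doing
so is sketched below; it is not part of GlusterFS, and it assumes the AFR
changelog layout of three big-endian 32-bit counters (data, metadata, entry
pending operations):

#!/bin/bash
# decode_afr.sh (hypothetical helper): decode a trusted.afr.* changelog value,
# assuming it is three big-endian 32-bit counters: data, metadata, entry.
val=${1#0x}                                   # strip the leading 0x
echo "data pending:     $(( 16#${val:0:8} ))"
echo "metadata pending: $(( 16#${val:8:8} ))"
echo "entry pending:    $(( 16#${val:16:8} ))"

Run against the source's value, "bash decode_afr.sh 0x0000b39b0000000000000000"
prints "data pending: 45979": the source still accuses vol1-client-2 of tens of
thousands of unsynced data operations (the same order of magnitude as the 45970
in the heal log's pending matrix), while the sink's all-zero values claim
nothing is pending.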


Source md5sum after dd was stopped:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
root@rhs-client12 [Sep-15-2014-11:09:10] >md5sum /rhs/device1/b4/testfile501
a116972019a667785adaf3d50b276117  /rhs/device1/b4/testfile501



Sink md5sum after dd was stopped:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
root@rhs-client11 [Sep-15-2014-11:09:10] >md5sum /rhs/device1/b3_replaced/testfile501
f2193f1062c6bc0db618d44c7096aa28  /rhs/device1/b3_replaced/testfile501

Source extended attributes and md5sum after self-heal
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
root@rhs-client12 [Sep-15-2014-11:17:43] >getfattr -d -e hex -m . /rhs/device1/b4/testfile501
getfattr: Removing leading '/' from absolute path names
# file: rhs/device1/b4/testfile501
trusted.afr.vol1-client-2=0x000000000000000000000000
trusted.afr.vol1-client-3=0x000000000000000000000000
trusted.gfid=0xa0f9dddf9927423fb3348d7dcd8ecc22

root@rhs-client12 [Sep-15-2014-11:17:44] >md5sum /rhs/device1/b4/testfile501
a116972019a667785adaf3d50b276117  /rhs/device1/b4/testfile501


Sink extended attributes and md5sum after self-heal
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
root@rhs-client11 [Sep-15-2014-11:17:43] >getfattr -d -e hex -m . /rhs/device1/b3_replaced/testfile501
getfattr: Removing leading '/' from absolute path names
# file: rhs/device1/b3_replaced/testfile501
trusted.afr.vol1-client-2=0x000000000000000000000000
trusted.afr.vol1-client-3=0x000000000000000000000000
trusted.gfid=0xa0f9dddf9927423fb3348d7dcd8ecc22

root@rhs-client11 [Sep-15-2014-11:17:44] >md5sum /rhs/device1/b3_replaced/testfile501
f2193f1062c6bc0db618d44c7096aa28  /rhs/device1/b3_replaced/testfile501

Expected results:
===================
After self-heal, the md5sums should match.

--- Additional comment from Pranith Kumar K on 2014-09-18 00:33:06 EDT ---

Simpler test to re-create the bug:
0) Create a 1x2 replicate volume, start it and mount it.
1) Open a file 'a' from the mount and keep writing to it.
2) Bring one of the bricks down
3) rename the file '<mnt>/a' to '<mnt>/b'
4) Wait for at least one write to complete while the brick is still down.
5) Restart the brick
6) Wait until self-heal completes, then stop the writing from the mount point.
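
A rough shell sketch of these six steps; the volume, host, brick and mount
names (testvol, server1, /bricks/b0, /bricks/b1, /mnt/g) are placeholders,
and the brick-pid lookup is just one possible way of bringing a brick down:

#!/bin/bash
gluster volume create testvol replica 2 server1:/bricks/b0 server1:/bricks/b1 force
gluster volume start testvol
mount -t glusterfs server1:/testvol /mnt/g
dd if=/dev/urandom of=/mnt/g/a bs=1M count=2048 &                # 1) keep writing to 'a'
DD_PID=$!
BRICK_PID=$(gluster volume status testvol server1:/bricks/b0 \
            | awk '/\/bricks\/b0/ {print $NF; exit}')
kill -KILL "$BRICK_PID"                                          # 2) bring one brick down
mv /mnt/g/a /mnt/g/b                                             # 3) rename while it is down
sleep 10                                                         # 4) let some writes complete
gluster volume start testvol force                               # 5) restart the downed brick
gluster volume heal testvol                                      # 6) trigger self-heal and
while gluster volume heal testvol info \
      | grep -q 'Number of entries: [1-9]'; do sleep 5; done     #    wait for it to finish
kill "$DD_PID"                                                   #    then stop the writer
md5sum /bricks/b0/b /bricks/b1/b                                 # compare the two copies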

Root cause:
When the rename happens while the brick is down, then once the brick comes
back up, entry self-heal is triggered on the parent directory in which the
rename happened, in this case <mnt>. As part of this entry self-heal
1) file 'a' is deleted and
2) file 'b' is re-created.

0) In parallel to this, the fd used for writing needs to be re-opened on the
file from the mount point.

If the re-opening of the file in step 0) happens before step 1) of self-heal,
this issue is observed. Writes from the mount keep going to the file that was
deleted, whereas self-heal happens on the file created in step 2), so the
checksums mismatch. Another manifestation of this issue is
https://bugzilla.redhat.com/show_bug.cgi?id=1139599, where writes from the
mount only grow the file on the 'always up' brick while the file on the other
brick does not grow. This leads to split-brain because of the size mismatch
combined with all-zero pending changelogs.
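
A local-filesystem analogy of the race (plain bash, no GlusterFS involved): it
only demonstrates the POSIX behaviour the root cause relies on, namely that an
fd opened before a delete-and-re-create keeps pointing at the old, unlinked
inode, so its writes never reach the new file:

cd "$(mktemp -d)"
exec 3>a                     # writer opens 'a' and holds the fd (the mount's open fd)
rm a && touch a              # what entry self-heal effectively does: delete and re-create
echo "late write" >&3        # the write lands in the old, now-unlinked inode
stat -c '%s bytes in the re-created file' a   # prints "0 bytes ...": the new file never grew
exec 3>&-                    # closing the fd discards the old inode's data for good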

--- Additional comment from Vijay Bellur on 2015-12-18 06:49:20 EST ---

REVIEW: http://review.gluster.org/13001 (cluster/afr: Fix data loss due to race
between sh and ongoing write) posted (#1) for review on master by Krutika
Dhananjay (kdhananj at redhat.com)

--- Additional comment from Vijay Bellur on 2015-12-21 03:56:42 EST ---

REVIEW: http://review.gluster.org/13001 (cluster/afr: Fix data loss due to race
between sh and ongoing write) posted (#2) for review on master by Krutika
Dhananjay (kdhananj at redhat.com)


Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1154491
[Bug 1154491] split-brain reported on files whose change-logs are all zeros
https://bugzilla.redhat.com/show_bug.cgi?id=1292379
[Bug 1292379] md5sum of files mismatch after the self-heal is complete on
the file