[Bugs] [Bug 1154491] split-brain reported on files whose change-logs are all zeros

bugzilla at redhat.com bugzilla at redhat.com
Mon Oct 20 04:06:45 UTC 2014


https://bugzilla.redhat.com/show_bug.cgi?id=1154491



--- Comment #2 from Pranith Kumar K <pkarampu at redhat.com> ---
RCA from https://bugzilla.redhat.com/show_bug.cgi?id=1141750#c3, which has the
same root cause but a different manifestation:

Simpler test to re-create the bug:
0) Create a 1x2 replicate volume, start it, and mount it.
1) Open a file 'a' from the mount and keep writing to it.
2) Bring one of the bricks down.
3) Rename the file '<mnt>/a' to '<mnt>/b'.
4) Wait for at least one write to complete while the brick is still down.
5) Restart the brick.
6) Wait until self-heal completes, then stop the writes from the mount point.
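
The client side of steps 1), 3) and 4) can be sketched in Python as below. This
is only an illustration, not the script used to find the bug: the function name
run_workload and the settle delay are made up for the example, and bringing the
brick down and back up (steps 2 and 5) has to be done outside the script.

```python
import os
import threading
import time

CHUNK = b"x" * 4096  # arbitrary payload size, chosen for the example


def run_workload(mnt, settle=0.2):
    """Keep writing to <mnt>/a through one fd while it is renamed to <mnt>/b.

    Returns (bytes_written, size_of_b_as_seen_from_mnt).
    """
    src = os.path.join(mnt, "a")
    dst = os.path.join(mnt, "b")

    # Step 1: open 'a' once; every later write goes through this same fd.
    fd = os.open(src, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    stop = threading.Event()
    total = [0]

    def writer():
        while not stop.is_set():
            total[0] += os.write(fd, CHUNK)
            time.sleep(0.01)

    thread = threading.Thread(target=writer)
    thread.start()
    try:
        time.sleep(settle)   # writes are in flight
        # Step 2 (bring one brick down) is done outside this script.
        os.rename(src, dst)  # step 3
        time.sleep(settle)   # step 4: let some writes complete after the rename
        # Steps 5 and 6 (restart the brick, wait for self-heal) are also external.
    finally:
        stop.set()
        thread.join()
        os.close(fd)
    return total[0], os.path.getsize(dst)
```

On a healthy mount the two returned values are equal. To exercise the bug, the
settle delay would need to be long enough to cover the brick restart and the
self-heal, with the brick being taken down before the rename.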

Root cause:
When a rename happens while a brick is down, entry self-heal is triggered once
the brick comes back up, on the parent directory of the rename, which in this
case is <mnt>. As part of this entry self-heal:
1) file 'a' is deleted and
2) file 'b' is re-created.

0) In parallel to this, the fd used for writing needs to be re-opened on the
file from the mount point.

If the re-open of the file in step 0) happens before step 1) of self-heal, this
issue is observed. Writes from the mount keep going to the file that was
deleted, whereas self-heal operates on the file created in step 2), so the
checksums mismatch. One more manifestation of this issue is
https://bugzilla.redhat.com/show_bug.cgi?id=1139599, where writes from the
mount grow the file only on the 'always up' brick while the file on the other
brick does not grow. This leads to split-brain being reported because of the
size mismatch, even though the pending changelogs are all zero.
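
The ordering problem can be mimicked on a plain local directory, without any
GlusterFS code. The sketch below assumes POSIX unlink semantics (an open fd
keeps a deleted file alive); the function name demo_stale_fd and the directory
argument are hypothetical, and the directory merely stands in for the restarted
brick.

```python
import os


def demo_stale_fd(brick_dir):
    """Replay step 0, then self-heal steps 1 and 2, then one more write.

    Returns (size_behind_stale_fd, size_of_b, same_inode).
    """
    path_a = os.path.join(brick_dir, "a")
    path_b = os.path.join(brick_dir, "b")

    # Step 0: the writing fd is opened on 'a' before self-heal runs.
    fd = os.open(path_a, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        # Self-heal step 1: 'a' is deleted. The open fd keeps the old file alive.
        os.unlink(path_a)
        # Self-heal step 2: 'b' is created as a brand-new file.
        with open(path_b, "wb"):
            pass

        # A write from the mount still lands on the deleted file.
        os.write(fd, b"x" * 4096)

        stale = os.fstat(fd)
        healed = os.stat(path_b)
        return stale.st_size, healed.st_size, stale.st_ino == healed.st_ino
    finally:
        os.close(fd)
```

Called on an empty directory, this returns a non-zero size for the file behind
the stale fd and zero for 'b', and the two are different inodes: the writes
never reach the file that self-heal works on.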

It is a day-1 bug in the link self-heal part of entry self-heal, and it has
existed since 2012. The bug is in the implementation of the following commit:

commit 1936e29c3ac3d6466d391545d761ad8e60ae2e03
Author: Pranith Kumar K <pranithk at gluster.com>
Date:   Wed Feb 29 16:31:18 2012 +0530

    cluster/afr: Hardlink Self-heal

    Change-Id: Iea0b38011edbc7fc9d75a95d1775139a5234e514
    BUG: 765391
    Signed-off-by: Pranith Kumar K <pranithk at gluster.com>
    Reviewed-on: http://review.gluster.com/2841
    Tested-by: Gluster Build System <jenkins at build.gluster.com>
    Reviewed-by: Amar Tumballi <amarts at redhat.com>
    Reviewed-by: Vijay Bellur <vijay at gluster.com>

Very good testing, Shwetha!

Thanks a lot for the help in re-creating and condensing the test case, Kritika!

Pranith
