[Bugs] [Bug 1142601] files with open fd's getting into split-brain when bricks goes offline and comes back online

bugzilla at redhat.com bugzilla at redhat.com
Mon Sep 29 12:46:59 UTC 2014


https://bugzilla.redhat.com/show_bug.cgi?id=1142601



--- Comment #8 from Anand Avati <aavati at redhat.com> ---
COMMIT: http://review.gluster.org/8755 committed in master by Pranith Kumar
Karampuri (pkarampu at redhat.com) 
------
commit 94045e4ae779b1bde54ad1dd0ed87981a6872125
Author: Pranith Kumar K <pkarampu at redhat.com>
Date:   Wed Sep 17 11:33:23 2014 +0530

    cluster/afr: Launch self-heal only when all the brick status is known

    Problem:
    A file goes into split-brain because its xattrs are erased incorrectly.

    RCA:
    The issue happens because index self-heal is triggered even before all the
    bricks are up. So what ends up happening while erasing the xattrs is, xattrs
    are erased only on the sink brick for the brick that it thinks is up, leading
    to split-brain.
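
The rule in the commit title (launch self-heal only when the status of every brick is known) can be illustrated with a minimal sketch. This is not the AFR code from the patch. The `BrickStatus` enum and `should_launch_heal` function are hypothetical names introduced only for illustration, and the sketch assumes each brick starts as "unknown" until an up or down notification arrives for it.

```python
from enum import Enum


class BrickStatus(Enum):
    """Hypothetical three-state view of a brick, as seen by the healer."""
    UNKNOWN = 0   # no up/down notification received yet
    UP = 1
    DOWN = 2


def should_launch_heal(statuses):
    """Return True only once every brick has reported either UP or DOWN.

    `statuses` maps a brick name to its BrickStatus. While any brick is
    still UNKNOWN, the healer cannot tell which xattrs are safe to erase,
    so the heal is deferred.
    """
    return all(s is not BrickStatus.UNKNOWN for s in statuses.values())


# Only brick-2 has reported so far: the heal is deferred.
partial = {"brick-2": BrickStatus.UP, "brick-3": BrickStatus.UNKNOWN}

# Both bricks have reported (even if one is down): the heal may proceed.
known = {"brick-2": BrickStatus.UP, "brick-3": BrickStatus.DOWN}
```

With these inputs, `should_launch_heal(partial)` is false and `should_launch_heal(known)` is true. The point of the gate is that "not yet reported" is treated differently from "down".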

    Example:
    Let's say the xattrs before the heal started are:
    brick 2:
    trusted.afr.vol1-client-2=0x000000020000000000000000
    trusted.afr.vol1-client-3=0x000000020000000000000000

    brick 3:
    trusted.afr.vol1-client-2=0x000010040000000000000000
    trusted.afr.vol1-client-3=0x000000000000000000000000

    If only brick-2 had come up when the self-heal was triggered, only
    'trusted.afr.vol1-client-2' is erased, leading to the following xattrs:

    brick 2:
    trusted.afr.vol1-client-2=0x000000000000000000000000
    trusted.afr.vol1-client-3=0x000000020000000000000000

    brick 3:
    trusted.afr.vol1-client-2=0x000010040000000000000000
    trusted.afr.vol1-client-3=0x000000000000000000000000

    So the file goes into split-brain.

    Change-Id: I1185713c688e0f41fd32bf2a5953c505d17a3173
    BUG: 1142601
    Signed-off-by: Pranith Kumar K <pkarampu at redhat.com>
    Reviewed-on: http://review.gluster.org/8755
    Reviewed-by: Krutika Dhananjay <kdhananj at redhat.com>
    Tested-by: Gluster Build System <jenkins at build.gluster.com>
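
The example in the commit message can be checked mechanically. The sketch below assumes each `trusted.afr.*` value is 12 bytes holding three big-endian 32-bit pending counters (data, metadata, entry), and that "brick 2" and "brick 3" correspond to `client-2` and `client-3`. The source-selection rule is a deliberately simplified model and not the actual AFR implementation: a brick that accuses itself is not trusted as an accuser, and a brick is a heal source only if it neither accuses itself nor is accused by a trusted brick. The function names are hypothetical.

```python
import struct


def decode_afr_xattr(hex_value):
    """Decode a 12-byte changelog value into (data, metadata, entry) counts."""
    digits = hex_value[2:] if hex_value.startswith("0x") else hex_value
    raw = bytes.fromhex(digits)
    if len(raw) != 12:
        raise ValueError("expected a 12-byte value, got %d bytes" % len(raw))
    return struct.unpack(">III", raw)


def data_sources(xattrs):
    """Return the bricks usable as data-heal sources under the simplified model.

    `xattrs` maps a brick to {accused brick: hex value}. An empty result
    means no brick can be trusted, which this model treats as split-brain.
    """
    # Keep only the data counter (first of the three) for each entry.
    pending = {
        brick: {target: decode_afr_xattr(value)[0] for target, value in row.items()}
        for brick, row in xattrs.items()
    }
    # Bricks with a non-zero counter against themselves.
    self_accused = {b for b, row in pending.items() if row.get(b, 0) > 0}
    # Bricks accused by a brick that does not accuse itself.
    accused = set()
    for brick, row in pending.items():
        if brick in self_accused:
            continue
        for target, count in row.items():
            if target != brick and count > 0:
                accused.add(target)
    return sorted(b for b in pending if b not in accused and b not in self_accused)


# xattrs before the heal started, copied from the commit message.
BEFORE = {
    "client-2": {"client-2": "0x000000020000000000000000",
                 "client-3": "0x000000020000000000000000"},
    "client-3": {"client-2": "0x000010040000000000000000",
                 "client-3": "0x000000000000000000000000"},
}

# xattrs after only 'trusted.afr.vol1-client-2' was erased on brick 2.
AFTER = {
    "client-2": {"client-2": "0x000000000000000000000000",
                 "client-3": "0x000000020000000000000000"},
    "client-3": {"client-2": "0x000010040000000000000000",
                 "client-3": "0x000000000000000000000000"},
}
```

Under this model, `data_sources(BEFORE)` yields only `client-3`, so brick 3 could heal brick 2. `data_sources(AFTER)` is empty: after the partial erase, each brick blames the other and neither blames itself, which matches the split-brain outcome described in the commit message.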
