[Gluster-users] split-brain on glusterfs running with quorum on server and client

Ramesh Natarajan ramesh25 at gmail.com
Sat Sep 6 00:35:49 UTC 2014


Thanks, Jeff, for the detailed explanation. You mentioned that delayed changelog might have prevented this issue. Can you please tell me how to enable it?


Thanks
Ramesh

On Sep 5, 2014, at 6:23 PM, Jeff Darcy <jdarcy at redhat.com> wrote:

>> I have a replicated GlusterFS setup on 3 bricks (replica 3). I have
>> client and server quorum turned on. I rebooted one of the 3 bricks. When it
>> came back up, the client started throwing error messages that one of the
>> files went into split brain.
> 
> This is a good example of how split brain can happen even with all kinds of
> quorum enabled.  Let's look at those xattrs.  BTW, thank you for a very
> nicely detailed bug report that includes them.
> 
>> BRICK 1
>> =======
>> [root@ip-172-31-38-189 ~]# getfattr -d -m . -e hex /data/vol2/gluster-data/apache_cp_mm1/logs/access_log.2014-09-05-17_00_00
>> getfattr: Removing leading '/' from absolute path names
>> # file: data/vol2/gluster-data/apache_cp_mm1/logs/access_log.2014-09-05-17_00_00
>> trusted.afr.PL2-client-0=0x000000000000000000000000
>> trusted.afr.PL2-client-1=0x000000010000000000000000
>> trusted.afr.PL2-client-2=0x000000010000000000000000
>> trusted.gfid=0xea950263977e46bf89a0ef631ca139c2
>> 
>> BRICK 2
>> =======
>> [root@ip-172-31-16-220 ~]# getfattr -d -m . -e hex /data/vol2/gluster-data/apache_cp_mm1/logs/access_log.2014-09-05-17_00_00
>> getfattr: Removing leading '/' from absolute path names
>> # file: data/vol2/gluster-data/apache_cp_mm1/logs/access_log.2014-09-05-17_00_00
>> trusted.afr.PL2-client-0=0x00000d460000000000000000
>> trusted.afr.PL2-client-1=0x000000000000000000000000
>> trusted.afr.PL2-client-2=0x000000000000000000000000
>> trusted.gfid=0xea950263977e46bf89a0ef631ca139c2
> 
>> BRICK 3
>> =======
>> [root@ip-172-31-12-218 ~]# getfattr -d -m . -e hex /data/vol2/gluster-data/apache_cp_mm1/logs/access_log.2014-09-05-17_00_00
>> getfattr: Removing leading '/' from absolute path names
>> # file: data/vol2/gluster-data/apache_cp_mm1/logs/access_log.2014-09-05-17_00_00
>> trusted.afr.PL2-client-0=0x00000d460000000000000000
>> trusted.afr.PL2-client-1=0x000000000000000000000000
>> trusted.afr.PL2-client-2=0x000000000000000000000000
>> trusted.gfid=0xea950263977e46bf89a0ef631ca139c2
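To make the numbers above concrete, here is a minimal Python sketch that decodes each trusted.afr value. It assumes each value is 12 bytes holding three big-endian 32-bit pending counters (data, metadata, entry), and that PL2-client-0, -1 and -2 correspond to BRICK 1, 2 and 3 in the order shown. The helper names and the mutual-accusation check are hypothetical and simplified for illustration; they are not AFR's actual heal logic.

```python
import struct
from itertools import combinations

# Assumption: PL2-client-N is the (N+1)th brick, matching BRICK 1-3 above.
CLIENT_TO_BRICK = {"PL2-client-0": "brick1",
                   "PL2-client-1": "brick2",
                   "PL2-client-2": "brick3"}

# trusted.afr.* values copied from the getfattr output above.
XATTRS = {
    "brick1": {"PL2-client-0": "0x000000000000000000000000",
               "PL2-client-1": "0x000000010000000000000000",
               "PL2-client-2": "0x000000010000000000000000"},
    "brick2": {"PL2-client-0": "0x00000d460000000000000000",
               "PL2-client-1": "0x000000000000000000000000",
               "PL2-client-2": "0x000000000000000000000000"},
    "brick3": {"PL2-client-0": "0x00000d460000000000000000",
               "PL2-client-1": "0x000000000000000000000000",
               "PL2-client-2": "0x000000000000000000000000"},
}


def decode_afr(hex_value):
    """Split a 12-byte changelog value into (data, metadata, entry) counters."""
    digits = hex_value[2:] if hex_value.startswith("0x") else hex_value
    raw = bytes.fromhex(digits)
    if len(raw) != 12:
        raise ValueError("expected 12 bytes, got %d" % len(raw))
    return struct.unpack(">III", raw)  # three big-endian 32-bit integers


def data_pending(xattrs):
    """pending[a][b] = data operations brick a records as pending on brick b."""
    return {brick: {CLIENT_TO_BRICK[c]: decode_afr(v)[0] for c, v in attrs.items()}
            for brick, attrs in xattrs.items()}


def mutual_accusations(pending):
    """Pairs of bricks that each record pending data operations on the other."""
    return [(a, b) for a, b in combinations(sorted(pending), 2)
            if pending[a][b] > 0 and pending[b][a] > 0]


pending = data_pending(XATTRS)
for brick, row in pending.items():
    print(brick, row)
print("mutual accusations:", mutual_accusations(pending))
```

Run against the values above, this shows brick 1 blaming bricks 2 and 3 with a count of 1, and bricks 2 and 3 blaming brick 1 with 3398, so brick 1 and each of the other two accuse one another.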
> 
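> Here, we see that brick 1 records a single pending data operation against
> each of the other two, while they each record 0xd46 (3398) pending
> operations against brick 1.  Each side blames the other, so neither copy
> can simply be picked as the good one.  Here's how this can happen.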
> 
> (1) There is exactly one write in flight, so every brick has a pending
> count of one for all three bricks.
> 
> (2) Brick1 completes the write first, and says so.
> 
> (3) Client sends messages to all three, saying to decrement brick1's
> count.
> 
> (4) All three bricks receive and process that message.
> 
> (5) Brick1 fails.
> 
> (6) Brick2 and brick3 complete the write, and say so.
> 
> (7) Client tells all bricks to decrement brick2's and brick3's counts.
> 
> (8) Brick2 and brick3 receive and process that message.
> 
> (9) Brick1 is dead, so its counts for brick2/3 stay at one.
> 
> (10) Brick2 and brick3 still have quorum, and both have all-zero pending
> counters.
> 
> (11) Client sends 0xd46 more writes to brick2 and brick3, each of which
> they record as pending for the absent brick1.
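The eleven steps can be replayed with a toy model of the pending counters. This is a hedged sketch, not GlusterFS code: state[b][t] is the count brick b holds against brick t, each decrement message in steps (3) and (7) is applied only by bricks that are still up, and a quorum of 2 for a replica-3 set is an assumption. The helper names (apply, run_scenario, to_xattr) are hypothetical.

```python
# Toy model of AFR pending counters; index 0 = brick1, 1 = brick2, 2 = brick3.
# state[b][t] = pending operations brick b records against brick t.

N = 3
QUORUM = 2            # assumption: client quorum = majority of a replica-3 set
EXTRA_WRITES = 0xd46


def apply(state, live, targets, delta):
    """Adjust the counters for `targets` on every brick that is still up."""
    for b in live:
        for t in targets:
            state[b][t] += delta


def run_scenario(extra_writes=EXTRA_WRITES):
    state = [[0] * N for _ in range(N)]
    live = {0, 1, 2}
    # (1) one write in flight: the pre-op marks it pending for all three bricks
    apply(state, live, range(N), +1)
    # (2)-(4) brick1 finishes first; all three bricks clear brick1's counter
    apply(state, live, [0], -1)
    # (5) brick1 fails
    live.discard(0)
    # (6)-(8) brick2 and brick3 finish; only they receive the second message
    apply(state, live, [1, 2], -1)
    # (9)-(11) more writes while brick1 is down; quorum holds for every one
    for _ in range(extra_writes):
        assert len(live) >= QUORUM
        apply(state, live, range(N), +1)      # pre-op on the live bricks
        apply(state, live, sorted(live), -1)  # post-op clears the bricks that wrote
    return state


def to_xattr(data_count):
    """Format a data counter as getfattr -e hex shows it (metadata/entry = 0)."""
    return "0x{:08x}{:016x}".format(data_count, 0)


state = run_scenario()
for name, row in zip(["brick1", "brick2", "brick3"], state):
    print(name, [to_xattr(n) for n in row])
```

Under these assumptions the printed values match the three getfattr dumps in the original report, and the quorum assertion never fires, which is the point of step (10).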
> 
> Note that at no point did we lose quorum. Note also the tight timing
> required.  If brick1 had failed an instant earlier, it would not have
> decremented its own counter.  If it had failed an instant later, it
> would have decremented brick2's and brick3's as well.  If brick1 had not
> finished first, we'd be in yet another scenario.  If delayed changelog
> had been operative, the messages at (3) and (7) would have been combined
> into a single message, which changes the outcome again.  As far as I can tell, we would
> have been able to resolve the conflict in all those cases.
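The timing variations described above can be checked the same way. The sketch below uses the same toy counter model (self-contained, so it repeats the setup) and adds pick_sources, a hypothetical rule loosely modeled on the innocent/fool/wise classification in AFR's self-heal code as we understand it: a brick whose counter for itself is zero but which blames others is "wise", and a wise brick can serve as a heal source unless another wise brick blames it. Delayed changelog is modeled only as described above, by merging the messages at (3) and (7); that is an assumption about the feature, not its implementation.

```python
# Toy model: state[b][t] = pending ops brick b records against brick t
# (index 0 = brick1). Simplified illustration, not GlusterFS code.

N, QUORUM, EXTRA_WRITES = 3, 2, 0xd46
NAMES = ["brick1", "brick2", "brick3"]


def simulate(post_ops, brick1_applies):
    """Replay the sequence with a configurable failure point for brick1.

    post_ops: changelog-decrement messages, each a list of target brick indices.
    brick1_applies: how many of those messages brick1 processes before failing.
    """
    state = [[1] * N for _ in range(N)]  # one write pending on every brick
    live = set(range(N))
    for i, targets in enumerate(post_ops):
        if i == brick1_applies:
            live.discard(0)  # brick1 fails before this message arrives
        for b in live:
            for t in targets:
                state[b][t] -= 1
    live.discard(0)  # brick1 stays down for the remaining writes
    for _ in range(EXTRA_WRITES):
        assert len(live) >= QUORUM
        for b in live:
            state[b][0] += 1  # net effect of pre-op + post-op: pending on brick1
    return state


def pick_sources(state):
    """Return brick names a heal could copy from, or None for split brain.

    Hypothetical rule: a "wise" brick has a zero counter for itself but blames
    another brick; it is a source unless another wise brick blames it.
    """
    wise = [b for b in range(N) if state[b][b] == 0 and any(state[b])]
    if not wise:
        raise ValueError("no brick blames another; not covered by this sketch")
    sources = [w for w in wise
               if not any(state[v][w] > 0 for v in wise if v != w)]
    return [NAMES[s] for s in sources] or None


SCENARIOS = {
    "as reported": ([[0], [1, 2]], 1),
    "brick1 fails an instant earlier": ([[0], [1, 2]], 0),
    "brick1 fails an instant later": ([[0], [1, 2]], 2),
    "delayed changelog, brick1 misses the post-op": ([[0, 1, 2]], 0),
    "delayed changelog, brick1 gets the post-op": ([[0, 1, 2]], 1),
}

for label, (post_ops, applies) in SCENARIOS.items():
    print(f"{label}: {pick_sources(simulate(post_ops, applies)) or 'split brain'}")
```

Under this model only the reported timing ends in split brain; the other four cases leave brick2 and brick3 as usable sources, consistent with the conclusion above.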
> 
> *** Key point: quorum enforcement does not completely eliminate split
> brain.  It only makes it a few orders of magnitude less frequent. ***
> 
> So, is there any way to prevent this completely?  Some AFR enhancements,
> such as the oft-promised "outcast" feature[1], might have helped.
> NSR[2] is immune to this particular problem.  "Policy based split brain
> resolution"[3] might have resolved it automatically instead of merely
> flagging it.  Unfortunately, those are all in the future.  For now, I'd
> say the best approach is to resolve the conflict manually and try to
> move on.  Unless there's more going on than meets the eye, recurrence
> should be very unlikely.
> 
> [1] http://www.gluster.org/community/documentation/index.php/Features/outcast
> 
> [2] http://www.gluster.org/community/documentation/index.php/Features/new-style-replication
> 
> [3] http://www.gluster.org/community/documentation/index.php/Features/pbspbr

