[Gluster-devel] split brain

Jeff Darcy jdarcy at redhat.com
Wed Aug 15 17:30:49 UTC 2012


On 08/15/2012 11:27 AM, Emmanuel Dreyfus wrote:
> Attributes:
> trusted.glusterfs.dht         00 00 00 01 00 00 00 00 7f ff ff ff ff ff ff ff
> trusted.afr.gfs33-client-1  00 00 00 00 00 00 00 02 00 00 00 00
> trusted.afr.gfs33-client-0  00 00 00 00 00 00 00 00 00 00 00 00  
> trusted.gfid                   29 d1 70 bb 63 91 40 ed b4 c6 27 d8 ca a7 2a 64
> 
> On the other bricks:
> trusted.glusterfs.dht         00 00 00 01 00 00 00 00 00 00 00 00 7f ff ff fe
> trusted.afr.gfs33-client-2   00 00 00 00 00 00 00 00 00 00 00 00
> trusted.afr.gfs33-client-3  00 00 00 00 00 00 00 00 00 00 00 00  
> trusted.gfid                   29 d1 70 bb 63 91 40 ed b4 c6 27 d8 ca a7 2a 64
> 
> trusted.glusterfs.dht         00 00 00 01 00 00 00 00 7f ff ff ff ff ff ff ff
> trusted.afr.gfs33-client-1  00 00 00 00 00 00 00 00 00 00 00 00
> trusted.afr.gfs33-client-3  00 00 00 00 00 00 00 00 00 00 00 00  
> trusted.gfid                   29 d1 70 bb 63 91 40 ed b4 c6 27 d8 ca a7 2a 64
> 
> trusted.glusterfs.dht         00 00 00 01 00 00 00 00 00 00 00 00 7f ff ff fe
> trusted.afr.gfs33-client-2   00 00 00 00 00 00 00 01 00 00 00 00 
> trusted.afr.gfs33-client-3  00 00 00 00 00 00 00 00 00 00 00 00  
> trusted.gfid                   29 d1 70 bb 63 91 40 ed b4 c6 27 d8 ca a7 2a 64
> 
> I tried to understand the code here.  It reads trusted.afr.gfs33-client-*
> and builds a matrix, which looks like this:
> pending_matrix: [ 0 1 ]
> pending_matrix: [ 2 0 ]
> 
> Then afr_sh_wise_nodes_conflict() decides that nsources = -1. 
> 
> Is there some documentation explaining how this works? Can someone tell me why
> it decides it is split brain?

I really hope the above contains a typo or copy/paste error, because if it
doesn't then ICK.  Without seeing the volfile I have to guess a little, but it
looks as though the first and third bricks above should be client-0 and
client-1 (check the matching values of trusted.glusterfs.dht) while the second
and fourth should be client-2 and client-3.  In the first place, it's odd that
the file even exists in both replica sets.  Is one a linkfile?  In any case, I
think the second and fourth bricks shown above (client-2 and client-3) are
irrelevant.
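
For what it's worth, the trusted.glusterfs.dht values support that pairing.
As far as I recall the last two big-endian 32-bit words in that xattr are the
start and stop of the hash range assigned to that replica set, so a quick
decode (plain Python, nothing gluster-specific) of the two distinct values
quoted above:

import struct

layouts = {
    "first/third bricks":   "00000001000000007fffffffffffffff",
    "second/fourth bricks": "0000000100000000000000007ffffffe",
}
for which, hexval in layouts.items():
    # last two 32-bit words should be the start/stop of the hash range
    _, _, start, stop = struct.unpack(">IIII", bytes.fromhex(hexval))
    print("%-21s hash range 0x%08x - 0x%08x" % (which, start, stop))

shows the first and third bricks owning the top half of the hash space and the
second and fourth owning the bottom half; two complementary replica sets, as
expected.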

The next anomaly is the 2 in the pending matrix.  Its position indicates that
it's the second subvolume in the AFR definition accusing the first, and the
first must be client-1 based on the xattr name, so your volume definition must
be backwards: "subvolumes client-1 client-0" in the volfile.  That's how we get
to [0 0][2 0].  Where does the counter-accusation come from?  One clue might be
that client-1 (the third brick shown above) has xattrs for itself and
*client-3*.  Because it's missing an xattr for client-0, it's considered
ignorant, so we bump up the other bricks' pending-operation counts against it.
However, because of the reversed brick order, that bump should show up as
client-0 (second row) accusing client-1 (first column), getting us to
[0 0][3 0], which is fully resolvable.  In fact I tried this xattr
configuration, in both directions, on a simple two-brick AFR volume myself,
and it healed correctly both times.
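
To make that concrete, here's a rough sketch in plain Python (not the actual
afr code, and the dict and variable names are mine) of how I read those xattrs
into the metadata part of the pending matrix.  The only gluster-specific fact
it relies on is the value format: each trusted.afr.<client> key holds three
big-endian 32-bit counters (data, metadata, entry) of operations that the
brick holding the xattr believes are still pending on <client>.

import struct

def decode_pending(raw):
    # three big-endian 32-bit counters: data, metadata, entry
    return struct.unpack(">III", raw)

# xattrs quoted above for the first replica set, keyed by the client
# whose brick holds them (the shape of this dict is purely illustrative)
xattrs_on_brick = {
    # first brick shown above (backing client-0)
    "client-0": {
        "trusted.afr.gfs33-client-1": bytes.fromhex("000000000000000200000000"),
        "trusted.afr.gfs33-client-0": bytes.fromhex("000000000000000000000000"),
    },
    # third brick shown above (backing client-1); in the mail it only has
    # xattrs for client-1 and client-3, i.e. nothing for client-0
    "client-1": {
        "trusted.afr.gfs33-client-1": bytes.fromhex("000000000000000000000000"),
    },
}

subvolumes = ["client-1", "client-0"]   # the reversed order guessed above
METADATA = 1                            # index of the metadata counter

n = len(subvolumes)
pending = [[0] * n for _ in range(n)]
for i, observer in enumerate(subvolumes):        # row: who is accusing
    for j, accused in enumerate(subvolumes):     # column: who is accused
        raw = xattrs_on_brick[observer].get("trusted.afr.gfs33-" + accused)
        if raw is None:
            # observer is "ignorant" about accused; the real code reacts by
            # bumping the other bricks' counts against the ignorant brick,
            # not by filling in this cell
            continue
        pending[i][j] = decode_pending(raw)[METADATA]

print(pending)   # [[0, 0], [2, 0]] before the ignorant-node adjustment

With the subvolume order reversed as above, the ignorant-node bump should land
in row 1, column 0 (client-0 accusing client-1), giving [0 0][3 0].  The
[0 1] first row you're seeing looks like what you'd get if that bump were
applied as though the order were client-0, client-1.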

The only thing I can think of is that there's some further confusion or
inconsistency in how your volumes are defined, so that either the handling of
ignorant nodes is being done the wrong way around or the pending-operation
count from the fourth brick shown above is being brought in even though it
should be irrelevant.  If I were you I'd double-check that the volfiles look
the same everywhere, that the same brick names refer to the same physical
locations everywhere (including checking /etc/hosts or DNS for
inconsistencies), and that the xattr values really are as reported above.  I
don't think this combination of conditions can occur without some kind of
inconsistency somewhere.
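
For that last check, something along these lines, run as root on each server
against the brick-local path of the file (the path here is made up; substitute
your own), is enough to compare what each brick actually has on disk:

import os

# brick-local path of the file in question; substitute the real one
BRICK_PATH = "/export/brick1/some/dir/the-file"

for name in sorted(os.listxattr(BRICK_PATH)):
    if name.startswith("trusted.afr.") or name in ("trusted.gfid",
                                                   "trusted.glusterfs.dht"):
        value = os.getxattr(BRICK_PATH, name)
        print(name, " ".join("%02x" % b for b in value))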




