[Gluster-devel] split brain

Jeff Darcy jdarcy at redhat.com
Thu Aug 16 13:58:10 UTC 2012


On 08/16/2012 04:55 AM, Emmanuel Dreyfus wrote:
> On all bricks, .glusterfs/3e/6b/3e6b026a-b9ed-4845-a5d1-6eb06412b3ca
> is a symlink to directory
> .glusterfs/4b/34/4b34a8a2-bff2-4684-b005-a36b069914ab/arch
>
> I am a bit surprised to see a link to a subdir of a .glusterfs hash.
> Is it something that makes sense? Or is it again a link(2) that should
> be replaced by linkat(2)?

It's part of the GFID-based back end that's new in 3.3.  Amar would be the
expert, but he's on leave right now.  Avati could probably also provide decent
answers.  I admit that I don't understand all of the nuances well enough to do so.

> Here are the xattr for the directory:
> 
> gfs33-client-0
> trusted.glusterfs.dht   00000001000000007fffffffffffffff
> trusted.afr.gfs33-client-1      000000000000000200000000
> trusted.afr.gfs33-client-0      000000000000000000000000
> trusted.gfid    3e6b026ab9ed4845a5d16eb06412b3ca
> 
> gfs33-client-1
> trusted.glusterfs.dht   00000001000000007fffffffffffffff
> trusted.afr.gfs33-client-1      000000000000000000000000
> trusted.afr.gfs33-client-0      000000000000000100000000
> trusted.gfid    3e6b026ab9ed4845a5d16eb06412b3ca
> 
> gfs33-client-2
> trusted.glusterfs.dht   0000000100000000000000007ffffffe
> trusted.afr.gfs33-client-3      000000000000000000000000
> trusted.afr.gfs33-client-2      000000000000000000000000
> trusted.gfid    3e6b026ab9ed4845a5d16eb06412b3ca
> 
> gfs33-client-3
> trusted.afr.gfs33-client-3      00000000000000000000000000
> trusted.afr.gfs33-client-2      00000000000000000000000000
> trusted.glusterfs.dht   0000000100000000000000007ffffffe
> trusted.gfid    3e6b026ab9ed4845a5d16eb06412b3ca

OK, here's something I'm much more comfortable with.  Note how this differs
from what you presented earlier, where the non-zero values were on client-0
pointing to client-1 and client-3 pointing to client-2.  Now we still have
client-0 pointing to client-1, but also client-1 pointing to client-0.  That's
a true split brain; operations seem to have completed on each node that didn't
complete on the other, so we don't know which values should take precedence.
The way I'd fix it would be to clear (not remove) one of the non-zero
trusted.afr xattrs, and let self-heal do the rest.
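
To make the "pointing at each other" check concrete, here is a minimal Python
sketch, not anything taken from the GlusterFS code. It assumes the xattrs have
already been dumped as hex strings, exactly as in the listing above; the
helper names accuses() and split_brain_pairs() are made up for illustration.
A brick is treated as "accusing" a peer when its trusted.afr.<peer> value has
any non-zero byte.

```python
def accuses(xattrs, peer):
    """True if this brick's trusted.afr.<peer> xattr has any non-zero byte."""
    value = bytes.fromhex(xattrs.get("trusted.afr." + peer, ""))
    return any(value)


def split_brain_pairs(bricks):
    """Return pairs of bricks whose pending xattrs accuse each other."""
    names = sorted(bricks)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if accuses(bricks[a], b) and accuses(bricks[b], a)
    ]


# Values copied from the listing above (trusted.afr entries only).
bricks = {
    "gfs33-client-0": {
        "trusted.afr.gfs33-client-1": "000000000000000200000000",
        "trusted.afr.gfs33-client-0": "000000000000000000000000",
    },
    "gfs33-client-1": {
        "trusted.afr.gfs33-client-1": "000000000000000000000000",
        "trusted.afr.gfs33-client-0": "000000000000000100000000",
    },
    "gfs33-client-2": {
        "trusted.afr.gfs33-client-3": "000000000000000000000000",
        "trusted.afr.gfs33-client-2": "000000000000000000000000",
    },
    "gfs33-client-3": {
        "trusted.afr.gfs33-client-3": "00000000000000000000000000",
        "trusted.afr.gfs33-client-2": "00000000000000000000000000",
    },
}

print(split_brain_pairs(bricks))
```

With the values above, only the client-0/client-1 pair is reported. Zeroing
one of the two non-zero values on the brick itself breaks the mutual
accusation, which is what lets self-heal pick a direction.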

> I understand pending are the trusted.afr from the bricks,
> but what do they represent, by the way?

These two posts explain it about as well as I'm able:

http://hekafs.org/index.php/2011/04/glusterfs-extended-attributes/
http://hekafs.org/index.php/2012/03/glusterfs-algorithms-replication-present/
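
As a rough illustration of what those posts describe: as I understand it,
each 12-byte trusted.afr value holds three 32-bit big-endian counters of
pending operations, in the order data, metadata, entry. The sketch below
decodes a hex string under that assumption; decode_afr() is a hypothetical
helper, not part of any GlusterFS tool.

```python
import struct


def decode_afr(hex_value):
    """Split a 12-byte trusted.afr value into its three pending counters.

    Assumes the layout is three big-endian 32-bit integers in the order
    data, metadata, entry.
    """
    data, metadata, entry = struct.unpack(">III", bytes.fromhex(hex_value))
    return {"data": data, "metadata": metadata, "entry": entry}


# client-0's view of client-1, from the listing above
print(decode_afr("000000000000000200000000"))
# client-1's view of client-0
print(decode_afr("000000000000000100000000"))
```

Read that way, both non-zero values in the listing are pending metadata
operations: two on client-0 against client-1, and one the other way around.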



-- 

ObSig: if you use "ask" as a noun I will ignore you for a week.



