[Gluster-devel] split brain
Emmanuel Dreyfus
manu at netbsd.org
Wed Aug 15 15:27:28 UTC 2012
After I added bricks, I have one server that is always busy trying to heal the
volume, and it gets a lot of split brain errors. glustershd.log says:
[2012-08-15 16:54:30.562148] I [afr-common.c:1340:afr_launch_self_heal]
0-gfs33-replicate-0: background meta-data self-heal triggered. path:
<gfid:29d170bb-6391-40ed-b4c6-27d8caa72a64>, reason: lookup detected
pending operations
[2012-08-15 16:54:30.693520] E [afr-self-heal-metadata.c:481:
afr_sh_metadata_fix] 0-gfs33-replicate-0: Unable to self-heal
permissions/ownership of '<gfid:29d170bb-6391-40ed-b4c6-27d8caa72a64>'
(possible split-brain). Please fix the file on all backend volumes
[2012-08-15 16:54:30.694065] I [afr-self-heal-metadata.c:56:
afr_sh_metadata_done] 0-gfs33-replicate-0: split-brain detected,
aborting selfheal of <gfid:29d170bb-6391-40ed-b4c6-27d8caa72a64>
Is there a simpler way of retrieving the offending file than looking up
.glusterfs/29/d1/29d170bb-6391-40ed-b4c6-27d8caa72a64, getting the inode, and
searching for it in the entire brick?
Here is what I find:
# ls -l /export/*/.glusterfs/29/d1/29d170bb-6391-40ed-b4c6-27d8caa72a64
lrwxrwxrwx 1 root wheel 55 Aug 10 18:47
/export/wd3a/.glusterfs/29/d1/29d170bb-6391-40ed-b4c6-27d8caa72a64 ->
../../2e/23/2e231351-fc16-46ef-b38b-1d5f10ffd5d5/libbfd
# ls -dil
/export/*/.glusterfs/2e/23/2e231351-fc16-46ef-b38b-1d5f10ffd5d5/libbfd
50951712 drwxr-xr-x 4 manu manu 2560 Aug 13 17:40
/export/wd3a/.glusterfs/2e/23/2e231351-fc16-46ef-b38b-1d5f10ffd5d5/libbfd
# find /export/wd3a/ -inum 50951712 -exec ls -ldi {} \;
50951712 drwxr-xr-x 4 manu manu 2560 Aug 13 17:40
/export/wd3a/manu/netbsd/usr/src/gnu/lib/libbfd
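For directories, the find over the whole brick can probably be avoided, since the .glusterfs entry is a symlink of the form ../../xx/yy/<parent-gfid>/<name>, as the ls output above shows. The following is a minimal sketch that walks that chain upward. It assumes every ancestor is also stored as such a symlink and that the brick root has the GFID 00000000-0000-0000-0000-000000000001. The helper names gfid_link and dir_gfid_to_path are made up for illustration. Regular files are not handled here.

```python
import os

# Assumption: the brick root directory carries this fixed GFID.
ROOT_GFID = "00000000-0000-0000-0000-000000000001"


def gfid_link(brick, gfid):
    """Return the .glusterfs entry for a GFID on a brick (hypothetical helper)."""
    return os.path.join(brick, ".glusterfs", gfid[0:2], gfid[2:4], gfid)


def dir_gfid_to_path(brick, gfid):
    """Resolve a directory GFID to a path relative to the brick root.

    Each symlink target looks like ../../2e/23/<parent-gfid>/<name>, so the
    last component is the directory name and the one before it is the
    parent's GFID. We collect names until we reach the root GFID.
    """
    parts = []
    while gfid != ROOT_GFID:
        target = os.readlink(gfid_link(brick, gfid))  # raises OSError if not a symlink
        parent_dir, name = os.path.split(target)
        parts.append(name)
        gfid = os.path.basename(parent_dir)
    return os.path.join(*reversed(parts)) if parts else "."
```

Called as dir_gfid_to_path("/export/wd3a", "29d170bb-6391-40ed-b4c6-27d8caa72a64"), this would need one readlink per path component instead of a scan of the brick.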
Attributes:
trusted.glusterfs.dht 00 00 00 01 00 00 00 00 7f ff ff ff ff ff ff ff
trusted.afr.gfs33-client-1 00 00 00 00 00 00 00 02 00 00 00 00
trusted.afr.gfs33-client-0 00 00 00 00 00 00 00 00 00 00 00 00
trusted.gfid 29 d1 70 bb 63 91 40 ed b4 c6 27 d8 ca a7 2a 64
On the other bricks:
trusted.glusterfs.dht 00 00 00 01 00 00 00 00 00 00 00 00 7f ff ff fe
trusted.afr.gfs33-client-2 00 00 00 00 00 00 00 00 00 00 00 00
trusted.afr.gfs33-client-3 00 00 00 00 00 00 00 00 00 00 00 00
trusted.gfid 29 d1 70 bb 63 91 40 ed b4 c6 27 d8 ca a7 2a 64
trusted.glusterfs.dht 00 00 00 01 00 00 00 00 7f ff ff ff ff ff ff ff
trusted.afr.gfs33-client-1 00 00 00 00 00 00 00 00 00 00 00 00
trusted.afr.gfs33-client-3 00 00 00 00 00 00 00 00 00 00 00 00
trusted.gfid 29 d1 70 bb 63 91 40 ed b4 c6 27 d8 ca a7 2a 64
trusted.glusterfs.dht 00 00 00 01 00 00 00 00 00 00 00 00 7f ff ff fe
trusted.afr.gfs33-client-2 00 00 00 00 00 00 00 01 00 00 00 00
trusted.afr.gfs33-client-3 00 00 00 00 00 00 00 00 00 00 00 00
trusted.gfid 29 d1 70 bb 63 91 40 ed b4 c6 27 d8 ca a7 2a 64
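To read the trusted.afr.* values above more easily, here is a small decoding sketch. It assumes each 12-byte value holds three big-endian 32-bit pending counters, in the order data, metadata, entry. That layout is my understanding, not something confirmed in this mail. The function name parse_afr_changelog is made up for illustration.

```python
import struct


def parse_afr_changelog(hex_bytes):
    """Decode a trusted.afr.* value given as space-separated hex bytes.

    Assumed layout: three big-endian unsigned 32-bit counters for pending
    data, metadata and entry operations, in that order.
    """
    raw = bytes.fromhex(hex_bytes.replace(" ", ""))
    if len(raw) != 12:
        raise ValueError("expected 12 bytes, got %d" % len(raw))
    data, metadata, entry = struct.unpack(">III", raw)
    return {"data": data, "metadata": metadata, "entry": entry}


# The trusted.afr.gfs33-client-1 value from the first brick above.
print(parse_afr_changelog("00 00 00 00 00 00 00 02 00 00 00 00"))
```

Under that assumption, the non-zero counters in the dumps above sit in the metadata slot, which would match the log complaining about a metadata self-heal.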
I tried to understand the code here. It reads trusted.afr.gfs33-client-*
and builds a matrix, which looks like this:
pending_matrix: [ 0 1 ]
pending_matrix: [ 2 0 ]
Then afr_sh_wise_nodes_conflict() decides that nsources = -1.
Is there some documentation explaining how it works? Can someone tell me why
it decides it is split brain?
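To make the question concrete, here is a simplified sketch of how I read the check. It is not the actual afr code. It assumes pending_matrix[i][j] is the number of operations that brick i records as pending on brick j. A brick that accuses nobody is "innocent", one that accuses itself is a "fool", and one that accuses only others is "wise". The assumed rule is that a conflict exists when one wise brick is accused by another wise brick. The function names are made up for illustration.

```python
def classify(matrix):
    """Label each brick as innocent, fool or wise from its matrix row."""
    kinds = []
    for i, row in enumerate(matrix):
        if all(v == 0 for v in row):
            kinds.append("innocent")  # accuses nobody
        elif row[i] != 0:
            kinds.append("fool")      # accuses itself
        else:
            kinds.append("wise")      # accuses only other bricks
    return kinds


def wise_nodes_conflict(matrix):
    """Return True if some wise brick is accused by another wise brick."""
    kinds = classify(matrix)
    wise = [i for i, kind in enumerate(kinds) if kind == "wise"]
    return any(matrix[i][j] != 0 for i in wise for j in wise if i != j)


# The matrix printed above: each brick accuses the other.
pending_matrix = [[0, 1],
                  [2, 0]]
print(classify(pending_matrix))
print(wise_nodes_conflict(pending_matrix))
```

If that reading is right, both bricks are wise and each blames the other, so neither can be picked as a source, which would explain nsources = -1.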
--
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
manu at netbsd.org