[Gluster-users] frequent split-brain detected, aborting selfheal; background meta-data self-heal failed

Tomasz Chmielewski mangoo at wpkg.org
Tue Jan 8 17:33:35 UTC 2013


Hi,

I'm seeing rather frequent (several times per minute) log entries like:

[2013-01-08 16:45:03.399791] I [afr-common.c:1038:afr_launch_self_heal] 0-shared-replicate-0: background  meta-data self-heal triggered. path: /lfd/techstudiolfc/pub
[2013-01-08 16:45:03.400224] I [afr-self-heal-common.c:705:afr_mark_sources] 0-shared-replicate-0: split-brain possible, no source detected
[2013-01-08 16:45:03.400253] E [afr-self-heal-metadata.c:512:afr_sh_metadata_fix] 0-shared-replicate-0: Unable to self-heal permissions/ownership of '/lfd/techstudiolfc/pub' (possible split-brain). Please fix the file on all backend volumes
[2013-01-08 16:45:03.400417] I [afr-self-heal-metadata.c:81:afr_sh_metadata_done] 0-shared-replicate-0: split-brain detected, aborting selfheal of /lfd/techstudiolfc/pub
[2013-01-08 16:45:03.400453] E [afr-self-heal-common.c:2074:afr_self_heal_completion_cbk] 0-shared-replicate-0: background  meta-data self-heal failed on /lfd/techstudiolfc/pub


However, when I check the affected directory, the permissions and ownership appear to be identical on both servers:

[root@ca1.sg1 /]# ls -ld /data/gluster/lfd/techstudiolfc/pub
drwxr-xr-x 2 userftp userftp 4096 Jun  6  2012 /data/gluster/lfd/techstudiolfc/pub

[root@ca1.sg1 /]# attr -l /data/gluster/lfd/techstudiolfc/pub
Attribute "gfid" has a 16 byte value for /data/gluster/lfd/techstudiolfc/pub
Attribute "afr.shared-client-0" has a 12 byte value for /data/gluster/lfd/techstudiolfc/pub
Attribute "afr.shared-client-1" has a 12 byte value for /data/gluster/lfd/techstudiolfc/pub


[root@ca2.sg1 /]# ls -ld /data/gluster/lfd/techstudiolfc/pub
drwxr-xr-x 2 userftp userftp 4096 Jun  6  2012 /data/gluster/lfd/techstudiolfc/pub

[root@ca2.sg1 /]# attr -l /data/gluster/lfd/techstudiolfc/pub
Attribute "gfid" has a 16 byte value for /data/gluster/lfd/techstudiolfc/pub
Attribute "afr.shared-client-0" has a 12 byte value for /data/gluster/lfd/techstudiolfc/pub
Attribute "afr.shared-client-1" has a 12 byte value for /data/gluster/lfd/techstudiolfc/pub
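
Note that the attr -l listings only report the size of each attribute, not its value, so they cannot show whether the afr.* changelogs on the two servers disagree. The values can be dumped in hex on each backend with something like "getfattr -d -m . -e hex /data/gluster/lfd/techstudiolfc/pub". What follows is a minimal sketch for reading such a value, assuming the 12-byte afr value is three big-endian 32-bit pending counters in the order data, metadata, entry (the usual description of the AFR changelog layout, not something confirmed by the output above). The function names and the sample hex string are hypothetical and are not taken from these servers.

```python
import struct


def decode_afr_changelog(hex_value):
    """Split a 12-byte AFR changelog value into its three pending counters.

    Assumption: the value is three big-endian unsigned 32-bit integers
    in the order data, metadata, entry.
    """
    text = hex_value.strip()
    if text.lower().startswith("0x"):
        text = text[2:]  # getfattr -e hex prefixes values with 0x
    raw = bytes.fromhex(text)
    if len(raw) != 12:
        raise ValueError("expected a 12-byte value, got %d bytes" % len(raw))
    data, metadata, entry = struct.unpack(">III", raw)
    return {"data": data, "metadata": metadata, "entry": entry}


def has_pending(changelog, kind):
    """Return True if the given counter (e.g. 'metadata') is non-zero."""
    return changelog[kind] != 0


# Hypothetical value, for illustration only.
sample = "0x000000000000000100000000"
decoded = decode_afr_changelog(sample)
print(decoded)
print("metadata pending:", has_pending(decoded, "metadata"))
```

Under the same assumption, a non-zero metadata counter in afr.shared-client-1 on the first server together with a non-zero metadata counter in afr.shared-client-0 on the second server would mean each copy blames the other, which is one situation in which self-heal cannot pick a source.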


What could be the problem?

I'm using glusterfs 3.2.6 on Debian Squeeze, and seeing the very same problem on different servers.
It only seems to affect directories.

-- 
Tomasz Chmielewski
http://wpkg.org


