[Gluster-users] Split brain; which file to choose for repair?

Martin Schenker martin.schenker at profitbricks.com
Tue May 3 20:20:01 UTC 2011


Hi all!

Another incident, now a real "split brain" situation:

On server pair 12 & 13, a set of files can't be repaired and self-heal throws errors.

Is there a way to interpret the AFR codes in order to decide which copy of
each file should be deleted/overwritten?


There are no errors in opt-profitbricks-storage.log on pserver12, but
opt-profitbricks-storage.log on pserver13 says:

 [2011-05-03 18:14:29.343512] I [afr-common.c:672:afr_lookup_done]
0-storage0-replicate-2: split brain detected during lookup of /pserver3-11.
[2011-05-03 18:14:29.344467] E [afr-self-heal-data.c:645:afr_sh_data_fix]
0-storage0-replicate-2: Unable to self-heal contents of '/pserver3-11'
(possible split-brain). Please delete the file from all but the preferred
subvolume.
[2011-05-03 18:14:29.347376] I [afr-common.c:672:afr_lookup_done]
0-storage0-replicate-2: split brain detected during lookup of /pserver3-16.
[2011-05-03 18:14:29.348157] E [afr-self-heal-data.c:645:afr_sh_data_fix]
0-storage0-replicate-2: Unable to self-heal contents of '/pserver3-16'
(possible split-brain). Please delete the file from all but the preferred
subvolume.
[2011-05-03 18:14:29.349013] I [afr-common.c:672:afr_lookup_done]
0-storage0-replicate-2: split brain detected during lookup of /pserver3-17.
[2011-05-03 18:14:29.349817] E [afr-self-heal-data.c:645:afr_sh_data_fix]
0-storage0-replicate-2: Unable to self-heal contents of '/pserver3-17'
(possible split-brain). Please delete the file from all but the preferred
subvolume.
[2011-05-03 18:14:29.351252] I [afr-common.c:672:afr_lookup_done]
0-storage0-replicate-2: split brain detected during lookup of /pserver3-19.
[2011-05-03 18:14:29.352043] E [afr-self-heal-data.c:645:afr_sh_data_fix]
0-storage0-replicate-2: Unable to self-heal contents of '/pserver3-19'
(possible split-brain). Please delete the file from all but the preferred
subvolume.
[2011-05-03 18:14:29.353477] I [afr-common.c:672:afr_lookup_done]
0-storage0-replicate-2: split brain detected during lookup of /pserver3-20.
[2011-05-03 18:14:29.354242] E [afr-self-heal-data.c:645:afr_sh_data_fix]
0-storage0-replicate-2: Unable to self-heal contents of '/pserver3-20'
(possible split-brain). Please delete the file from all but the preferred
subvolume.
[2011-05-03 18:14:29.356343] I [afr-common.c:672:afr_lookup_done]
0-storage0-replicate-2: split brain detected during lookup of /pserver3-23.
[2011-05-03 18:14:29.357198] E [afr-self-heal-data.c:645:afr_sh_data_fix]
0-storage0-replicate-2: Unable to self-heal contents of '/pserver3-23'
(possible split-brain). Please delete the file from all but the preferred
subvolume.
[2011-05-03 18:14:29.358030] I [afr-common.c:672:afr_lookup_done]
0-storage0-replicate-2: split brain detected during lookup of /pserver3-24.
[2011-05-03 18:14:29.358877] E [afr-self-heal-data.c:645:afr_sh_data_fix]
0-storage0-replicate-2: Unable to self-heal contents of '/pserver3-24'
(possible split-brain). Please delete the file from all but the preferred
subvolume.
[2011-05-03 18:14:29.362652] I [afr-common.c:672:afr_lookup_done]
0-storage0-replicate-2: split brain detected during lookup of /pserver3-3.
[2011-05-03 18:14:29.363431] E [afr-self-heal-data.c:645:afr_sh_data_fix]
0-storage0-replicate-2: Unable to self-heal contents of '/pserver3-3'
(possible split-brain). Please delete the file from all but the preferred
subvolume.
[2011-05-03 18:14:29.364261] I [afr-common.c:672:afr_lookup_done]
0-storage0-replicate-2: split brain detected during lookup of /pserver3-30.
[2011-05-03 18:14:29.365041] E [afr-self-heal-data.c:645:afr_sh_data_fix]
0-storage0-replicate-2: Unable to self-heal contents of '/pserver3-30'
(possible split-brain). Please delete the file from all but the preferred
subvolume.
[2011-05-03 18:14:29.368924] I [afr-common.c:672:afr_lookup_done]
0-storage0-replicate-2: split brain detected during lookup of /pserver3-36.
[2011-05-03 18:14:29.369682] E [afr-self-heal-data.c:645:afr_sh_data_fix]
0-storage0-replicate-2: Unable to self-heal contents of '/pserver3-36'
(possible split-brain). Please delete the file from all but the preferred
subvolume.
[2011-05-03 18:14:29.371696] I [afr-common.c:672:afr_lookup_done]
0-storage0-replicate-2: split brain detected during lookup of /pserver3-39.
[2011-05-03 18:14:29.372451] E [afr-self-heal-data.c:645:afr_sh_data_fix]
0-storage0-replicate-2: Unable to self-heal contents of '/pserver3-39'
(possible split-brain). Please delete the file from all but the preferred
subvolume.
[2011-05-03 18:14:29.373939] I [afr-common.c:672:afr_lookup_done]
0-storage0-replicate-2: split brain detected during lookup of /pserver3-5.
[2011-05-03 18:14:29.374705] E [afr-self-heal-data.c:645:afr_sh_data_fix]
0-storage0-replicate-2: Unable to self-heal contents of '/pserver3-5'
(possible split-brain). Please delete the file from all but the preferred
subvolume.
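
To pull the list of affected files out of such a log, something like the following rough sketch could help. It assumes only the message wording shown above ("Unable to self-heal contents of '...'"); the function name split_brain_files is made up for illustration and is not part of GlusterFS.

```python
import re

# Matches the error message text shown in the log excerpt above.
# \s+ tolerates the line wrapping introduced by the mail client.
SELF_HEAL_ERROR = re.compile(r"Unable to self-heal contents of\s+'([^']+)'")


def split_brain_files(log_text):
    """Return the sorted, de-duplicated paths named in self-heal errors."""
    return sorted(set(SELF_HEAL_ERROR.findall(log_text)))


if __name__ == "__main__":
    sample = (
        "[2011-05-03 18:14:29.344467] E [afr-self-heal-data.c:645:afr_sh_data_fix]\n"
        "0-storage0-replicate-2: Unable to self-heal contents of '/pserver3-11'\n"
        "(possible split-brain). Please delete the file from all but the preferred\n"
        "subvolume.\n"
    )
    for path in split_brain_files(sample):
        print(path)
```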

0 root@de-dc1-c1-pserver12:/var/log/glusterfs # getfattr -R -d -e hex -m "trusted.afr." /mnt/gluster/brick?/storage \
    | grep -v 0x000000000000000000000000 | grep -B1 -A1 trusted
getfattr: Removing leading '/' from absolute path names

# file: mnt/gluster/brick0/storage/pserver3-19
trusted.afr.storage0-client-5=0x3f0000010000000000000000
--
# file: mnt/gluster/brick0/storage/pserver3-3
trusted.afr.storage0-client-5=0x3f0000010000000000000000
--
# file: mnt/gluster/brick0/storage/pserver3-30
trusted.afr.storage0-client-5=0x3f0000010000000000000000
--
# file: mnt/gluster/brick0/storage/pserver3-17
trusted.afr.storage0-client-5=0x3f0000010000000000000000

# file: mnt/gluster/brick0/storage/pserver3-11
trusted.afr.storage0-client-5=0x3f0000010000000000000000
--
# file: mnt/gluster/brick0/storage/pserver3-20
trusted.afr.storage0-client-5=0x3f0000010000000000000000
--
# file: mnt/gluster/brick0/storage/pserver3-16
trusted.afr.storage0-client-5=0x3f0000010000000000000000
--
# file: mnt/gluster/brick0/storage/pserver3-5
trusted.afr.storage0-client-5=0x3f0000010000000000000000
--
# file: mnt/gluster/brick0/storage/pserver3-39
trusted.afr.storage0-client-5=0x3f0000010000000000000000
--
# file: mnt/gluster/brick0/storage/pserver3-23
trusted.afr.storage0-client-5=0x3f0000010000000000000000
--
# file: mnt/gluster/brick0/storage/pserver3-24
trusted.afr.storage0-client-5=0x3f0000010000000000000000
--
# file: mnt/gluster/brick0/storage/pserver3-36
trusted.afr.storage0-client-5=0x3f0000010000000000000000




130 root@de-dc1-c1-pserver13:/var/log/glusterfs # getfattr -R -d -e hex -m "trusted.afr." /mnt/gluster/brick?/storage \
    | grep -v 0x000000000000000000000000 | grep -B1 -A1 trusted
getfattr: Removing leading '/' from absolute path names

# file: mnt/gluster/brick0/storage/pserver3-23
trusted.afr.storage0-client-4=0xd70000010000000000000000
--
# file: mnt/gluster/brick0/storage/pserver3-20
trusted.afr.storage0-client-4=0xce00000a0000000000000000
--
# file: mnt/gluster/brick0/storage/pserver3-11
trusted.afr.storage0-client-4=0xd70000010000000000000000

# file: mnt/gluster/brick0/storage/pserver3-5
trusted.afr.storage0-client-4=0xd70000010000000000000000
--
# file: mnt/gluster/brick0/storage/pserver3-30
trusted.afr.storage0-client-4=0xd70000010000000000000000
--
# file: mnt/gluster/brick0/storage/pserver3-39
trusted.afr.storage0-client-4=0xd70000010000000000000000
--
# file: mnt/gluster/brick0/storage/pserver3-16
trusted.afr.storage0-client-4=0xd70000010000000000000000
--
# file: mnt/gluster/brick0/storage/pserver3-17
trusted.afr.storage0-client-4=0xd70000010000000000000000
--
# file: mnt/gluster/brick0/storage/pserver3-24
trusted.afr.storage0-client-4=0xd70000010000000000000000
--
# file: mnt/gluster/brick0/storage/pserver3-36
trusted.afr.storage0-client-4=0xd70000010000000000000000

# file: mnt/gluster/brick0/storage/pserver3-3
trusted.afr.storage0-client-4=0xd70000010000000000000000
--
# file: mnt/gluster/brick0/storage/pserver3-19
trusted.afr.storage0-client-4=0xd70000010000000000000000
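
For reading the values above, here is a minimal sketch of how the hex strings might be split up. It assumes that each trusted.afr.* value is 12 bytes holding three big-endian 32-bit pending counters in the order data, metadata, entry; that layout is an assumption here and should be checked against the AFR source for the installed version. The helper names are hypothetical. The sketch only shows whether both bricks hold a non-zero data counter against each other, which is the mutual-accusation pattern reported as split brain; it does not say which copy to keep.

```python
import struct


def decode_afr_changelog(hex_value):
    """Split a trusted.afr.* value into its three counters.

    Assumed layout (not confirmed by the post): 12 bytes, three
    big-endian unsigned 32-bit integers: data, metadata, entry.
    """
    digits = hex_value[2:] if hex_value.lower().startswith("0x") else hex_value
    raw = bytes.fromhex(digits)
    if len(raw) != 12:
        raise ValueError("expected 12 bytes, got %d" % len(raw))
    data, metadata, entry = struct.unpack(">III", raw)
    return {"data": data, "metadata": metadata, "entry": entry}


def is_data_split_brain(view_a, view_b):
    """True when each brick records pending data operations against the other."""
    return view_a["data"] != 0 and view_b["data"] != 0


if __name__ == "__main__":
    # Values for pserver3-20 as shown on pserver12 and pserver13 above.
    on_pserver12 = decode_afr_changelog("0x3f0000010000000000000000")
    on_pserver13 = decode_afr_changelog("0xce00000a0000000000000000")
    print(on_pserver12, on_pserver13)
    print("data split brain:", is_data_split_brain(on_pserver12, on_pserver13))
```

In all of the entries listed above only the first four bytes are non-zero, so under this assumed layout only the data counter is set on either side.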



