[Gluster-devel] split brain: how should it be cured?

Emmanuel Dreyfus manu at netbsd.org
Mon Jun 18 12:19:04 UTC 2012


Hi

I get this split brain:

$ ls -l /pfs/manu/netbsd/usr/src/tools/mktemp/Makefile 
-rw-r--r--  1 manu  manu  165 Dec  8  2002 /pfs/manu/netbsd/usr/src/tools/mktemp/Makefile
$ head -1  /pfs/manu/netbsd/usr/src/tools/mktemp/Makefile
head: /pfs/manu/netbsd/usr/src/tools/mktemp/Makefile: Input/output error

Client log is at the end of the message.

On brick1:
trusted.gfid               6d 6c 04 a5 a8 bb 40 09 a4 a4 76 5e 83 28 63 6e
trusted.afr.pfs-client-0   00 00 00 00 00 00 00 00 00 00 00 00
trusted.afr.pfs-client-1   00 00 00 00 00 00 00 00 00 00 00 00

On brick2:
trusted.gfid               6b db b7 73 cc e7 46 a8 9d fc 96 40 2c 6a fe e8
trusted.afr.pfs-client-0   00 00 00 00 00 00 00 00 00 00 00 00
trusted.afr.pfs-client-1   00 00 00 00 00 00 00 01 00 00 00 00

Since the split-brain bit is set on brick2, I removed the file there. When
I run ls -l on the client, the file is re-created, but it still has the
split-brain flag set in trusted.afr.pfs-client-1.
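
To make the dumps above easier to read, here is a minimal sketch that decodes them. It assumes the usual AFR changelog layout: 12 bytes holding three big-endian 32-bit pending counters, in the order data, metadata, entry. The helper names (parse_hex, decode_afr_changelog) are made up for illustration and are not part of GlusterFS. Under that assumption, the non-zero value on brick2 is a pending metadata counter, not a dedicated split-brain bit. The sketch also compares trusted.gfid, which differs between the two bricks in the listings above.

```python
import struct

def parse_hex(dump: str) -> bytes:
    """Turn a spaced hex dump such as '00 00 00 01' into raw bytes."""
    return bytes.fromhex("".join(dump.split()))

def decode_afr_changelog(dump: str) -> dict:
    """Decode a 12-byte trusted.afr.* value.

    Assumed layout: three big-endian uint32 pending counters,
    in the order data, metadata, entry.
    """
    raw = parse_hex(dump)
    if len(raw) != 12:
        raise ValueError("expected 12 bytes, got %d" % len(raw))
    data, metadata, entry = struct.unpack(">III", raw)
    return {"data": data, "metadata": metadata, "entry": entry}

# Values copied from the brick listings in this message.
brick1 = {
    "trusted.gfid": "6d 6c 04 a5 a8 bb 40 09 a4 a4 76 5e 83 28 63 6e",
    "trusted.afr.pfs-client-0": "00 00 00 00 00 00 00 00 00 00 00 00",
    "trusted.afr.pfs-client-1": "00 00 00 00 00 00 00 00 00 00 00 00",
}
brick2 = {
    "trusted.gfid": "6b db b7 73 cc e7 46 a8 9d fc 96 40 2c 6a fe e8",
    "trusted.afr.pfs-client-0": "00 00 00 00 00 00 00 00 00 00 00 00",
    "trusted.afr.pfs-client-1": "00 00 00 00 00 00 00 01 00 00 00 00",
}

for name, brick in (("brick1", brick1), ("brick2", brick2)):
    for key in sorted(brick):
        if key.startswith("trusted.afr."):
            print(name, key, decode_afr_changelog(brick[key]))

# The two copies of the file carry different gfids.
gfid_mismatch = parse_hex(brick1["trusted.gfid"]) != parse_hex(brick2["trusted.gfid"])
print("gfid mismatch:", gfid_mismatch)
```

If the layout assumption holds, removing the file from brick2 alone may not be enough, since the re-created copy shows the same flag again.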


Client log when attempting to open the file for reading:

[2012-06-18 14:02:42.697447] W [afr-common.c:1226:afr_detect_self_heal_by_lookup_status] 0-pfs-replicate-0: split brain detected during lookup of /manu/netbsd/usr/src/tools/mktemp/Makefile.
[2012-06-18 14:02:42.697699] I [afr-common.c:1340:afr_launch_self_heal] 0-pfs-replicate-0: background  meta-data data gfid self-heal triggered. path: /manu/netbsd/usr/src/tools/mktemp/Makefile, reason: lookup detected pending operations
[2012-06-18 14:02:42.698958] I [afr-self-heal-common.c:1197:afr_sh_missing_entry_call_impunge_recreate] 0-pfs-replicate-0: no missing files - /manu/netbsd/usr/src/tools/mktemp/Makefile. proceeding to metadata check
[2012-06-18 14:02:42.699622] I [afr-self-heal-common.c:1002:afr_sh_missing_entries_done] 0-pfs-replicate-0: split brain found, aborting selfheal of /manu/netbsd/usr/src/tools/mktemp/Makefile
[2012-06-18 14:02:42.699919] E [afr-self-heal-common.c:2158:afr_self_heal_completion_cbk] 0-XXX: calling afr_set_split_brain
[2012-06-18 14:02:42.700114] E [afr-self-heal-common.c:2167:afr_self_heal_completion_cbk] 0-pfs-replicate-0: background  meta-data data gfid self-heal failed on /manu/netbsd/usr/src/tools/mktemp/Makefile
[2012-06-18 14:02:42.700720] W [afr-open.c:213:afr_open] 0-pfs-replicate-0: failed to open as split brain seen, returning EIO
[2012-06-18 14:02:42.701066] W [fuse-bridge.c:713:fuse_fd_cbk] 0-glusterfs-fuse: 461378: OPEN() /manu/netbsd/usr/src/tools/mktemp/Makefile => -1 (Input/output error)


Client log when doing ls -l on the file after it was removed from brick2:

[2012-06-18 14:15:14.596053] I [afr-common.c:1215:afr_detect_self_heal_by_lookup_status] 0-pfs-replicate-0: entries are missing in lookup of /manu/netbsd/usr/src/tools/mktemp/Makefile.
[2012-06-18 14:15:14.596357] I [afr-common.c:1340:afr_launch_self_heal] 0-pfs-replicate-0: background  meta-data data entry missing-entry gfid self-heal triggered. path: /manu/netbsd/usr/src/tools/mktemp/Makefile, reason: lookup detected pending operations
[2012-06-18 14:15:14.598599] E [afr-self-heal-common.c:1095:afr_sh_common_lookup_resp_handler] 0-pfs-replicate-0: path /manu/netbsd/usr/src/tools/mktemp/Makefile on subvolume pfs-client-0 => -1 (No such file or directory)
[2012-06-18 14:15:14.600608] I [afr-self-heal-common.c:1002:afr_sh_missing_entries_done] 0-pfs-replicate-0: split brain found, aborting selfheal of /manu/netbsd/usr/src/tools/mktemp/Makefile
[2012-06-18 14:15:14.600816] E [afr-self-heal-common.c:2158:afr_self_heal_completion_cbk] 0-XXX: calling afr_set_split_brain
[2012-06-18 14:15:14.601012] E [afr-self-heal-common.c:2167:afr_self_heal_completion_cbk] 0-pfs-replicate-0: background  meta-data data entry missing-entry gfid self-heal failed on /manu/netbsd/usr/src/tools/mktemp/Makefile

NB: The XXX log line is an addition I made while trying to figure out what
is going on.

-- 
Emmanuel Dreyfus
manu at netbsd.org



