[Gluster-devel] glusterfs3.2.7 split brain on a server, while it's normal on another server

Pranith Kumar K pkarampu at redhat.com
Wed Jan 9 10:05:49 UTC 2013


On 01/09/2013 11:03 AM, Song wrote:
>
> Hi,
>
> We have a GlusterFS cluster running version 3.2.7. The volume info is as
> follows:
>
> Volume Name: gfs1
>
> Type: Distributed-Replicate
>
> Status: Started
>
> Number of Bricks: 94 x 3 = 282
>
> Transport-type: tcp
>
> We mount the volume with the native client on all cluster servers. When
> we access the file "/XMTEXT/gfs1_000/000/000/095" on one server, it fails
> with a split-brain error.
>
> However, we can access the same file from another server.
>
> Also, after remounting the volume on the failing server, the same file
> can be accessed without error.
>
> Has glusterfs cached some information? This has happened more than
> once.
>
> The log from the time of the split-brain error is as follows.
>
> [2013-01-07 09:57:29.554505] W 
> [afr-common.c:931:afr_detect_self_heal_by_lookup_status] 
> 0-gfs1-replicate-5: split brain detected during lookup of 
> /XMTEXT/gfs1_000/000/000/095.
>
> [2013-01-07 09:57:29.554566] I 
> [afr-common.c:1039:afr_launch_self_heal] 0-gfs1-replicate-5: 
> background  data gfid self-heal triggered. path: 
> /XMTEXT/gfs1_000/000/000/095
>
> [2013-01-07 09:57:29.555299] I 
> [afr-self-heal-common.c:1290:sh_missing_entries_create] 
> 0-gfs1-replicate-5: no missing files - /XMTEXT/gfs1_000/000/000/095. 
> proceeding to metadata check
>
> [2013-01-07 09:57:29.555507] I 
> [afr-self-heal-common.c:1050:afr_sh_missing_entries_done] 
> 0-gfs1-replicate-5: split brain found, aborting selfheal of 
> /XMTEXT/gfs1_000/000/000/095
>
> [2013-01-07 09:57:29.555531] E 
> [afr-self-heal-common.c:2190:afr_self_heal_completion_cbk] 
> 0-gfs1-replicate-5: background  data gfid self-heal failed on 
> /XMTEXT/gfs1_000/000/000/095
>
> [2013-01-07 09:57:35.598229] W 
> [afr-common.c:931:afr_detect_self_heal_by_lookup_status] 
> 0-gfs1-replicate-5: split brain detected during lookup of 
> /XMTEXT/gfs1_000/000/000/095.
>
> [2013-01-07 09:57:35.598282] I 
> [afr-common.c:1039:afr_launch_self_heal] 0-gfs1-replicate-5: 
> background  data gfid self-heal triggered. path: 
> /XMTEXT/gfs1_000/000/000/095
>
> [2013-01-07 09:57:35.598939] I 
> [afr-self-heal-common.c:1290:sh_missing_entries_create] 
> 0-gfs1-replicate-5: no missing files - /XMTEXT/gfs1_000/000/000/095. 
> proceeding to metadata check
>
> [2013-01-07 09:57:35.599139] I 
> [afr-self-heal-common.c:1050:afr_sh_missing_entries_done] 
> 0-gfs1-replicate-5: split brain found, aborting selfheal of 
> /XMTEXT/gfs1_000/000/000/095
>
> [2013-01-07 09:57:35.599176] E 
> [afr-self-heal-common.c:2190:afr_self_heal_completion_cbk] 
> 0-gfs1-replicate-5: background  data gfid self-heal failed on 
> /XMTEXT/gfs1_000/000/000/095
>
> [2013-01-07 09:57:38.192819] W 
> [afr-common.c:931:afr_detect_self_heal_by_lookup_status] 
> 0-gfs1-replicate-5: split brain detected during lookup of 
> /XMTEXT/gfs1_000/000/000/095.
>
> [2013-01-07 09:57:38.192875] I 
> [afr-common.c:1039:afr_launch_self_heal] 0-gfs1-replicate-5: 
> background  data gfid self-heal triggered. path: 
> /XMTEXT/gfs1_000/000/000/095
>
> [2013-01-07 09:57:38.193486] I 
> [afr-self-heal-common.c:1290:sh_missing_entries_create] 
> 0-gfs1-replicate-5: no missing files - /XMTEXT/gfs1_000/000/000/095. 
> proceeding to metadata check
>
> [2013-01-07 09:57:38.193708] I 
> [afr-self-heal-common.c:1050:afr_sh_missing_entries_done] 
> 0-gfs1-replicate-5: split brain found, aborting selfheal of 
> /XMTEXT/gfs1_000/000/000/095
>
> [2013-01-07 09:57:38.193731] E 
> [afr-self-heal-common.c:2190:afr_self_heal_completion_cbk] 
> 0-gfs1-replicate-5: background  data gfid self-heal failed on 
> /XMTEXT/gfs1_000/000/000/095
>
> [2013-01-07 09:57:38.193937] W [afr-open.c:168:afr_open] 
> 0-gfs1-replicate-5: failed to open as split brain seen, returning EIO
>
> [2013-01-07 09:57:38.194033] W [fuse-bridge.c:693:fuse_fd_cbk] 
> 0-glusterfs-fuse: 3162527: OPEN() /XMTEXT/gfs1_000/000/000/095 => -1 
> (Input/output error)
>
> [2013-01-07 10:08:12.569821] W 
> [afr-common.c:931:afr_detect_self_heal_by_lookup_status] 
> 0-gfs1-replicate-5: split brain detected during lookup of 
> /XMTEXT/gfs1_000/000/000/095.
>
> [2013-01-07 10:08:12.569891] I 
> [afr-common.c:1039:afr_launch_self_heal] 0-gfs1-replicate-5: 
> background  data gfid self-heal triggered. path: 
> /XMTEXT/gfs1_000/000/000/095
>
> [2013-01-07 10:08:12.571538] I 
> [afr-self-heal-common.c:1290:sh_missing_entries_create] 
> 0-gfs1-replicate-5: no missing files - /XMTEXT/gfs1_000/000/000/095. 
> proceeding to metadata check
>
> [2013-01-07 10:08:12.572684] I 
> [afr-self-heal-common.c:1050:afr_sh_missing_entries_done] 
> 0-gfs1-replicate-5: split brain found, aborting selfheal of 
> /XMTEXT/gfs1_000/000/000/095
>
> [2013-01-07 10:08:12.572732] E 
> [afr-self-heal-common.c:2190:afr_self_heal_completion_cbk] 
> 0-gfs1-replicate-5: background  data gfid self-heal failed on 
> /XMTEXT/gfs1_000/000/000/095
>
> [2013-01-07 10:08:12.580006] W [afr-open.c:168:afr_open] 
> 0-gfs1-replicate-5: failed to open as split brain seen, returning EIO
>
> [2013-01-07 10:08:12.580103] W [fuse-bridge.c:693:fuse_fd_cbk] 
> 0-glusterfs-fuse: 3164490: OPEN() /XMTEXT/gfs1_000/000/000/095 => -1 
> (Input/output error)
>
> Thanks!
>
Song,
       It seems like the file is in gfid split-brain. To confirm, could
you provide the output of the following command, run on the file on each
of the backends?
getfattr -d -m . -e hex <file-in-split-brain>
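
Once the output has been collected from each backend, the trusted.gfid values can be compared. The following is only a rough sketch of that comparison, assuming each backend's getfattr output has been saved as text. The function names, brick labels, and sample hex values are hypothetical and made up for illustration; they are not taken from the cluster above.

```python
def parse_getfattr(text):
    """Parse `getfattr -d -m . -e hex` output into a {name: hex_value} dict."""
    attrs = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the "# file: ..." header line.
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if sep:
            attrs[name] = value
    return attrs


def compare_gfids(outputs):
    """Take {brick_label: getfattr_text}; return (mismatch, {brick_label: gfid}).

    mismatch is True when the bricks do not all report the same trusted.gfid
    (a brick with no trusted.gfid at all also counts as differing).
    """
    gfids = {label: parse_getfattr(text).get("trusted.gfid")
             for label, text in outputs.items()}
    return len(set(gfids.values())) > 1, gfids


if __name__ == "__main__":
    # Made-up sample output for two bricks of one replica set.
    sample = {
        "brick-a": "# file: export/XMTEXT/gfs1_000/000/000/095\n"
                   "trusted.gfid=0x11111111111111111111111111111111\n",
        "brick-b": "# file: export/XMTEXT/gfs1_000/000/000/095\n"
                   "trusted.gfid=0x22222222222222222222222222222222\n",
    }
    mismatch, gfids = compare_gfids(sample)
    for label, gfid in sorted(gfids.items()):
        print(label, gfid)
    print("gfid mismatch:", mismatch)
```

If the values differ between backends, that would be consistent with a gfid split-brain; if they are identical, the trusted.afr.* attributes in the same output are the next thing to look at.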

Pranith.