[Gluster-devel] glusterfs3.2.7 split brain on a server, while it's normal on another server
Song
gluster at 163.com
Thu Jan 10 01:29:11 UTC 2013
Pranith, thank you very much for your reply.
The xattrs of the file in split brain are the same on all three disks. I
confirmed this when I found the error.
[root@bj-nx-cip-w86 000]# getfattr -d -m . -e hex 095
# file: 095
trusted.afr.gfs1-client-15=0x000000000000000000000000
trusted.afr.gfs1-client-16=0x000000000000000000000000
trusted.afr.gfs1-client-17=0x000000000000000000000000
trusted.gfid=0x5ca8d51e5ea24405a8f5710b9aba08cc
[root@bj-nx-cip-w76 000]# getfattr -d -m . -e hex 095
# file: 095
trusted.afr.gfs1-client-15=0x000000000000000000000000
trusted.afr.gfs1-client-16=0x000000000000000000000000
trusted.afr.gfs1-client-17=0x000000000000000000000000
trusted.gfid=0x5ca8d51e5ea24405a8f5710b9aba08cc
[root@bj-nx-cip-w66 000]# getfattr -d -m . -e hex 095
# file: 095
trusted.afr.gfs1-client-15=0x000000000000000000000000
trusted.afr.gfs1-client-16=0x000000000000000000000000
trusted.afr.gfs1-client-17=0x000000000000000000000000
trusted.gfid=0x5ca8d51e5ea24405a8f5710b9aba08cc
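For readers who want to check such output mechanically, here is a minimal Python sketch. It assumes each trusted.afr value is 12 bytes holding three big-endian 32-bit pending counters (data, metadata, entry), and that a gfid split brain would show up as differing trusted.gfid values across the bricks. The helper names (decode_afr, parse_getfattr, summarize) and the short host labels are made up for illustration and are not part of glusterfs.

```python
import struct

def decode_afr(hex_value):
    """Split a trusted.afr hex string into (data, metadata, entry) counters.

    Assumes the 12-byte layout of three big-endian 32-bit integers.
    """
    digits = hex_value[2:] if hex_value.startswith("0x") else hex_value
    raw = bytes.fromhex(digits)
    if len(raw) != 12:
        raise ValueError("expected 12 bytes, got %d" % len(raw))
    return struct.unpack(">III", raw)

def parse_getfattr(text):
    """Turn 'getfattr -d -m . -e hex' output into a name -> value dict."""
    attrs = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the '# file:' header
        name, _, value = line.partition("=")
        attrs[name] = value
    return attrs

def summarize(brick_outputs):
    """Return (gfids_match, pending) for a dict of brick -> getfattr text.

    pending maps (brick, xattr name) to its counters, non-zero ones only.
    """
    parsed = {b: parse_getfattr(t) for b, t in brick_outputs.items()}
    gfids = {attrs.get("trusted.gfid") for attrs in parsed.values()}
    pending = {}
    for brick, attrs in parsed.items():
        for name, value in attrs.items():
            if name.startswith("trusted.afr."):
                counters = decode_afr(value)
                if any(counters):
                    pending[(brick, name)] = counters
    return len(gfids) == 1, pending

# The output above is identical on all three bricks, so one sample is reused.
SAMPLE = """# file: 095
trusted.afr.gfs1-client-15=0x000000000000000000000000
trusted.afr.gfs1-client-16=0x000000000000000000000000
trusted.afr.gfs1-client-17=0x000000000000000000000000
trusted.gfid=0x5ca8d51e5ea24405a8f5710b9aba08cc
"""
outputs = {host: SAMPLE for host in ("w86", "w76", "w66")}
gfids_match, pending = summarize(outputs)
print(gfids_match, pending)  # True {}
```

With the values pasted above, the gfids match and no pending counter is non-zero, which is consistent with the bricks themselves not being in split brain.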
I think the glusterfs client may be caching some information, because after
I unmount the volume and mount it again, the error no longer occurs.
From: Pranith Kumar K [mailto:pkarampu at redhat.com]
Sent: Wednesday, January 09, 2013 6:06 PM
To: Song
Cc: gluster-devel at nongnu.org
Subject: Re: [Gluster-devel] glusterfs3.2.7 split brain on a server, while
it's normal on another server
On 01/09/2013 11:03 AM, Song wrote:
Hi,
We have a glusterfs cluster running version 3.2.7. The volume info is as below:
Volume Name: gfs1
Type: Distributed-Replicate
Status: Started
Number of Bricks: 94 x 3 = 282
Transport-type: tcp
We mount the volume natively on all cluster servers. When we access the file
“/XMTEXT/gfs1_000/000/000/095” on one server, we get a split-brain error,
while the same file can be accessed normally on another server.
After re-mounting the volume on the failing server, accessing the same
file works.
Does glusterfs cache some information? This has happened more than
once.
The client log during the split brain is as follows:
[2013-01-07 09:57:29.554505] W
[afr-common.c:931:afr_detect_self_heal_by_lookup_status] 0-gfs1-replicate-5:
split brain detected during lookup of /XMTEXT/gfs1_000/000/000/095.
[2013-01-07 09:57:29.554566] I [afr-common.c:1039:afr_launch_self_heal]
0-gfs1-replicate-5: background data gfid self-heal triggered. path:
/XMTEXT/gfs1_000/000/000/095
[2013-01-07 09:57:29.555299] I
[afr-self-heal-common.c:1290:sh_missing_entries_create] 0-gfs1-replicate-5:
no missing files - /XMTEXT/gfs1_000/000/000/095. proceeding to metadata
check
[2013-01-07 09:57:29.555507] I
[afr-self-heal-common.c:1050:afr_sh_missing_entries_done]
0-gfs1-replicate-5: split brain found, aborting selfheal of
/XMTEXT/gfs1_000/000/000/095
[2013-01-07 09:57:29.555531] E
[afr-self-heal-common.c:2190:afr_self_heal_completion_cbk]
0-gfs1-replicate-5: background data gfid self-heal failed on
/XMTEXT/gfs1_000/000/000/095
[2013-01-07 09:57:35.598229] W
[afr-common.c:931:afr_detect_self_heal_by_lookup_status] 0-gfs1-replicate-5:
split brain detected during lookup of /XMTEXT/gfs1_000/000/000/095.
[2013-01-07 09:57:35.598282] I [afr-common.c:1039:afr_launch_self_heal]
0-gfs1-replicate-5: background data gfid self-heal triggered. path:
/XMTEXT/gfs1_000/000/000/095
[2013-01-07 09:57:35.598939] I
[afr-self-heal-common.c:1290:sh_missing_entries_create] 0-gfs1-replicate-5:
no missing files - /XMTEXT/gfs1_000/000/000/095. proceeding to metadata
check
[2013-01-07 09:57:35.599139] I
[afr-self-heal-common.c:1050:afr_sh_missing_entries_done]
0-gfs1-replicate-5: split brain found, aborting selfheal of
/XMTEXT/gfs1_000/000/000/095
[2013-01-07 09:57:35.599176] E
[afr-self-heal-common.c:2190:afr_self_heal_completion_cbk]
0-gfs1-replicate-5: background data gfid self-heal failed on
/XMTEXT/gfs1_000/000/000/095
[2013-01-07 09:57:38.192819] W
[afr-common.c:931:afr_detect_self_heal_by_lookup_status] 0-gfs1-replicate-5:
split brain detected during lookup of /XMTEXT/gfs1_000/000/000/095.
[2013-01-07 09:57:38.192875] I [afr-common.c:1039:afr_launch_self_heal]
0-gfs1-replicate-5: background data gfid self-heal triggered. path:
/XMTEXT/gfs1_000/000/000/095
[2013-01-07 09:57:38.193486] I
[afr-self-heal-common.c:1290:sh_missing_entries_create] 0-gfs1-replicate-5:
no missing files - /XMTEXT/gfs1_000/000/000/095. proceeding to metadata
check
[2013-01-07 09:57:38.193708] I
[afr-self-heal-common.c:1050:afr_sh_missing_entries_done]
0-gfs1-replicate-5: split brain found, aborting selfheal of
/XMTEXT/gfs1_000/000/000/095
[2013-01-07 09:57:38.193731] E
[afr-self-heal-common.c:2190:afr_self_heal_completion_cbk]
0-gfs1-replicate-5: background data gfid self-heal failed on
/XMTEXT/gfs1_000/000/000/095
[2013-01-07 09:57:38.193937] W [afr-open.c:168:afr_open] 0-gfs1-replicate-5:
failed to open as split brain seen, returning EIO
[2013-01-07 09:57:38.194033] W [fuse-bridge.c:693:fuse_fd_cbk]
0-glusterfs-fuse: 3162527: OPEN() /XMTEXT/gfs1_000/000/000/095 => -1
(Input/output error)
[2013-01-07 10:08:12.569821] W
[afr-common.c:931:afr_detect_self_heal_by_lookup_status] 0-gfs1-replicate-5:
split brain detected during lookup of /XMTEXT/gfs1_000/000/000/095.
[2013-01-07 10:08:12.569891] I [afr-common.c:1039:afr_launch_self_heal]
0-gfs1-replicate-5: background data gfid self-heal triggered. path:
/XMTEXT/gfs1_000/000/000/095
[2013-01-07 10:08:12.571538] I
[afr-self-heal-common.c:1290:sh_missing_entries_create] 0-gfs1-replicate-5:
no missing files - /XMTEXT/gfs1_000/000/000/095. proceeding to metadata
check
[2013-01-07 10:08:12.572684] I
[afr-self-heal-common.c:1050:afr_sh_missing_entries_done]
0-gfs1-replicate-5: split brain found, aborting selfheal of
/XMTEXT/gfs1_000/000/000/095
[2013-01-07 10:08:12.572732] E
[afr-self-heal-common.c:2190:afr_self_heal_completion_cbk]
0-gfs1-replicate-5: background data gfid self-heal failed on
/XMTEXT/gfs1_000/000/000/095
[2013-01-07 10:08:12.580006] W [afr-open.c:168:afr_open] 0-gfs1-replicate-5:
failed to open as split brain seen, returning EIO
[2013-01-07 10:08:12.580103] W [fuse-bridge.c:693:fuse_fd_cbk]
0-glusterfs-fuse: 3164490: OPEN() /XMTEXT/gfs1_000/000/000/095 => -1
(Input/output error)
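A log like the one above can be summarized per path with a small Python sketch. It assumes the message wording is exactly as shown in this log (glusterfs 3.2.7) and that entries may be wrapped across lines, as they are in this mail. The function name count_events and both regular expressions are illustrative, not part of glusterfs.

```python
import re
from collections import Counter

# Patterns copied from the message text in the log above (assumed stable).
LOOKUP_RE = re.compile(r"split brain detected during lookup of (\S+)\.")
EIO_RE = re.compile(r"OPEN\(\) (\S+) => -1 \(Input/output error\)")

def count_events(log_text):
    """Count split-brain lookups and failed OPEN() calls per path."""
    # Collapse all whitespace so entries wrapped over several lines match.
    flat = " ".join(log_text.split())
    return Counter(LOOKUP_RE.findall(flat)), Counter(EIO_RE.findall(flat))

SAMPLE_LOG = """[2013-01-07 09:57:38.192819] W
[afr-common.c:931:afr_detect_self_heal_by_lookup_status] 0-gfs1-replicate-5:
split brain detected during lookup of /XMTEXT/gfs1_000/000/000/095.
[2013-01-07 09:57:38.194033] W [fuse-bridge.c:693:fuse_fd_cbk]
0-glusterfs-fuse: 3162527: OPEN() /XMTEXT/gfs1_000/000/000/095 => -1
(Input/output error)
"""
lookups, open_errors = count_events(SAMPLE_LOG)
for path, hits in lookups.items():
    print(path, "lookups:", hits, "EIO on open:", open_errors[path])
```

Running the same function over the client logs of the working and the failing server would show whether the split-brain messages appear on only one of the mounts.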
Thanks!
_______________________________________________
Gluster-devel mailing list
Gluster-devel at nongnu.org
https://lists.nongnu.org/mailman/listinfo/gluster-devel
Song,
It seems like the file is in gfid-split-brain. To confirm, could you
provide the output of the following command from the backends?
getfattr -d -m . -e hex <file-in-split-brain>
Pranith.