[Gluster-users] Split-brains when shutting down Xen domUs
Daniel Manser
me at danielmanser.com
Wed Nov 30 10:11:36 UTC 2011
We are running a couple of Xen domUs on a two-node Gluster setup (2
Gluster nodes, 2 dedicated Xen hosts, all machines running CentOS). Each
domU image is located in its own volume.
At the precise moment I shut down the domU (from inside the domU),
I got the following log entries on the Xen host (the Gluster client):
[2011-11-29 15:28:24.579646] I
[afr-self-heal-common.c:537:afr_sh_mark_sources]
0-vol0_atmail1_example_org-replicate-0: split-brain possible, no source
detected
[2011-11-29 15:28:24.579707] I [afr-common.c:801:afr_lookup_done]
0-vol0_atmail1_example_org-replicate-0: background data self-heal
triggered. path: /atmail1.example.org.img
[2011-11-29 15:28:24.581251] I
[afr-self-heal-common.c:537:afr_sh_mark_sources]
0-vol0_atmail1_example_org-replicate-0: split-brain possible, no source
detected
[2011-11-29 15:28:24.581282] E
[afr-self-heal-data.c:637:afr_sh_data_fix]
0-vol0_atmail1_example_org-replicate-0: Unable to self-heal contents of
'/atmail1.example.org.img' (possible split-brain).
Please delete the file from all but the preferred subvolume.
[2011-11-29 15:28:24.582075] I
[afr-self-heal-common.c:1557:afr_self_heal_completion_cbk]
0-vol0_atmail1_example_org-replicate-0: background data data self-heal
completed on /atmail1.example.org.img
[2011-11-29 15:28:24.778445] W [afr-open.c:168:afr_open]
0-vol0_atmail1_example_org-replicate-0: failed to open as split brain
seen, returning EIO
[2011-11-29 15:28:24.778503] W [fuse-bridge.c:582:fuse_fd_cbk]
0-glusterfs-fuse: 18943778: OPEN() /atmail1.example.org.img => -1
(Input/output error)
[2011-11-29 15:28:24.778585] W [afr-open.c:168:afr_open]
0-vol0_atmail1_example_org-replicate-0: failed to open as split brain
seen, returning EIO
[2011-11-29 15:28:24.778610] W [fuse-bridge.c:582:fuse_fd_cbk]
0-glusterfs-fuse: 18943779: OPEN() /atmail1.example.org.img => -1
(Input/output error)
[2011-11-29 15:28:25.93271] W [afr-open.c:168:afr_open]
0-vol0_atmail1_example_org-replicate-0: failed to open as split brain
seen, returning EIO
[2011-11-29 15:28:25.93327] W [fuse-bridge.c:582:fuse_fd_cbk]
0-glusterfs-fuse: 18943780: OPEN() /atmail1.example.org.img => -1
(Input/output error)
To recover, I had to delete the image from one of the Gluster nodes,
trigger self-heal/replication, and then start the domU again. The
split-brain does not seem to happen on every shutdown, though.
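Since the problem is intermittent, it may help to count how often it shows up in the client log. Below is a minimal sketch, assuming the log entries look like the excerpt above. The function names (split_entries, summarize) are made up for illustration and are not part of Gluster. It tolerates entries that have been wrapped over several lines, as they are in this mail.

```python
import re
from collections import Counter

# An entry starts with a timestamp such as [2011-11-29 15:28:24.579646].
ENTRY_START = re.compile(r"\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\]")
# Message patterns copied from the log excerpt in this post.
UNHEALABLE = re.compile(r"Unable to self-heal contents of '([^']+)'")
OPEN_FAILED = re.compile(r"OPEN\(\) (\S+) => -1 \(Input/output error\)")


def split_entries(text):
    """Yield (timestamp, message) pairs, joining wrapped lines first."""
    flat = " ".join(text.split())
    starts = list(ENTRY_START.finditer(flat))
    for i, match in enumerate(starts):
        end = starts[i + 1].start() if i + 1 < len(starts) else len(flat)
        yield match.group(1), flat[match.end():end].strip()


def summarize(text):
    """Return files reported as unhealable and EIO counts for OPEN() per file."""
    unhealable = set()
    open_eio = Counter()
    for _timestamp, message in split_entries(text):
        found = UNHEALABLE.search(message)
        if found:
            unhealable.add(found.group(1))
        failed = OPEN_FAILED.search(message)
        if failed:
            open_eio[failed.group(1)] += 1
    return {"unhealable": unhealable, "open_eio": open_eio}
```

Feeding it the contents of the client log (the path of that log depends on the mount point and is not shown here) gives a per-file summary that can be compared against the times the domUs were shut down.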
On the second Xen host, the Gluster volume is mounted, but nothing
reads from or writes to it. I don't think this should be a problem,
since Gluster can handle multiple clients.
My volume setup is pretty straightforward:
[root@glu1 ~]# gluster volume info vol0_atmail1_example_org
Volume Name: vol0_atmail1_example_org
Type: Replicate
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: glu1.example.org:/mnt/vol0/atmail1_example_org
Brick2: glu2.example.org:/mnt/vol0/atmail1_example_org
Options Reconfigured:
network.ping-timeout: 10
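To decide which brick holds the "preferred" copy, one option is to look at the replication changelog extended attributes on the image file on each brick. The sketch below rests on assumptions not stated in this post: that the attributes are named trusted.afr.<volume>-client-<N>, and that each value is 12 bytes holding three big-endian 32-bit counters (data, metadata, entry). The helper names are hypothetical. Reading trusted.* attributes requires root on the brick host.

```python
import os
import struct


def afr_xattr_names(volume, brick_count=2):
    """Assumed naming scheme: one changelog xattr per brick of the replica."""
    return [f"trusted.afr.{volume}-client-{i}" for i in range(brick_count)]


def decode_afr_changelog(raw):
    """Decode an assumed 12-byte changelog value into pending counters."""
    if len(raw) != 12:
        raise ValueError(f"expected 12 bytes, got {len(raw)}")
    data, metadata, entry = struct.unpack(">III", raw)
    return {"data": data, "metadata": metadata, "entry": entry}


def read_changelogs(brick_file, volume, brick_count=2):
    """Read and decode the changelog xattrs of a file on a brick (Linux, root)."""
    result = {}
    for name in afr_xattr_names(volume, brick_count):
        try:
            result[name] = decode_afr_changelog(os.getxattr(brick_file, name))
        except OSError:
            result[name] = None  # attribute missing or not readable
    return result
```

Run on each Gluster node against the file under the brick directory, e.g. read_changelogs("/mnt/vol0/atmail1_example_org/atmail1.example.org.img", "vol0_atmail1_example_org"). If each brick shows a non-zero data counter for the other brick, the two copies are blaming each other, which would match the "no source detected" message in the log.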
I wonder whether anyone has run into similar problems with Xen, and what
solution they came up with.
Daniel