[Gluster-users] Split-brains when shutting down Xen domUs

Daniel Manser me at danielmanser.com
Wed Nov 30 10:11:36 UTC 2011


We are running a couple of Xen domUs on a two-node Gluster setup (2
Gluster nodes, 2 dedicated Xen hosts, all machines running CentOS). Each
domU image lives in its own volume.

At the precise moment I shut down a domU (from inside the domU), the
Xen host (the Gluster client) logged the following:

[2011-11-29 15:28:24.579646] I 
[afr-self-heal-common.c:537:afr_sh_mark_sources] 
0-vol0_atmail1_example_org-replicate-0: split-brain possible, no source 
detected
[2011-11-29 15:28:24.579707] I [afr-common.c:801:afr_lookup_done] 
0-vol0_atmail1_example_org-replicate-0: background  data self-heal 
triggered. path: /atmail1.example.org.img
[2011-11-29 15:28:24.581251] I 
[afr-self-heal-common.c:537:afr_sh_mark_sources] 
0-vol0_atmail1_example_org-replicate-0: split-brain possible, no source 
detected
[2011-11-29 15:28:24.581282] E 
[afr-self-heal-data.c:637:afr_sh_data_fix] 
0-vol0_atmail1_example_org-replicate-0: Unable to self-heal contents of 
'/atmail1.example.org.img' (possible split-brain).
Please delete the file from all but the preferred subvolume.
[2011-11-29 15:28:24.582075] I 
[afr-self-heal-common.c:1557:afr_self_heal_completion_cbk] 
0-vol0_atmail1_example_org-replicate-0: background  data data self-heal 
completed on /atmail1.example.org.img
[2011-11-29 15:28:24.778445] W [afr-open.c:168:afr_open] 
0-vol0_atmail1_example_org-replicate-0: failed to open as split brain 
seen, returning EIO
[2011-11-29 15:28:24.778503] W [fuse-bridge.c:582:fuse_fd_cbk] 
0-glusterfs-fuse: 18943778: OPEN() /atmail1.example.org.img => -1 
(Input/output error)
[2011-11-29 15:28:24.778585] W [afr-open.c:168:afr_open] 
0-vol0_atmail1_example_org-replicate-0: failed to open as split brain 
seen, returning EIO
[2011-11-29 15:28:24.778610] W [fuse-bridge.c:582:fuse_fd_cbk] 
0-glusterfs-fuse: 18943779: OPEN() /atmail1.example.org.img => -1 
(Input/output error)
[2011-11-29 15:28:25.93271] W [afr-open.c:168:afr_open] 
0-vol0_atmail1_example_org-replicate-0: failed to open as split brain 
seen, returning EIO
[2011-11-29 15:28:25.93327] W [fuse-bridge.c:582:fuse_fd_cbk] 
0-glusterfs-fuse: 18943780: OPEN() /atmail1.example.org.img => -1 
(Input/output error)
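For anyone hitting the same messages: the AFR changelog extended attributes on each brick show which copy Gluster blames. A diagnostic sketch, assuming the brick paths from the volume info below (the -client-0/-client-1 suffixes correspond to Brick1/Brick2):

```shell
# Run as root on EACH Gluster node, against the brick copy of the image
# (brick path assumed from the volume layout shown below).
getfattr -d -m trusted.afr -e hex \
    /mnt/vol0/atmail1_example_org/atmail1.example.org.img

# If the trusted.afr.<volname>-client-* counters are non-zero on BOTH
# bricks, each side blaming the other, that is a genuine split-brain:
# AFR has no clean source to heal from and returns EIO on open.
```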

To recover, I had to delete the image from one Gluster node, trigger
self-heal/replication, and then start the domU again. The split-brain
situations do not seem to happen every time, though.
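Concretely, the recovery I described looks roughly like this (a sketch; which node to discard, the client mount point, and the domU config name are assumptions for illustration):

```shell
# 1. On the node whose copy you are discarding (glu2 here, an assumption),
#    remove the stale image directly from the brick:
rm /mnt/vol0/atmail1_example_org/atmail1.example.org.img

# 2. From a client mount (path assumed), stat the file so AFR triggers
#    self-heal and re-replicates the surviving copy back to glu2:
stat /mnt/vol0_atmail1_example_org/atmail1.example.org.img

# 3. Once the heal has completed, start the domU again
#    (config name assumed):
xm create atmail1.example.org
```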

On the second Xen host the Gluster volume is mounted, but nothing reads
from or writes to it. I don't think that should be a problem, since
Gluster is designed to handle multiple clients.

My volume setup is pretty straightforward:

[root@glu1 ~]# gluster volume info vol0_atmail1_example_org

Volume Name: vol0_atmail1_example_org
Type: Replicate
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: glu1.example.org:/mnt/vol0/atmail1_example_org
Brick2: glu2.example.org:/mnt/vol0/atmail1_example_org
Options Reconfigured:
network.ping-timeout: 10

I wonder whether anyone has run into similar problems with Xen, and
what solution they came up with.

Daniel
