[Gluster-users] Migrating a VM makes its gluster storage inaccessible

Paul Boven boven at jive.nl
Tue Jan 21 14:15:48 UTC 2014


Hi everyone

We've been running glusterfs-3.4.0 on Ubuntu 13.04, using semiosis' 
packages. We're using KVM (libvirt) to host guest installs, and thanks to 
gluster and libvirt, we can live-migrate guests between the two hosts.

Recently I ran an apt-get update/upgrade to stay up-to-date with 
security patches, and this also upgraded our glusterfs to the 3.4.1 
version of the packages.

Since this upgrade (which updated the gluster packages, but also the 
Ubuntu kernel package), KVM live migration fails in a most unusual 
manner. The live migration itself succeeds, but on the receiving 
machine, the vm-storage for that guest becomes inaccessible. This in 
turn means the guest OS can no longer read or write its filesystem, 
with fairly disastrous consequences for such a guest.

So before a migration, everything is running smoothly. The two cluster 
nodes are 'cl0' and 'cl1', and we do the migration like this:

virsh migrate --live --persistent --undefinesource <guest> qemu+tls://cl1/system

The migration itself works, but as soon as you do the migration, the 
/gluster/guest.raw file (which holds the filesystem for the guest) 
becomes completely inaccessible: trying to read it (e.g. with dd or 
md5sum) results in 'permission denied' on the destination cluster 
node, whereas the file is still perfectly fine on the machine that the 
migration originated from.
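The read test described above can be scripted so it is easy to repeat on both nodes (cl0 and cl1) before and after a migration. This is a minimal sketch, assuming a POSIX shell and a dd that accepts a numeric block size; the function name check_image_readable is hypothetical. It reads only the first 1 MiB block rather than the whole image, on the assumption that a single read is enough to surface the 'permission denied' error.

```shell
#!/bin/sh
# Hypothetical helper (not part of gluster or libvirt): try to read the
# first 1 MiB of a guest image and report whether the read succeeded.

check_image_readable() {
    img="$1"
    # dd reads one 1 MiB block and discards it. stderr is captured so the
    # error text (e.g. "Permission denied") can be printed on failure;
    # stdout is sent to /dev/null.
    if err=$(dd if="$img" of=/dev/null bs=1048576 count=1 2>&1 >/dev/null); then
        echo "OK: $img is readable on $(uname -n)"
        return 0
    else
        echo "FAIL: $img on $(uname -n): $err"
        return 1
    fi
}
```

For example, run `check_image_readable /gluster/guest.raw` on each node right after `virsh migrate`; the report above suggests it would print OK on the source node and FAIL on the destination.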

As soon as I stop the guest (virsh destroy), the /gluster/guest.raw file 
becomes readable again and I can start up the guest on either server 
without further issues. It does not affect any of the other files in 
/gluster/.

The problem seems to be in the gluster or fuse part, because once this 
error condition is triggered, the /gluster/guest.raw cannot be read by 
any application on the destination server. This situation is 100% 
reproducible: every attempted live migration fails in this way.

Has anyone else experienced this? Is this a known or new bug?

We've done some troubleshooting already in the irc channel (thanks to 
everyone for their help) but haven't found the smoking gun yet. I would 
appreciate any help in debugging and resolving this.

Regards, Paul Boven.
-- 
Paul Boven <boven at jive.nl> +31 (0)521-596547
Unix/Linux/Networking specialist
Joint Institute for VLBI in Europe - www.jive.nl
VLBI - It's a fringe science


