[Gluster-users] Virt-store use case - HA failure issue - suggestions needed

Thu Jul 31 16:22:16 UTC 2014

I'm currently testing Gluster 3.5.1 in a two-server QEMU/KVM environment
on CentOS 6.5: two servers (KVM07 and KVM08) and a two-brick replicated
volume (one brick per server).
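For anyone reproducing this layout, here is a sketch of how a replica-2 volume with one brick per server can be created, run once on KVM07. The volume name "vmstore" and brick path "/export/brick1" are hypothetical placeholders and do not come from the setup above. The script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch: create and start a replica-2 volume with one brick on each server.
# VOLNAME and BRICK_DIR are hypothetical placeholders, not from the original setup.
VOLNAME="${VOLNAME:-vmstore}"
BRICK_DIR="${BRICK_DIR:-/export/brick1}"

# Print the command (dry run, the default) or execute it when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Intended to be run once, on KVM07.
create_replica_volume() {
    run gluster peer probe KVM08
    run gluster volume create "$VOLNAME" replica 2 \
        "KVM07:$BRICK_DIR" "KVM08:$BRICK_DIR"
    run gluster volume start "$VOLNAME"
}
```

Source the script and call create_replica_volume to review the printed commands, then repeat with DRY_RUN=0 to apply them.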

I've tuned the volume per the documentation here:
http://gluster.org/documentation/use_cases/Virt-store-usecase/

I have the Gluster volume FUSE-mounted on both KVM07 and KVM08 and use it
to store raw disk images.
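A sketch of the FUSE mount on each host. The server named in the mount source is only contacted to fetch the volume layout; after that the client connects to both bricks directly, which is consistent with the mount staying attached through the surviving server. "localhost", "vmstore" and "/mnt/vmstore" are hypothetical choices, and the mount commands are printed unless DRY_RUN=0.

```shell
#!/bin/sh
# Sketch: FUSE-mount the replicated volume on a KVM host (run on KVM07 and KVM08).
# VOLNAME and MOUNT_POINT are hypothetical placeholders.
VOLNAME="${VOLNAME:-vmstore}"
MOUNT_POINT="${MOUNT_POINT:-/mnt/vmstore}"

# Print the command (dry run, the default) or execute it when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# /etc/fstab line; _netdev marks it as a network filesystem so it is
# mounted only after networking is up.
fstab_entry() {
    printf '%s %s glusterfs defaults,_netdev 0 0\n' "localhost:/$VOLNAME" "$MOUNT_POINT"
}

mount_volume() {
    run mkdir -p "$MOUNT_POINT"
    run mount -t glusterfs "localhost:/$VOLNAME" "$MOUNT_POINT"
}
```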

libvirt on each host uses the FUSE-mounted volume as a "dir: Filesystem
Directory" storage pool.
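The same pool can be defined from the command line with virsh on each host. "gluster-pool" and "/mnt/vmstore" are hypothetical names, and the commands are printed unless DRY_RUN=0.

```shell
#!/bin/sh
# Sketch: define a libvirt "dir" storage pool on top of the Gluster FUSE mount.
# POOL_NAME and MOUNT_POINT are hypothetical placeholders.
POOL_NAME="${POOL_NAME:-gluster-pool}"
MOUNT_POINT="${MOUNT_POINT:-/mnt/vmstore}"

# Print the command (dry run, the default) or execute it when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

define_dir_pool() {
    run virsh pool-define-as "$POOL_NAME" dir --target "$MOUNT_POINT"
    run virsh pool-start "$POOL_NAME"
    # Optional: if libvirtd starts before the Gluster mount is present,
    # the pool will list the empty local directory instead.
    run virsh pool-autostart "$POOL_NAME"
}
```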

With dynamic_ownership = 0 set in /etc/libvirt/qemu.conf and the image
files chown-ed to qemu:qemu, live migration works great.
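A sketch of checking that setting and then live migrating one VM. It assumes each image is named after its domain (DOMAIN.img) on the mount, which is a hypothetical convention, and uses the qemu+ssh transport, which is only one of the URIs libvirt accepts. The chown and migrate steps are printed unless DRY_RUN=0.

```shell
#!/bin/sh
# Sketch: verify dynamic_ownership is disabled, fix image ownership, live migrate.
# MOUNT_POINT and the "<domain>.img" naming are hypothetical.
MOUNT_POINT="${MOUNT_POINT:-/mnt/vmstore}"

# Print the command (dry run, the default) or execute it when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Succeeds if the given qemu.conf has an uncommented "dynamic_ownership = 0".
# After editing qemu.conf, restart libvirtd (service libvirtd restart).
dynamic_ownership_disabled() {
    grep -Eq '^[[:space:]]*dynamic_ownership[[:space:]]*=[[:space:]]*0[[:space:]]*$' \
        "${1:-/etc/libvirt/qemu.conf}"
}

# Usage: migrate_vm DOMAIN DEST_HOST
migrate_vm() {
    local domain="$1" dest="$2"
    run chown qemu:qemu "$MOUNT_POINT/$domain.img"
    run virsh migrate --live "$domain" "qemu+ssh://$dest/system"
}
```

For example, migrate_vm vm01 KVM08 would move a (hypothetical) domain vm01 from KVM07 to KVM08.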

Problem:
If I need to take down one of these servers for maintenance, I first live
migrate its VMs to the other server. Then I run

    service gluster stop

and kill all the remaining gluster and brick processes.
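The shutdown sequence above, written out as a script for reference. It reproduces the procedure from the post and is not a fix for the failure that follows. Assumptions beyond the post: the init script is assumed to be "glusterd", as shipped in the CentOS 6 glusterfs-server packages (override GLUSTER_SERVICE if "gluster" is correct on your hosts); "remaining gluster and brick processes" is taken to mean glusterfsd (bricks) and glusterfs (self-heal daemon, Gluster NFS server and the local FUSE client); and the script refuses to run while any VM is still running on the node.

```shell
#!/bin/sh
# Sketch of the maintenance shutdown described in the post, run on the node going down.
# GLUSTER_SERVICE defaults to "glusterd" (assumption; the post says "service gluster stop").
GLUSTER_SERVICE="${GLUSTER_SERVICE:-glusterd}"

# Print the command (dry run, the default) or execute it when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Count non-blank lines on stdin (virsh list --name prints one domain per line).
count_nonempty_lines() {
    grep -c '[^[:space:]]' || true
}

stop_gluster_on_this_node() {
    local domains running
    if ! domains="$(virsh list --name)"; then
        printf 'virsh list failed; not stopping Gluster\n' >&2
        return 1
    fi
    running="$(printf '%s\n' "$domains" | count_nonempty_lines)"
    if [ "$running" -ne 0 ]; then
        printf 'refusing to stop Gluster: %s VM(s) still running here\n' "$running" >&2
        return 1
    fi
    run service "$GLUSTER_SERVICE" stop
    # Brick processes.
    run pkill -x glusterfsd
    # Self-heal daemon, Gluster NFS server and the local FUSE client.
    run pkill -x glusterfs
}
```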

At this point the VMs, now running on the other server, die. The FUSE
mount recovers and remains attached to the volume via the other server,
but the VM disk images are not fully synced.

This causes the VMs to go into a read-only file system state and then
kernel-panic. Rebooting or restarting the VMs just causes more kernel
panics. This effectively brings down the two-node cluster.

Bringing the Gluster node and its bricks back up triggers a self-heal.
Once the self-heal has completed, the VMs boot normally.
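Since the VMs only recover once the self-heal finishes, heal progress can be watched from the CLI. A sketch that polls "gluster volume heal VOLNAME info" and sums the per-brick "Number of entries: N" lines; that output format is an assumption based on the 3.5-era CLI and should be checked against the installed version. The volume name "vmstore" in the tests is hypothetical.

```shell
#!/bin/sh
# Sketch: wait until "gluster volume heal VOL info" reports no pending entries.
# Assumes one "Number of entries: N" line per brick in the heal info output.

# Sum the per-brick pending-entry counts read from stdin.
pending_heal_entries() {
    awk -F': *' '/^Number of entries:/ { total += $2 } END { print total + 0 }'
}

# Usage: wait_for_heal VOLNAME [INTERVAL_SECONDS]
wait_for_heal() {
    local vol="$1" interval="${2:-10}" out pending
    while :; do
        if ! out="$(gluster volume heal "$vol" info)"; then
            printf 'heal info failed for %s\n' "$vol" >&2
            return 1
        fi
        pending="$(printf '%s\n' "$out" | pending_heal_entries)"
        if [ "$pending" -eq 0 ]; then
            return 0
        fi
        printf '%s entries still pending heal on %s\n' "$pending" "$vol" >&2
        sleep "$interval"
    done
}
```

A failed heal info call is treated as an error rather than as "nothing to heal", so a stopped glusterd cannot be mistaken for a finished heal.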

Question: is there a better way to achieve HA with live, running VM
images? The goal is to be able to take down either server in the pair for
maintenance without interrupting the VMs.

I assume my shutdown process is flawed, but I haven't been able to find a
better one.

Any suggestions are welcome.

-- 
-Vince Loschiavo