[Gluster-users] Virt-store use case - HA failure issue - suggestions needed

Thu Jul 31 19:02:54 UTC 2014

I second Jason: either quorum-type=auto has to be disabled, or you can add
one more server to the trusted pool and see how it behaves.
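To show why one server going down can make a two-brick replica read-only, here is a minimal Python sketch of the client-side quorum rule. It assumes the rule given in the `cluster.quorum-type` option help text: with "auto", writes are allowed only if more than half of the bricks in the replica set are up, or exactly half are up and the first brick is among them. The helper `has_auto_quorum` is hypothetical and only models the rule; it is not Gluster's code. The assumption that KVM07's brick is listed first is also for illustration only.

```python
from typing import Sequence


def has_auto_quorum(bricks_up: Sequence[bool]) -> bool:
    """Hypothetical model of cluster.quorum-type=auto for one replica set.

    bricks_up[i] is True if the i-th brick of the replica set is reachable;
    index 0 is the first brick listed when the volume was created.
    """
    total = len(bricks_up)
    up = sum(bricks_up)  # True counts as 1, False as 0
    if 2 * up > total:
        # Strictly more than half of the bricks are up.
        return True
    if total > 0 and 2 * up == total and bricks_up[0]:
        # Exactly half are up, and the first brick is one of them.
        return True
    return False


if __name__ == "__main__":
    # Two-brick replica; assumed order: KVM07's brick first, KVM08's second.
    print(has_auto_quorum([True, False]))        # KVM08 down -> True (writable)
    print(has_auto_quorum([False, True]))        # KVM07 down -> False (read-only)
    print(has_auto_quorum([True, True, False]))  # replica 3, one down -> True
```

Under this model, a two-brick replica stays writable or goes read-only depending on which server is taken down. With three replicas, any single server can go down and the remaining two still form a majority, which is the case for adding a third server and replica.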

--Humble

On Fri, Aug 1, 2014 at 12:22 AM, Jason Brooks <jbrooks at redhat.com> wrote:

>
>
> ----- Original Message -----
> > From: "Vince Loschiavo" <vloschiavo at gmail.com>
> > To: gluster-users at gluster.org
> > Sent: Thursday, July 31, 2014 9:22:16 AM
> > Subject: [Gluster-users] Virt-store use case - HA failure issue - suggestions needed
> >
> > I'm currently testing Gluster 3.5.1 in a two-server QEMU/KVM environment
> > on CentOS 6.5: two servers (KVM07 and KVM08) and a replicated volume with
> > two bricks (one brick per server).
> >
> > I've tuned the volume per the documentation here:
> > http://gluster.org/documentation/use_cases/Virt-store-usecase/
> >
> > I have the Gluster volume FUSE-mounted on KVM07 and KVM08 and am using it
> > to store raw disk images.
> >
> > KVM is using the FUSE-mounted volume as a "dir: Filesystem Directory"
> > storage pool.
> >
> > With dynamic_ownership = 0 set in /etc/libvirt/qemu.conf and the image
> > files chowned to qemu:qemu, live migration works great.
> >
> > Problem:
> > If I need to take down one of these servers for maintenance, I live
> > migrate the VMs to the other server, then run
> > service gluster stop
> > and kill all the remaining gluster and brick processes.
>
> The guide says that quorum-type=auto sets a rule such that at least half
> of the bricks in the replica group must be up and running; if not, the
> replica group becomes read-only. I think the rule is actually more than
> half (51%), so bringing down one of the two servers makes your volume
> read-only.
>
> If you want two servers, you need to unset this rule. Better to add a
> third server and a third replica, though.
>
> Regards, Jason
>
>
> >
> > At this point, the VMs die. The FUSE mount recovers and remains attached
> > to the volume via the other server, but the virt disk images are not
> > fully synced.
> >
> > This causes the VMs to go into a read-only file system state and then
> > kernel panic. Rebooting or restarting the VMs just causes more kernel
> > panics. This effectively brings down the two-node cluster.
> >
> > Bringing the Gluster node, bricks, etc. back up prompts a self-heal. Once
> > the self-heal has completed, the VMs can boot normally.
> >
> > Question: is there a better way to accomplish HA with live, running virt
> > images? The goal is to be able to bring down either server in the pair
> > and perform maintenance without interrupting the VMs.
> >
> > I assume my shutdown procedure is flawed, but I haven't been able to find
> > a better one.
> >
> > Any suggestions are welcome.
> >
> >
> > --
> > -Vince Loschiavo
> >
> > _______________________________________________
> > Gluster-users mailing list
> > Gluster-users at gluster.org
> > http://supercolony.gluster.org/mailman/listinfo/gluster-users
>
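One way to tell when the self-heal Vince describes has completed is to inspect the output of `gluster volume heal <VOLNAME> info`. The sketch below is a minimal Python parser for that text. It assumes the output lists each brick on a `Brick <host>:<path>` line followed by a `Number of entries: N` line; that format is an assumption based on 3.x-era output and should be checked against your version. The functions `pending_heal_entries` and `heal_complete` and the sample brick paths are hypothetical.

```python
import re
from typing import Dict, Optional

# "Brick <host>:<path>" header line (output format assumed).
_BRICK_RE = re.compile(r"^Brick\s+(\S+)")
# "Number of entries: <N>" line (output format assumed).
_COUNT_RE = re.compile(r"^Number of entries:\s*(\d+)")


def pending_heal_entries(heal_info: str) -> Dict[str, int]:
    """Map each brick to the number of entries still awaiting self-heal."""
    counts: Dict[str, int] = {}
    brick: Optional[str] = None
    for raw in heal_info.splitlines():
        line = raw.strip()
        m = _BRICK_RE.match(line)
        if m:
            brick = m.group(1)
            continue
        m = _COUNT_RE.match(line)
        if m and brick is not None:
            counts[brick] = int(m.group(1))
        # Other lines (file paths, blanks) are ignored.
    return counts


def heal_complete(heal_info: str) -> bool:
    """True only if at least one brick was parsed and none has pending entries."""
    counts = pending_heal_entries(heal_info)
    return bool(counts) and all(n == 0 for n in counts.values())


if __name__ == "__main__":
    # Hypothetical sample output; brick paths are made up for illustration.
    sample = """\
Brick kvm07:/export/brick1
Number of entries: 3
/images/vm01.img

Brick kvm08:/export/brick1
Number of entries: 0
"""
    print(pending_heal_entries(sample))
    print(heal_complete(sample))
```

`heal_complete` treats output with no recognizable brick sections as "not complete". This way, an unexpected output format fails safe rather than reporting a finished heal.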