[Gluster-users] Virt-store use case - HA failure issue - suggestions needed

Vince Loschiavo vloschiavo at gmail.com
Thu Jul 31 20:37:01 UTC 2014


On Thu, Jul 31, 2014 at 12:02 PM, Humble Devassy Chirammal <
humble.devassy at gmail.com> wrote:

> I second Jason: either quorum-type=auto has to be disabled, or just add
> one more server to the trusted pool and see how it behaves.
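>
> A rough sketch of the add-a-server option (the host name "kvm09", the
> brick path, and the volume name "vmstore" are all examples):
>
>   gluster peer probe kvm09              # hypothetical third host
>   gluster volume add-brick vmstore replica 3 kvm09:/bricks/vmstore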
>
> --Humble
>
>
> On Fri, Aug 1, 2014 at 12:22 AM, Jason Brooks <jbrooks at redhat.com> wrote:
>
>>
>>
>> ----- Original Message -----
>> > From: "Vince Loschiavo" <vloschiavo at gmail.com>
>> > To: gluster-users at gluster.org
>> > Sent: Thursday, July 31, 2014 9:22:16 AM
>> > Subject: [Gluster-users] Virt-store use case - HA failure issue -
>> > suggestions needed
>> >
>> > I'm currently testing Gluster 3.5.1 in a two-server QEMU/KVM
>> > environment on CentOS 6.5: two servers (KVM07 & KVM08) and one
>> > replicated volume with two bricks (one brick per server).
>> >
>> > I've tuned the volume per the documentation here:
>> > http://gluster.org/documentation/use_cases/Virt-store-usecase/
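>> >
>> > Concretely, that was something like the guide's option set (the
>> > volume name "datastore" is mine):
>> >
>> >   # volume name "datastore" is an example
>> >   gluster volume set datastore quick-read off
>> >   gluster volume set datastore read-ahead off
>> >   gluster volume set datastore io-cache off
>> >   gluster volume set datastore stat-prefetch off
>> >   gluster volume set datastore eager-lock enable
>> >   gluster volume set datastore remote-dio enable
>> >   gluster volume set datastore quorum-type auto
>> >   gluster volume set datastore server-quorum-type server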
>> >
>> > I have the gluster volume fuse-mounted on KVM07 and KVM08 and am using
>> > it to store raw disk images.
>> >
>> > KVM is using the fuse-mounted volume as a "dir: Filesystem Directory"
>> > storage pool.
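>> >
>> > Roughly like this (the mount point and pool name are examples):
>> >
>> >   # example mount point and pool name
>> >   mount -t glusterfs localhost:/datastore /var/lib/libvirt/gluster
>> >   virsh pool-define-as gluster-pool dir --target /var/lib/libvirt/gluster
>> >   virsh pool-start gluster-pool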
>> >
>> > With dynamic_ownership = 0 set in /etc/libvirt/qemu.conf and the image
>> > files chowned to qemu:qemu, live migration works great.
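>> >
>> > That is, roughly (the image path is an example):
>> >
>> >   # /etc/libvirt/qemu.conf
>> >   dynamic_ownership = 0
>> >
>> >   chown -R qemu:qemu /var/lib/libvirt/gluster   # example image path
>> >   service libvirtd restart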
>> >
>> > Problem:
>> > If I need to take down one of these servers for maintenance, I
>> > live-migrate the VMs to the other server, run "service glusterd stop",
>> > and then kill all the remaining gluster and brick processes.
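>> >
>> > In command terms, roughly (the guest name is an example):
>> >
>> >   virsh migrate --live vm01 qemu+ssh://kvm08/system  # vm01 is an example
>> >   service glusterd stop
>> >   pkill glusterfs ; pkill glusterfsd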
>>
>> The guide says that quorum-type=auto sets a rule such that at least half
>> of the bricks in the replica group must be up and running; if not, the
>> replica group becomes read-only. When exactly half the bricks are up, the
>> rule also requires the first brick of the replica to be among them, so
>> bringing down one of your two servers can make the volume read-only.
>>
>> If you want two servers, you need to unset this rule. Better to add a
>> third server and a third replica, though.
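>>
>> Unsetting it would be something like (substitute your own volume name):
>>
>>   gluster volume set <volname> cluster.quorum-type none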
>>
>> Regards, Jason
>>
>>
>> >
>> > At this point, the VMs die.  The FUSE mount recovers and remains
>> > attached to the volume via the other server, but the virtual disk
>> > images are not fully synced.
>> >
>> > This causes the VMs to go into a read-only filesystem state and then
>> > kernel panic.  Reboots/restarts of the VMs just cause more kernel
>> > panics.  This effectively brings down the two-node cluster.
>> >
>> > Bringing the gluster node, bricks, etc. back up prompts a self-heal.
>> > Once the self-heal is completed, the VMs can boot normally.
>> >
>> > Question: is there a better way to accomplish HA with live/running VM
>> > images?  The goal is to be able to bring down either server in the pair
>> > and perform maintenance without interrupting the VMs.
>> >
>> > I assume my shutdown process is flawed but haven't been able to find a
>> > better one.
>> >
>> > Any suggestions are welcome.
>> >
>> >
>> > --
>> > -Vince Loschiavo
>> >

That was it.  Thank you.  I'm somewhat space-constrained in my lab, so I
chose to disable client quorum and set the server-quorum ratio to 50%.  I
assume that was redundant, but it works for me.
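
For reference, that amounted to something like this (the volume name
"datastore" is mine):

  # "datastore" is an example name; server-quorum-ratio is cluster-wide
  gluster volume set datastore cluster.quorum-type none
  gluster volume set all cluster.server-quorum-ratio 50%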



-- 
-Vince Loschiavo