[Gluster-users] Fail of one brick lead to crash VMs

Thu Feb 11 17:03:36 UTC 2016

Hi Dominique, 

I looked at the attached logs. At some point all bricks seem to have gone down, as I see this in the client logs: 
[2016-01-31 16:17:20.907680] E [MSGID: 108006] [afr-common.c:3999:afr_notify] 0-cluster1-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. 

This *may* have been the reason for the VMs going offline. 

Also, Steve's inputs are correct with regard to the distinction between server quorum and client quorum. It is usually recommended that you do the following when using Gluster for the VM store use case: 

i) Use a replica 3 (as opposed to replica 2) volume. In your case the third node should also be used to host a brick of the volume. 
You can use the arbiter feature if you want to minimise the cost of investing in three machines. 
Check this out: https://gluster.readthedocs.org/en/release-3.7.0/Features/afr-arbiter-volumes/ 

Also if you plan to use arbiter, it is recommended that you do so with glusterfs-3.7.8 as it contains some critical bug fixes. 

ii) Once you're done with i), enable the group virt option on the volume: 
# gluster volume set <VOLNAME> group virt 
which will initialise, in one step, the volume configuration meant for the VM store use case (including the right quorum options). 

iii) Have you tried sharding yet? If not, you could give that a try too. It has been found to be useful for VM store workloads. 
Check this out: http://blog.gluster.org/2015/12/introducing-shard-translator/ 
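
As a rough sketch of steps i) to iii) for a fresh volume, the Python helper below only builds and prints the gluster commands; it does not run them. The volume name "vmstore", the host srv03 and the brick paths are placeholders, not taken from your setup. The "replica 3 arbiter 1" form and the features.shard option are written as I understand them from the 3.7 documentation, so please check them against your installed version before using them.

```python
import shlex
from typing import List, Sequence


def vm_store_setup_commands(volname: str, bricks: Sequence[str],
                            arbiter: bool = False) -> List[List[str]]:
    """Build (but do not run) the gluster commands for steps i) to iii)."""
    if len(bricks) != 3:
        raise ValueError("expected exactly three bricks, one per server")
    create = ["gluster", "volume", "create", volname, "replica", "3"]
    if arbiter:
        # With "arbiter 1" the last brick listed becomes the arbiter.
        create += ["arbiter", "1"]
    create += list(bricks)
    return [
        create,                                                   # step i)
        ["gluster", "volume", "set", volname, "group", "virt"],   # step ii)
        ["gluster", "volume", "set", volname, "features.shard", "on"],  # step iii)
        ["gluster", "volume", "start", volname],
    ]


if __name__ == "__main__":
    # Placeholder names: adjust the volume name, hosts and brick paths.
    placeholder_bricks = [
        "srv01:/bricks/vmstore",
        "srv02:/bricks/vmstore",
        "srv03:/bricks/vmstore",
    ]
    for cmd in vm_store_setup_commands("vmstore", placeholder_bricks, arbiter=True):
        print(" ".join(shlex.quote(arg) for arg in cmd))
```

Pass arbiter=False to get a plain replica 3 volume instead.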

Let me know if this works for you. 

-Krutika 

----- Original Message -----

> From: "Steve Dainard" <sdainard at spd1.com>
> To: "Dominique Roux" <dominique.roux at ungleich.ch>
> Cc: "gluster-users at gluster.org List" <gluster-users at gluster.org>
> Sent: Thursday, February 11, 2016 3:52:18 AM
> Subject: Re: [Gluster-users] Fail of one brick lead to crash VMs

> For what it's worth, I've never been able to lose a brick in a 2 brick
> replica volume and still be able to write data.

> I've also found the documentation confusing as to what 'Option:
> cluster.server-quorum-type' actually means.
> Default Value: (null)
> Description: This feature is on the server-side i.e. in glusterd.
> Whenever the glusterd on a machine observes that the quorum is not
> met, it brings down the bricks to prevent data split-brains. When the
> network connections are brought back up and the quorum is restored the
> bricks in the volume are brought back up.

> It seems to be implying a brick quorum, but I think it actually means
> a glusterd quorum. In other words, if 2/3 glusterd processes fail,
> take the brick offline. This would seem to make sense in your
> configuration.
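
To make the glusterd-quorum reading above concrete, here is a small Python model of it. It is only a sketch: it assumes server quorum means "strictly more than half of the glusterd peers in the pool are reachable, counting the local one", which is my understanding of the default, and the function name is made up for illustration.

```python
def glusterd_has_quorum(reachable_peers: int, total_peers: int) -> bool:
    """Return True if this node's glusterd still sees server quorum.

    reachable_peers counts the local glusterd plus every peer it can reach.
    Assumption: quorum needs strictly more than half of the pool.
    """
    if not 0 <= reachable_peers <= total_peers:
        raise ValueError("reachable_peers must be between 0 and total_peers")
    return 2 * reachable_peers > total_peers


if __name__ == "__main__":
    # Three-server pool with one node cut off by iptables.
    print("isolated node keeps its bricks up:", glusterd_has_quorum(1, 3))
    print("remaining nodes keep their bricks up:", glusterd_has_quorum(2, 3))
```

Under this model only the isolated server stops its brick; the two servers that can still see each other keep theirs running.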

> But there are also two other quorum settings which seem to be more
> focused on brick count/ratio to form quorum:

> Option: cluster.quorum-type
> Default Value: none
> Description: If value is "fixed" only allow writes if quorum-count
> bricks are present. If value is "auto" only allow writes if more than
> half of bricks, or exactly half including the first, are present.

> Option: cluster.quorum-count
> Default Value: (null)
> Description: If quorum-type is "fixed" only allow writes if this many
> bricks are present. Other quorum types will OVERWRITE this value.
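
The two option descriptions quoted above can be read as the following rule for a single replica set. This is a Python sketch of the documented wording only, not of the AFR code; the function name is hypothetical, and the behaviour for "none" (writes allowed while at least one brick is reachable) is my assumption.

```python
from typing import Optional, Sequence


def writes_allowed(bricks_up: Sequence[bool], quorum_type: str = "none",
                   quorum_count: Optional[int] = None) -> bool:
    """Model of cluster.quorum-type for one replica set.

    bricks_up[i] is True if the client can reach brick i; index 0 is the
    first brick of the replica set.
    """
    up = sum(1 for b in bricks_up if b)
    total = len(bricks_up)
    if quorum_type == "none":
        # Assumption: without client quorum, one reachable brick is enough.
        return up >= 1
    if quorum_type == "fixed":
        if quorum_count is None:
            raise ValueError("quorum_count is required for quorum_type 'fixed'")
        return up >= quorum_count
    if quorum_type == "auto":
        if 2 * up > total:
            return True  # more than half of the bricks are present
        # Exactly half counts only if the first brick is among them.
        return 2 * up == total and bool(bricks_up[0])
    raise ValueError("unknown quorum_type: " + quorum_type)


if __name__ == "__main__":
    # Replica 2 with "auto": losing the second brick keeps writes going,
    # losing the first brick blocks them.
    print(writes_allowed([True, False], "auto"))
    print(writes_allowed([False, True], "auto"))
```

With replica 3 and "auto", any single brick can be lost without blocking writes, which is one reason replica 3 is easier to reason about than replica 2.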

> So you might be able to set type as 'fixed' and count as '1' and with
> cluster.server-quorum-type: server
> already enabled get what you want.

> But again, I've never had this work properly, and always ended up with
> split-brains which are difficult to resolve when you're storing vm
> images rather than files.

> Your other options are; use your 3rd server as another brick, and do
> replica 3 (which I've had good success with).

> Or seeing as you're using 3.7 you could look into arbiter nodes if
> they're stable in current version.

> On Mon, Feb 8, 2016 at 6:20 AM, Dominique Roux
> <dominique.roux at ungleich.ch> wrote:
> > Hi guys,
> >
> > I faced a problem a week ago.
> > In our environment we have three servers in a quorum. The gluster volume
> > is spread over two bricks and is of type Replicate.
> >
> > To simulate the failure of one brick, we isolated one of the two
> > bricks with iptables, so that communication with the other two peers
> > was no longer possible.
> > After that, VMs (OpenNebula) which had I/O at that time crashed.
> > We stopped glusterfsd hard (kill -9) and restarted it, which made
> > things work again (of course we also had to restart the failed VMs). But
> > I think this shouldn't happen, since quorum was still met (2/3 hosts
> > were still up and connected).
> >
> > Here is some information about our system:
> > OS: CentOS Linux release 7.1.1503
> > Glusterfs version: glusterfs 3.7.3
> >
> > gluster volume info:
> >
> > Volume Name: cluster1
> > Type: Replicate
> > Volume ID:
> > Status: Started
> > Number of Bricks: 1 x 2 = 2
> > Transport-type: tcp
> > Bricks:
> > Brick1: srv01:/home/gluster
> > Brick2: srv02:/home/gluster
> > Options Reconfigured:
> > cluster.self-heal-daemon: enable
> > cluster.server-quorum-type: server
> > network.remote-dio: enable
> > cluster.eager-lock: enable
> > performance.stat-prefetch: on
> > performance.io-cache: off
> > performance.read-ahead: off
> > performance.quick-read: off
> > server.allow-insecure: on
> > nfs.disable: 1
> >
> > Hope you can help us.
> >
> > Thanks a lot.
> >
> > Best regards
> > Dominique
> > _______________________________________________
> > Gluster-users mailing list
> > Gluster-users at gluster.org
> > http://www.gluster.org/mailman/listinfo/gluster-users