[Gluster-users] HA storage based on two nodes with one point of failure

Юрий Полторацкий y.poltoratskiy at gmail.com
Mon Jun 8 06:12:02 UTC 2015


2015-06-08 8:32 GMT+03:00 Ravishankar N <ravishankar at redhat.com>:

>
>
> On 06/08/2015 02:38 AM, Юрий Полторацкий wrote:
>
> Hi,
>
> I have made a lab with a config listed below and have got unexpected
> result. Someone, tell me, please, where did I go wrong?
>
> I am testing oVirt. Data Center has two clusters: the first as a computing
> with three nodes (node1, node2, node3); the second as a storage (node5,
> node6) based on glusterfs (replica 2).
>
> I want the storage to be HA. I have read here
> <https://access.redhat.com/documentation/en-US/Red_Hat_Storage/3/html/Administration_Guide/sect-Managing_Split-brain.html>
> next:
> For a replicated volume with two nodes and one brick on each machine, if
> the server-side quorum is enabled and one of the nodes goes offline, the
> other node will also be taken offline because of the quorum configuration.
> As a result, the high availability provided by the replication is
> ineffective. To prevent this situation, a dummy node can be added to the
> trusted storage pool which does not contain any bricks. This ensures that
> even if one of the nodes which contains data goes offline, the other node
> will remain online. Note that if the dummy node and one of the data nodes
> go offline, the brick on the other node will also be taken offline,
> resulting in data unavailability.
>
> So, I have added my "Engine" (not self-hosted) as a dummy node without a
> brick and have configured quorum as listed below:
> cluster.quorum-type: fixed
> cluster.quorum-count: 1
> cluster.server-quorum-type: server
> cluster.server-quorum-ratio: 51%
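>
> (For reference, a minimal sketch of how these options are typically applied
> with the gluster CLI, assuming the volume name vol3 from the output further
> below; the values are exactly the ones listed above:)
>
> # client-side quorum is set per volume
> gluster volume set vol3 cluster.quorum-type fixed
> gluster volume set vol3 cluster.quorum-count 1
> # server-side quorum is enabled per volume, but the ratio is cluster-wide
> gluster volume set vol3 cluster.server-quorum-type server
> gluster volume set all cluster.server-quorum-ratio 51%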
>
>
> Then I ran a VM and dropped the network link on node6; after about an hour
> I switched the link back, and after a while got a split-brain. But why? No
> one could have written to the brick on node6: the VM was running on node3
> and node1 was the SPM.
>
>
>
> It could have happened that after node6 came up, the client(s) saw a
> temporary disconnect of node5, and a write happened at that time. When
> node5 was connected again, we had AFR xattrs on both nodes blaming each
> other, causing the split-brain. For a replica 2 setup, it is best to set
> the client-quorum to auto instead of fixed. What this means is that the
> first node of the replica must always be up for writes to be permitted. If
> the first node goes down, the volume becomes read-only.
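>
> (A minimal sketch of the suggested change, assuming the same volume vol3 as
> in the output below:)
>
> # switch client-side quorum from "fixed" to "auto"; with auto the
> # cluster.quorum-count option is no longer consulted and can be reset
> gluster volume set vol3 cluster.quorum-type auto
> gluster volume reset vol3 cluster.quorum-count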
>
Yes, at first I tested with client-quorum auto, but my VMs were paused when
the first node went down, and that is not acceptable.

OK, I understand now: there is no way to have fault-tolerant storage with
only two servers using GlusterFS. I have to get another one.

Thanks.


> For better availability, it would be better to use a replica 3 volume
> (again with client-quorum set to auto). If you are using glusterfs 3.7,
> you can also consider using the arbiter configuration [1] for replica 3.
>
> [1]
> https://github.com/gluster/glusterfs/blob/master/doc/features/afr-arbiter-volumes.md
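>
> (Not from the thread, just a minimal sketch of what creating an arbiter
> volume on glusterfs 3.7 could look like; the volume name vol3arb, the third
> host node7.virt.local and all brick paths are hypothetical:)
>
> # replica 3 with arbiter: the third brick stores only file names and
> # metadata, so it breaks ties without needing the full data capacity
> gluster volume create vol3arb replica 3 arbiter 1 \
>     node5.virt.local:/storage/brick14 \
>     node6.virt.local:/storage/brick15 \
>     node7.virt.local:/storage/brick16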
>
> Thanks,
> Ravi
>
>
>  Gluster's log from node6:
> Jun 07 15:35:06 node6.virt.local etc-glusterfs-glusterd.vol[28491]:
> [2015-06-07 12:35:06.106270] C [MSGID: 106002]
> [glusterd-server-quorum.c:356:glusterd_do_volume_quorum_action]
> 0-management: Server quorum lost for volume vol3. Stopping local bricks.
> Jun 07 16:30:06 node6.virt.local etc-glusterfs-glusterd.vol[28491]:
> [2015-06-07 13:30:06.261505] C [MSGID: 106003]
> [glusterd-server-quorum.c:351:glusterd_do_volume_quorum_action]
> 0-management: Server quorum regained for volume vol3. Starting local bricks.
>
>
> gluster> volume heal vol3 info
> Brick node5.virt.local:/storage/brick12/
> /5d0bb2f3-f903-4349-b6a5-25b549affe5f/dom_md/ids - Is in split-brain
>
> Number of entries: 1
>
> Brick node6.virt.local:/storage/brick13/
> /5d0bb2f3-f903-4349-b6a5-25b549affe5f/dom_md/ids - Is in split-brain
>
> Number of entries: 1
>
>
> gluster> volume info vol3
>
> Volume Name: vol3
> Type: Replicate
> Volume ID: 69ba8c68-6593-41ca-b1d9-40b3be50ac80
> Status: Started
> Number of Bricks: 1 x 2 = 2
> Transport-type: tcp
> Bricks:
> Brick1: node5.virt.local:/storage/brick12
> Brick2: node6.virt.local:/storage/brick13
> Options Reconfigured:
> storage.owner-gid: 36
> storage.owner-uid: 36
> cluster.server-quorum-type: server
> cluster.quorum-type: fixed
> network.remote-dio: enable
> cluster.eager-lock: enable
> performance.stat-prefetch: off
> performance.io-cache: off
> performance.read-ahead: off
> performance.quick-read: off
> auth.allow: *
> user.cifs: disable
> nfs.disable: on
> performance.readdir-ahead: on
> cluster.quorum-count: 1
> cluster.server-quorum-ratio: 51%
>
>
>
> On 06.06.2015 12:09, Юрий Полторацкий wrote:
>
>     Hi,
>
>  I want to build an HA storage based on two servers. If one of them goes
> down, I want the storage to remain available in RW mode.
>
>  If I use replica 2, then split-brain can occur. To avoid this I would use
> quorum. If I understand correctly, I can use quorum on the client side, on
> the server side, or on both. I want to add a dummy node without a brick
> and use a config like this:
>
> cluster.quorum-type: fixed
> cluster.quorum-count: 1
> cluster.server-quorum-type: server
> cluster.server-quorum-ratio: 51%
>
>  I expect that the client will have RW access as long as at least one
> brick is alive. On the other hand, if the server quorum is not met, the
> bricks will become read-only.
>
> Say, HOST1 with a brick BRICK1, HOST2 with a brick BRICK2, and HOST3
> without a brick.
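>
> (A minimal sketch of adding such a dummy node to the trusted pool, using
> the placeholder host names from this example:)
>
> # run from HOST1 or HOST2: add HOST3 to the pool; it needs no bricks
> gluster peer probe HOST3
> gluster peer status   # should now show the two other hosts as connected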
>
> Once HOST1 loses its network connection, the server quorum is no longer
> met on that node and the brick BRICK1 is taken offline for writing. But on
> HOST2 there is no problem with the server quorum (HOST2 + HOST3 are 2 of 3
> nodes, i.e. about 67% > 51%), so BRICK2 is still writable. There is no
> problem with the client quorum either: one brick is alive, so the client
> can write to it.
>
>  I have made a lab using KVM on my desktop, and it seems to work as
> expected.
>
>  The main question is:
>  Can I use such a storage for production?
>
>  Thanks.
>
>
>
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
>
>
>

