[Gluster-users] Quorum in distributed-replicate volume
Karthik Subrahmanya
ksubrahm at redhat.com
Tue Feb 27 06:30:29 UTC 2018
On Mon, Feb 26, 2018 at 6:14 PM, Dave Sherohman <dave at sherohman.org> wrote:
> On Mon, Feb 26, 2018 at 05:45:27PM +0530, Karthik Subrahmanya wrote:
> > > "In a replica 2 volume... If we set the client-quorum option to
> > > auto, then the first brick must always be up, irrespective of the
> > > status of the second brick. If only the second brick is up, the
> > > subvolume becomes read-only."
> > >
> > By default client-quorum is "none" in replica 2 volume.
>
> I'm not sure where I saw the directions saying to set it, but I do have
> "cluster.quorum-type: auto" in my volume configuration. (And I think
> that's client quorum, but feel free to correct me if I've misunderstood
> the docs.)
>
If it is "auto" then I think it is reconfigured. In replica 2 it will be
"none".
>
> > It applies to all replica 2 volumes, whether the volume has just 2
> > bricks or more. Total brick count in the volume doesn't matter for the
> > quorum; what matters is the number of bricks which are up in the
> > particular replica subvol.
>
> Thanks for confirming that.
>
> > If I understood your configuration correctly it should look something
> > like this:
> > (Please correct me if I am wrong)
> > replica-1: bricks 1 & 2
> > replica-2: bricks 3 & 4
> > replica-3: bricks 5 & 6
>
> Yes, that's correct.
>
> > Since quorum is per replica, if it is set to auto then it needs the first
> > brick of the particular replica subvol to be up to perform the fop.
> >
> > In replica 2 volumes you can end up in split-brains.
>
> How would that happen if bricks which are not in (cluster-wide) quorum
> refuse to accept writes? I'm not seeing the reason for using individual
> subvolume quorums instead of full-volume quorum.
>
Split-brains happen within a replica pair.
I will try to explain how you can end up in split-brain even with
cluster-wide quorum:
Let's say you have a 6-brick (replica 2) volume and you always have at
least a quorum number of bricks up & running.
Bricks 1 & 2 are part of replica subvol-1
Bricks 3 & 4 are part of replica subvol-2
Bricks 5 & 6 are part of replica subvol-3
- Brick 1 goes down and a write comes in on a file which is part of
replica subvol-1
- Quorum is met, since 5 out of 6 bricks are running
- Brick 2 marks brick 1 as bad (pending heal)
- Brick 2 goes down and brick 1 comes back up before any heal has happened
- A write comes in on the same file; quorum is still met, and now brick 1
marks brick 2 as bad
- When both bricks 1 & 2 are up again, each of them blames the other brick -
*split-brain*
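If a replica pair does end up in that state, the affected files can be
listed with the heal command (<volname> is again a placeholder):

    # list the files currently in split-brain on this volume
    gluster volume heal <volname> info split-brain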
>
> > It would be great if you could consider configuring an arbiter or
> > replica 3 volume.
>
> I can. My bricks are 2x850G and 4x11T, so I can repurpose the small
> bricks as arbiters with minimal effect on capacity. What would be the
> sequence of commands needed to:
>
> 1) Move all data off of bricks 1 & 2
> 2) Remove that replica from the cluster
> 3) Re-add those two bricks as arbiters
>
> (And did I miss any additional steps?)
>
> Unfortunately, I've been running a few months already with the current
> configuration and there are several virtual machines running off the
> existing volume, so I'll need to reconfigure it online if possible.
>
Without knowing the volume configuration it is difficult to suggest the
exact configuration change, and since it is a live system a wrong step
could end in data unavailability or data loss.
Can you give the output of "gluster volume info <volname>" and tell me
which brick is of what size?
Note: the arbiter bricks need not be large, since they store only file
names and metadata, not file data.
[1] gives information about how you can provision the arbiter brick.
[1]
http://docs.gluster.org/en/latest/Administrator%20Guide/arbiter-volumes-and-quorum/#arbiter-bricks-sizing
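Just to give a rough idea of the usual sequence (the host names and brick
paths below are placeholders, and the exact commands depend on your real
layout, so please don't run anything before we go through the volume info):

    # 1) Migrate data off the first replica pair and shrink the volume.
    #    Bricks must be removed a whole replica set at a time.
    gluster volume remove-brick <volname> host1:/data/brick1 host2:/data/brick2 start
    gluster volume remove-brick <volname> host1:/data/brick1 host2:/data/brick2 status
    # wait until the status shows "completed" for the removed bricks, then:
    gluster volume remove-brick <volname> host1:/data/brick1 host2:/data/brick2 commit

    # 2) Clean the freed bricks (old data and the .glusterfs directory)
    #    before reusing them.

    # 3) Re-add the freed bricks as arbiters, one per remaining replica pair.
    gluster volume add-brick <volname> replica 3 arbiter 1 host1:/data/arbiter1 host2:/data/arbiter2

    # 4) Let self-heal populate the arbiter bricks and monitor progress.
    gluster volume heal <volname> info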
Regards,
Karthik
>
> --
> Dave Sherohman
>