[Gluster-users] Quorum in distributed-replicate volume
ksubrahm at redhat.com
Tue Feb 27 06:30:29 UTC 2018
On Mon, Feb 26, 2018 at 6:14 PM, Dave Sherohman <dave at sherohman.org> wrote:
> On Mon, Feb 26, 2018 at 05:45:27PM +0530, Karthik Subrahmanya wrote:
> > > "In a replica 2 volume... If we set the client-quorum option to
> > > auto, then the first brick must always be up, irrespective of the
> > > status of the second brick. If only the second brick is up, the
> > > subvolume becomes read-only."
> > >
> > By default client-quorum is "none" in replica 2 volume.
> I'm not sure where I saw the directions saying to set it, but I do have
> "cluster.quorum-type: auto" in my volume configuration. (And I think
> that's client quorum, but feel free to correct me if I've misunderstood
> the docs.)
If it is "auto" then I think it is reconfigured. In replica 2 it will be
"none" by default.
> > It applies to all replica 2 volumes, whether they have just 2 bricks or
> > more. The total brick count in the volume doesn't matter for quorum; what
> > matters is the number of bricks that are up in the particular replica subvol.
> Thanks for confirming that.
> > If I understood your configuration correctly it should look something
> > like this:
> > (Please correct me if I am wrong)
> > replica-1: bricks 1 & 2
> > replica-2: bricks 3 & 4
> > replica-3: bricks 5 & 6
> Yes, that's correct.
> > Since quorum is per replica, if it is set to auto then it needs the first
> > brick of the particular replica subvol to be up to perform the fop.
> > In replica 2 volumes you can end up in split-brains.
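As a rough illustration of the per-subvolume rule quoted above, here is a minimal Python sketch. It assumes the commonly documented behaviour of cluster.quorum-type "auto": quorum is met when more than half of the bricks in a replica set are up, or when exactly half are up and the first brick is among them. The function name and the list-of-booleans representation are invented for this example and are not Gluster APIs.

```python
def client_quorum_met(bricks_up):
    """Check client quorum for ONE replica subvolume.

    bricks_up: list of booleans, one per brick in the subvolume,
    with the first brick of the replica set at index 0 (assumed layout).
    """
    total = len(bricks_up)
    up = sum(bricks_up)
    # More than half of the bricks in this replica set are up.
    if 2 * up > total:
        return True
    # Exactly half are up: the first brick acts as the tie-breaker.
    return 2 * up == total and bricks_up[0]


# Replica 2 subvolume: only the first brick is up, writes are still allowed.
print(client_quorum_met([True, False]))   # True
# Replica 2 subvolume: only the second brick is up, subvolume goes read-only.
print(client_quorum_met([False, True]))   # False
```

Note that the check only ever looks at the bricks of a single replica set, which is why the total number of bricks in the volume does not enter into it.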
> How would that happen if bricks which are not in (cluster-wide) quorum
> refuse to accept writes? I'm not seeing the reason for using individual
> subvolume quorums instead of full-volume quorum.
Split brains happen within the replica pair.
I will try to explain how you can end up in split-brain even with cluster
wide quorum:
Let's say you have a 6-brick (replica 2) volume and you always have at least
a quorum number of bricks up & running.
Bricks 1 & 2 are part of replica subvol-1
Bricks 3 & 4 are part of replica subvol-2
Bricks 5 & 6 are part of replica subvol-3
- Brick 1 goes down and a write comes on a file which is part of that
  replica subvol
- Quorum is met since 5 out of 6 bricks are running
- Brick 2 says brick 1 is bad
- Brick 2 goes down and brick 1 comes up. Heal did not happen
- Write comes on the same file, quorum is met, and now brick 1 says brick 2
  is bad
- When both bricks 1 & 2 are up, each of them blames the other brick -
  split-brain
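The sequence above can be replayed with a small Python sketch. This is only a toy model under stated assumptions: the "cluster-wide quorum" rule (more than half of all bricks up) is the hypothetical policy being discussed, not an existing Gluster option, and the blame bookkeeping is a simplification of how replicas record pending heals. All names here are made up for illustration.

```python
def simulate():
    total_bricks = 6
    up = {1, 2, 3, 4, 5, 6}
    # blames[b] = bricks that brick b has marked as stale (needing heal)
    blames = {1: set(), 2: set()}

    def write(replica=(1, 2)):
        # Hypothetical cluster-wide quorum: more than half of ALL bricks up.
        if 2 * len(up) <= total_bricks:
            return False
        live = [b for b in replica if b in up]
        if not live:
            return False
        # Every live brick records that the down brick missed this write.
        for b in live:
            for other in replica:
                if other not in up:
                    blames[b].add(other)
        return True

    up.discard(1)      # brick 1 goes down
    write()            # 5 of 6 up: write lands on brick 2 only
    up.add(1)          # brick 1 comes back ...
    up.discard(2)      # ... and brick 2 goes down before any heal
    write()            # 5 of 6 up again: write lands on brick 1 only
    up.add(2)          # both bricks are up now
    return blames


blames = simulate()
split_brain = 2 in blames[1] and 1 in blames[2]
print(blames, split_brain)
```

Both writes pass the volume-wide check, yet each brick of the pair ends up blaming the other, which is the split-brain condition described in the list.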
> > It would be great if you can consider configuring an arbiter or
> > replica 3 volume.
> I can. My bricks are 2x850G and 4x11T, so I can repurpose the small
> bricks as arbiters with minimal effect on capacity. What would be the
> sequence of commands needed to:
> 1) Move all data off of bricks 1 & 2
> 2) Remove that replica from the cluster
> 3) Re-add those two bricks as arbiters
> (And did I miss any additional steps?)
> Unfortunately, I've been running a few months already with the current
> configuration and there are several virtual machines running off the
> existing volume, so I'll need to reconfigure it online if possible.
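To put a number on the "minimal effect on capacity" remark, here is a back-of-the-envelope Python sketch. It assumes that the two 850G bricks form one replica pair and the four 11T bricks form the other two pairs, that 1T is treated as 1000G, and that an arbiter brick holds no file data and so adds no usable capacity. The helper name is made up for this example.

```python
def usable_gb(subvol_data_bricks):
    # Each replica subvolume keeps a full copy on every data brick,
    # so its usable space is bounded by its smallest data brick.
    return sum(min(sizes) for sizes in subvol_data_bricks)


# Assumed current layout: 3 replica-2 subvolumes (sizes in G).
current = [(850, 850), (11000, 11000), (11000, 11000)]
# Assumed proposed layout: 2 subvolumes, small bricks reused as arbiters.
proposed = [(11000, 11000), (11000, 11000)]

before = usable_gb(current)
after = usable_gb(proposed)
print(before, after, before - after)   # 22850 22000 850
```

Under these assumptions the change gives up roughly 850G of about 22.85T, a little under 4% of usable space.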
Without knowing the volume configuration it is difficult to suggest the
steps, and since it is a live system you may end up in data unavailability
or data loss.
Can you give the output of "gluster volume info <volname>"
and say which brick is of what size?
Note: The arbiter bricks need not be of bigger size.
 gives information about how you can provision the arbiter brick.
> Dave Sherohman