<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Feb 26, 2018 at 6:14 PM, Dave Sherohman <span dir="ltr"><<a href="mailto:dave@sherohman.org" target="_blank">dave@sherohman.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><span class="gmail-">On Mon, Feb 26, 2018 at 05:45:27PM +0530, Karthik Subrahmanya wrote:<br>
> > "In a replica 2 volume... If we set the client-quorum option to<br>
> > auto, then the first brick must always be up, irrespective of the<br>
> > status of the second brick. If only the second brick is up, the<br>
> > subvolume becomes read-only."<br>
> ><br>
> By default client-quorum is "none" in replica 2 volume.<br>
>
> I'm not sure where I saw the directions saying to set it, but I do have
> "cluster.quorum-type: auto" in my volume configuration. (And I think
> that's client quorum, but feel free to correct me if I've misunderstood
> the docs.)

If it is showing "auto", then I think it has been reconfigured at some point;
in a replica 2 volume the default is "none".
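
If you want to double-check what the running volume is actually using, you can
query the options directly; "myvol" below is just a placeholder for your
volume name:

    # effective client-quorum setting for the volume
    gluster volume get myvol cluster.quorum-type

    # server-side quorum is controlled by a separate option
    gluster volume get myvol cluster.server-quorum-type

And yes, cluster.quorum-type is the client-quorum option, so you have read the
docs correctly.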
<span class="gmail-"><br>
> It applies to all the replica 2 volumes even if it has just 2 brick or more.<br>
> Total brick count in the volume doesn't matter for the quorum, what matters<br>
> is the number of bricks which are up in the particular replica subvol.<br>
>
> Thanks for confirming that.
>
> > If I understood your configuration correctly it should look something like
> > this:
> > (Please correct me if I am wrong)
> > replica-1: bricks 1 & 2
> > replica-2: bricks 3 & 4
> > replica-3: bricks 5 & 6
>
> Yes, that's correct.
<span class="gmail-"><br>
> Since quorum is per replica, if it is set to auto then it needs the first<br>
> brick of the particular replica subvol to be up to perform the fop.<br>
><br>
> In replica 2 volumes you can end up in split-brains.<br>
>
> How would that happen if bricks which are not in (cluster-wide) quorum
> refuse to accept writes? I'm not seeing the reason for using individual
> subvolume quorums instead of full-volume quorum.

Split-brain happens within a replica pair, so cluster-wide quorum does not
protect you from it. Here is how you can end up in split-brain even with
cluster-wide quorum met.

Say you have a 6-brick replica 2 volume and at least a quorum number of bricks
is always up and running:
Bricks 1 & 2 are part of replica subvol-1
Bricks 3 & 4 are part of replica subvol-2
Bricks 5 & 6 are part of replica subvol-3

- Brick 1 goes down and a write comes in on a file in replica subvol-1.
- Quorum is met, since 5 out of 6 bricks are running.
- Brick 2 records that brick 1 is bad (it now holds changes brick 1 missed).
- Brick 2 goes down and brick 1 comes back up. No heal has happened yet.
- A write comes in on the same file; quorum is still met, and now brick 1
  records that brick 2 is bad.
- When bricks 1 & 2 are both up again, each blames the other: *split-brain*.
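
If you ever want to check whether any files have landed in that state, the
heal commands will list them ("myvol" is again a placeholder for your volume
name):

    # files currently in split-brain on the volume
    gluster volume heal myvol info split-brain

    # all files with pending heals, split-brain or not
    gluster volume heal myvol info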
<span class="gmail-"><br>
> It would be great if you can consider configuring an arbiter or<br>
> replica 3 volume.<br>
>
> I can. My bricks are 2x850G and 4x11T, so I can repurpose the small
> bricks as arbiters with minimal effect on capacity. What would be the
> sequence of commands needed to:
>
> 1) Move all data off of bricks 1 & 2
> 2) Remove that replica from the cluster
> 3) Re-add those two bricks as arbiters
>
> (And did I miss any additional steps?)
>
> Unfortunately, I've been running a few months already with the current
> configuration and there are several virtual machines running off the
> existing volume, so I'll need to reconfigure it online if possible.

Without knowing the volume configuration it is difficult to suggest the exact
changes, and since it is a live system a wrong step could lead to data
unavailability or data loss. Can you share the output of
"gluster volume info <volname>" and tell us which brick is of what size?

Note: the arbiter bricks do not need to be as large as the data bricks.
[1] explains how to size and provision them.

[1] http://docs.gluster.org/en/latest/Administrator%20Guide/arbiter-volumes-and-quorum/#arbiter-bricks-sizing
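
To give you an idea of what to expect, the conversion would roughly follow the
pattern below. This is only a sketch: the volume name and brick paths are
placeholders and the exact bricks depend on your real layout, so please don't
run any of it until we have confirmed the volume info:

    # 1) Migrate data off the first replica pair and drop it from the volume
    #    ("start" kicks off the rebalance away from those bricks)
    gluster volume remove-brick myvol server1:/data/brick1 server2:/data/brick2 start
    gluster volume remove-brick myvol server1:/data/brick1 server2:/data/brick2 status
    # commit only after status shows the migration has completed
    gluster volume remove-brick myvol server1:/data/brick1 server2:/data/brick2 commit

    # 2) Wipe and re-create the freed bricks (gluster refuses to reuse a path
    #    that still carries the old volume's xattrs), then add one arbiter
    #    brick per remaining replica pair
    gluster volume add-brick myvol replica 3 arbiter 1 server1:/data/arbiter1 server2:/data/arbiter2

    # 3) Let self-heal populate the arbiter bricks and monitor its progress
    gluster volume heal myvol info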
<span class="gmail-HOEnZb"><font color="#888888"><br>
--<br>
Dave Sherohman<br>
</font></span></blockquote></div><br></div></div>