<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Feb 26, 2018 at 6:14 PM, Dave Sherohman <span dir="ltr"><<a href="mailto:dave@sherohman.org" target="_blank">dave@sherohman.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><span class="gmail-">On Mon, Feb 26, 2018 at 05:45:27PM +0530, Karthik Subrahmanya wrote:<br>
> > "In a replica 2 volume... If we set the client-quorum option to<br>
> > auto, then the first brick must always be up, irrespective of the<br>
> > status of the second brick. If only the second brick is up, the<br>
> > subvolume becomes read-only."<br>
> ><br>
> By default client-quorum is "none" in replica 2 volume.<br>
>
> I'm not sure where I saw the directions saying to set it, but I do have
> "cluster.quorum-type: auto" in my volume configuration. (And I think
> that's client quorum, but feel free to correct me if I've misunderstood
> the docs.)

If it is showing "auto", then I think it has been reconfigured at some point;
in a replica 2 volume the default is "none".
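
If you want to double-check what the running volume is actually using, you can
query the options directly; "myvol" below is just a placeholder for your
volume name:

    # effective client-quorum setting for the volume
    gluster volume get myvol cluster.quorum-type

    # server-side quorum is controlled by a separate option
    gluster volume get myvol cluster.server-quorum-type

And yes, cluster.quorum-type is the client-quorum option, so you have read the
docs correctly.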
<span class="gmail-"><br>
> It applies to all the replica 2 volumes even if it has just 2 brick or more.<br>
> Total brick count in the volume doesn't matter for the quorum, what matters<br>
> is the number of bricks which are up in the particular replica subvol.<br>
>
> Thanks for confirming that.
>
> > If I understood your configuration correctly it should look something like
> > this:
> > (Please correct me if I am wrong)
> > replica-1: bricks 1 & 2
> > replica-2: bricks 3 & 4
> > replica-3: bricks 5 & 6
>
> Yes, that's correct.
<span class="gmail-"><br>
> Since quorum is per replica, if it is set to auto then it needs the first<br>
> brick of the particular replica subvol to be up to perform the fop.<br>
><br>
> In replica 2 volumes you can end up in split-brains.<br>
>
> How would that happen if bricks which are not in (cluster-wide) quorum
> refuse to accept writes? I'm not seeing the reason for using individual
> subvolume quorums instead of full-volume quorum.

Split-brain happens within a replica pair, so cluster-wide quorum does not
protect you from it. Here is how you can end up in split-brain even with
cluster-wide quorum met.

Say you have a 6-brick replica 2 volume and at least a quorum number of bricks
is always up and running:
Bricks 1 & 2 are part of replica subvol-1
Bricks 3 & 4 are part of replica subvol-2
Bricks 5 & 6 are part of replica subvol-3

- Brick 1 goes down and a write comes in on a file in replica subvol-1.
- Quorum is met, since 5 out of 6 bricks are running.
- Brick 2 records that brick 1 is bad (it now holds changes brick 1 missed).
- Brick 2 goes down and brick 1 comes back up. No heal has happened yet.
- A write comes in on the same file; quorum is still met, and now brick 1
  records that brick 2 is bad.
- When bricks 1 & 2 are both up again, each blames the other: *split-brain*.
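
If you ever want to check whether any files have landed in that state, the
heal commands will list them ("myvol" is again a placeholder for your volume
name):

    # files currently in split-brain on the volume
    gluster volume heal myvol info split-brain

    # all files with pending heals, split-brain or not
    gluster volume heal myvol info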
<span class="gmail-"><br>
> It would be great if you can consider configuring an arbiter or<br>
> replica 3 volume.<br>
>
> I can. My bricks are 2x850G and 4x11T, so I can repurpose the small
> bricks as arbiters with minimal effect on capacity. What would be the
> sequence of commands needed to:
>
> 1) Move all data off of bricks 1 & 2
> 2) Remove that replica from the cluster
> 3) Re-add those two bricks as arbiters
>
> (And did I miss any additional steps?)
>
> Unfortunately, I've been running a few months already with the current
> configuration and there are several virtual machines running off the
> existing volume, so I'll need to reconfigure it online if possible.

Without knowing the volume configuration it is difficult to suggest the exact
changes, and since it is a live system a wrong step could lead to data
unavailability or data loss. Can you share the output of
"gluster volume info <volname>" and tell us which brick is of what size?

Note: the arbiter bricks do not need to be as large as the data bricks.
[1] explains how to size and provision them.

[1] http://docs.gluster.org/en/latest/Administrator%20Guide/arbiter-volumes-and-quorum/#arbiter-bricks-sizing
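
To give you an idea of what to expect, the conversion would roughly follow the
pattern below. This is only a sketch: the volume name and brick paths are
placeholders and the exact bricks depend on your real layout, so please don't
run any of it until we have confirmed the volume info:

    # 1) Migrate data off the first replica pair and drop it from the volume
    #    ("start" kicks off the rebalance away from those bricks)
    gluster volume remove-brick myvol server1:/data/brick1 server2:/data/brick2 start
    gluster volume remove-brick myvol server1:/data/brick1 server2:/data/brick2 status
    # commit only after status shows the migration has completed
    gluster volume remove-brick myvol server1:/data/brick1 server2:/data/brick2 commit

    # 2) Wipe and re-create the freed bricks (gluster refuses to reuse a path
    #    that still carries the old volume's xattrs), then add one arbiter
    #    brick per remaining replica pair
    gluster volume add-brick myvol replica 3 arbiter 1 server1:/data/arbiter1 server2:/data/arbiter2

    # 3) Let self-heal populate the arbiter bricks and monitor its progress
    gluster volume heal myvol info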
<span class="gmail-HOEnZb"><font color="#888888"><br>
--<br>
Dave Sherohman<br>
</font></span></blockquote></div><br></div></div>