<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<p>Yes,</p>
<p>this makes a lot of sense.</p>
<p>It's the behavior that I was experiencing that makes no sense.</p>
<p>When one node was shut down, the whole VM cluster locked up.</p>
<p>However, I managed to find that the culprit was the quorum
settings.</p>
<p>I have now set the quorum to 2 bricks, and I am not
experiencing the problem anymore.</p>
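<p>For reference, a sketch of that kind of change, assuming
client-side quorum on a volume named vmstore (both the volume name
and the exact options are my assumptions):</p>
<pre># require 2 of the 3 bricks up before allowing writes
gluster volume set vmstore cluster.quorum-type fixed
gluster volume set vmstore cluster.quorum-count 2</pre>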
<p>All my vm boot disks and data disks are now sharded.</p>
<p>We are on 10 Gbit networking; when the node comes back, we do
not really see any latency.</p>
<p><br>
</p>
<p>Carl</p>
<p><br>
</p>
<div class="moz-cite-prefix">On 2019-08-29 3:58 p.m., Darrell Budic
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:WM!3112107ad7c62ddc655bdfed0388a6c03594cfa91c62e11538e9381642474ddc4672c6062aa1ced2e121a007bbbb1734!@filter1.lastspam.com">
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
You may be misunderstanding the way the gluster system works in
detail here, but you’ve got the right idea overall. Since gluster
is maintaining 3 copies of your data, you can lose a drive or a
whole system and things will keep going without interruption
(well, mostly: if a host node was using the system that just died,
it may pause briefly before re-connecting to one that is still
running via a backup-server setting or your dns configs). While
the system is still going with one node down, that node is falling
behind on new disk writes, and the remaining ones are keeping
track of what’s changing. Once you repair/recover/reboot the down
node, it will rejoin the cluster. Now the recovered system has to
catch up, and it does this by having the other two nodes send it
the changes. In the meantime, gluster is serving any reads for
that data from one of the up-to-date nodes, even if you ask the
one you just restarted. In order to do this healing, it has to
lock the files to ensure no changes are made while it copies a
chunk of them over to the recovered node. When it locks them, your
hypervisor notices they have gone read-only and, especially if it
has a pending write for that file, may pause the VM because this
looks like a storage issue to it. Once the file gets unlocked, it
can be written again, and your hypervisor notices and will
generally reactivate your VM. You may see delays too, especially
if you only have 1G networking between your host nodes while
everything is getting copied around. And your files may be locked,
updated, unlocked, then locked again a few seconds or minutes
later, and so on.
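<div class=""><br class="">
</div>
<div class="">(If you want to watch a heal in progress, one way is
the heal info commands; the volume name here is illustrative:)</div>
<pre class="">gluster volume heal vmstore info                    # entries still pending heal, per brick
gluster volume heal vmstore statistics heal-count   # just the counts</pre>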
<div class=""><br class="">
</div>
<div class="">That’s where sharding comes into play: once you have
a file broken up into shards, gluster can get away with only
locking the particular shard it needs to heal, leaving the
whole disk image unlocked. You may still catch a brief pause if
you try to write the specific segment of the file gluster is
healing at the moment, but it’s also going to be much faster
because it’s a small chunk of the file, and copies quickly.<br
class="">
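<div class=""><br class="">
</div>
<div class="">(Turning sharding on is just two settings, e.g. with
the default shard size; note it only affects files created after it
is enabled:)</div>
<pre class="">gluster volume set vmstore features.shard on
gluster volume set vmstore features.shard-block-size 64MB   # 64MB is the default</pre>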
<div class=""><br class="">
</div>
<div class="">Also, check out <a
href="https://staged-gluster-docs.readthedocs.io/en/release3.7.0beta1/Features/server-quorum/"
class="" moz-do-not-send="true">https://staged-gluster-docs.readthedocs.io/en/release3.7.0beta1/Features/server-quorum/</a>;
you probably want to set cluster.server-quorum-ratio to 50 for
a replica-3 setup to avoid the possibility of split-brains.
Your cluster will go read-only if it loses two nodes, though
you can always change the server-quorum-ratio
later if you need to keep it running temporarily.<br class="">
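<div class=""><br class="">
</div>
<div class="">(That ratio is set cluster-wide rather than per
volume, something like:)</div>
<pre class="">gluster volume set all cluster.server-quorum-ratio 50%</pre>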
<div><br class="">
</div>
<div>Hope that makes sense of what’s going on for you,</div>
<div><br class="">
</div>
<div> -Darrell</div>
<div><br class="">
<blockquote type="cite" class="">
<div class="">On Aug 23, 2019, at 5:06 PM, Carl Sirotic
<<a href="mailto:csirotic@evoqarchitecture.com"
class="" moz-do-not-send="true">csirotic@evoqarchitecture.com</a>>
wrote:</div>
<br class="Apple-interchange-newline">
<div class="">
<meta http-equiv="Content-Type" content="text/html;
charset=UTF-8" class="">
<div text="#000000" bgcolor="#FFFFFF" class="">
<p class="">Okay,</p>
<p class="">so that means I am at least not getting the
expected behavior, and there is hope.</p>
<p class="">I applied the quorum settings that I was told
about a couple of emails ago.</p>
<p class="">After applying virt group, they are</p>
<pre class="">cluster.quorum-type          auto
cluster.quorum-count         (null)
cluster.server-quorum-type   server
cluster.server-quorum-ratio  0
cluster.quorum-reads         no</pre>
<p class="">Also,</p>
<p class="">I have just set the ping timeout to 5 seconds.</p>
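<p class="">That is, roughly (volume name is my assumption):</p>
<pre class="">gluster volume set vmstore network.ping-timeout 5</pre>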
<p class=""><br class="">
Carl<br class="">
</p>
<div class="moz-cite-prefix">On 2019-08-23 5:45 p.m.,
Ingo Fischer wrote:<br class="">
</div>
<blockquote type="cite"
cite="mid:WM!70be0689af39e9a9c31a03ebbb8526a85e48659c6f910db0e14f7159e759eff604955b78624896a3e47e88cb0eb836d0!@filter1.lastspam.com"
class="">
<meta http-equiv="content-type" content="text/html;
charset=UTF-8" class="">
Hi Carl,
<div class=""><br class="">
</div>
<div class="">In my understanding and experience (I
have a replica 3 system running too) this should
not happen. Can you share your client and server
quorum settings?<br class="">
<br class="">
<div dir="ltr" class="">Ingo</div>
<div dir="ltr" class=""><br class="">
On 23.08.2019 at 15:53, Carl Sirotic <<a
href="mailto:csirotic@evoqarchitecture.com"
moz-do-not-send="true" class="">csirotic@evoqarchitecture.com</a>> wrote:<br
class="">
<br class="">
</div>
<blockquote type="cite" class="">
<div dir="ltr" class="">
<meta http-equiv="Content-Type"
content="text/html; charset=UTF-8" class="">
<p class="">However,</p>
<p class="">I must have misunderstood the
whole concept of gluster.</p>
<p class="">In a replica 3, for me, it's
completely unacceptable, regardless of the
options, that all my VMs go down when I
reboot one node.</p>
<p class="">The whole point of keeping three full
copies of my data on the fly is supposed to
be exactly this.</p>
<p class="">I am in the process of sharding
every file.</p>
<p class="">But even if the healing time were
longer, I would still expect a
non-sharded replica 3 volume holding VM boot
disks not to go down when I reboot one of its
copies.</p>
<p class=""><br class="">
</p>
<p class="">I am not very impressed by gluster
so far.<br class="">
</p>
<p class="">Carl<br class="">
</p>
<div class="moz-cite-prefix">On 2019-08-19
4:15 p.m., Darrell Budic wrote:<br class="">
</div>
<blockquote type="cite"
cite="mid:WM!70b2a24c324753176289e0b250d790e7f5ffa931f81e9072ffcc23c4f6fc1a7199617ab90bbd5e0e5170f02e1339ca54!@filter4.lastspam.com"
class="">
<meta http-equiv="Content-Type"
content="text/html; charset=UTF-8"
class="">
/var/lib/glusterd/groups/virt is a good
start for ideas, notably some thread
settings and choose-local=off to improve
read performance. If you don’t have at least
10 cores on your servers, you may want to
lower the recommended shd-max-threads=8 to
no more than half your CPU cores to keep
healing from swamping out regular work.
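<div class=""><br class="">
</div>
<div class="">For example (volume name illustrative; pick a thread
count to suit your core count):</div>
<pre class="">gluster volume set vmstore group virt                  # applies the whole virt profile at once
gluster volume set vmstore cluster.shd-max-threads 4   # e.g. half the cores of an 8-core host</pre>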
<div class=""><br class="">
</div>
<div class="">It’s also starting to depend
on what your backing store and networking
setup are, so you’re going to want to test
changes and find what works best for your
setup.</div>
<div class=""><br class="">
</div>
<div class="">In addition to the virt group
settings, I use these on most of my
volumes, SSD or HDD backed, with the
default 64M shard size:</div>
<div class=""><br class="">
</div>
<div class="">
<pre class="">performance.io-thread-count: 32         # seemed good for my system, particularly a ZFS backed volume with lots of spindles
client.event-threads: 8
cluster.data-self-heal-algorithm: full  # 10G networking, uses more net/less cpu to heal. probably don’t use this for 1G networking?
performance.stat-prefetch: on
cluster.read-hash-mode: 3               # distribute reads to least loaded server (by read queue depth)</pre>
<div class=""><br class="">
</div>
<div class="">and these two only on my HDD backed volume:</div>
<div class=""><br class="">
</div>
<pre class="">performance.cache-size: 1G
performance.write-behind-window-size: 64MB</pre>
<div class=""><br class="">
</div>
<div class="">but I suspect these two need another round or six of
tuning to tell if they are making a difference.</div>
</div>
<div class=""><br class="">
</div>
<div class="">I use the
throughput-performance tuned profile on my
servers, so you should be in good shape
there.</div>
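<div class=""><br class="">
</div>
<div class="">(Switching is a single command if a box is not on it
already:)</div>
<pre class="">tuned-adm profile throughput-performance
tuned-adm active    # confirm the active profile</pre>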
<div class="">
<div class=""><br class="">
<blockquote type="cite" class="">
<div class="">On Aug 19, 2019, at
12:22 PM, Guy Boisvert <<a
href="mailto:guy.boisvert@ingtegration.com"
class="" moz-do-not-send="true">guy.boisvert@ingtegration.com</a>>
wrote:</div>
<br class="Apple-interchange-newline">
<div class="">
<div class="">On 2019-08-19 12:08
p.m., Darrell Budic wrote:<br
class="">
<blockquote type="cite" class="">You
also need to make sure your
volume is set up properly for
best performance. Did you apply
the gluster virt group to your
volumes, or at least
features.shard = on on your VM
volume?<br class="">
</blockquote>
<br class="">
That's what we did here:<br
class="">
<br class="">
<br class="">
<pre class="">gluster volume set W2K16_Rhenium cluster.quorum-type auto
gluster volume set W2K16_Rhenium network.ping-timeout 10
gluster volume set W2K16_Rhenium auth.allow \*
gluster volume set W2K16_Rhenium group virt
gluster volume set W2K16_Rhenium storage.owner-uid 36
gluster volume set W2K16_Rhenium storage.owner-gid 36
gluster volume set W2K16_Rhenium features.shard on
gluster volume set W2K16_Rhenium features.shard-block-size 256MB
gluster volume set W2K16_Rhenium cluster.data-self-heal-algorithm full
gluster volume set W2K16_Rhenium performance.low-prio-threads 32

tuned-adm profile random-io     # a profile I added in CentOS 7

cat /usr/lib/tuned/random-io/tuned.conf
===========================================
[main]
summary=Optimize for Gluster virtual machine storage
include=throughput-performance

[sysctl]

vm.dirty_ratio = 5
vm.dirty_background_ratio = 2</pre>
<br class="">
<br class="">
Any more optimization to add to
this?<br class="">
<br class="">
<br class="">
Guy<br class="">
<br class="">
-- <br class="">
Guy Boisvert, ing.<br class="">
IngTegration inc.<br class="">
<a
href="http://www.ingtegration.com/"
class="" moz-do-not-send="true">http://www.ingtegration.com</a><br
class="">
<a class="moz-txt-link-freetext"
href="https://www.linkedin.com/in/guy-boisvert-8990487"
moz-do-not-send="true">https://www.linkedin.com/in/guy-boisvert-8990487</a><br
class="">
<br class="">
CONFIDENTIALITY NOTICE :
Proprietary/Confidential
Information<br class="">
belonging to IngTegration Inc. and
its affiliates may be<br class="">
contained in this message. If you
are not a recipient<br class="">
indicated or intended in this
message (or responsible for<br
class="">
delivery of this message to such
person), or you think for<br
class="">
any reason that this message may
have been addressed to you<br
class="">
in error, you may not use or copy
or deliver this message to<br
class="">
anyone else. In such case, you
should destroy this message<br
class="">
and are asked to notify the sender
by reply email.<br class="">
<br class="">
</div>
</div>
</blockquote>
</div>
<br class="">
</div>
</blockquote>
</div>
</blockquote>
<blockquote type="cite" class="">
<div dir="ltr" class=""><span class="">_______________________________________________</span><br
class="">
<span class="">Gluster-users mailing list</span><br
class="">
<span class=""><a
href="mailto:Gluster-users@gluster.org"
moz-do-not-send="true" class="">Gluster-users@gluster.org</a></span><br
class="">
<span class=""><a
href="https://lists.gluster.org/mailman/listinfo/gluster-users"
moz-do-not-send="true" class="">https://lists.gluster.org/mailman/listinfo/gluster-users</a></span></div>
</blockquote>
</div>
</blockquote>
</div>
</div>
</blockquote>
</div>
<br class="">
</div>
</div>
</blockquote>
</body>
</html>