[From nobody Wed Aug 28 06:43:10 2019
Received: from [192.222.158.19] (account csirotic@evoqarchitecture.com)
	by evoqarchitecture.com (CommuniGate Pro HTTP 6.2.12)
	with AIRSYNC id 290067; Fri, 23 Aug 2019 19:00:42 -0400
From: &quot;Carl Sirotic&quot; &lt;csirotic@evoqarchitecture.com&gt;
Date: Fri, 23 Aug 2019 19:00:40 -0400
Subject: Re: [Gluster-users] Brick Reboot =&gt; VMs slowdown, client crashes
Message-ID: &lt;b8922469-ae22-465f-8f8e-63f081423c88@email.android.com&gt;
X-Android-Message-ID: &lt;b8922469-ae22-465f-8f8e-63f081423c88@email.android.com&gt;
To: Joe Julian &lt;joe@julianfamily.org&gt;
Importance: Normal
X-Priority: 3
X-MSMail-Priority: Normal
MIME-Version: 1.0
Content-Type: text/html; charset=utf-8

I tried the script that is supposed to do so, but it didn't help.
Do you have a clean order for doing it?

I'm all for it.
However, wouldn't a failing node be the same as a bad shutdown?

I am starting to wonder whether starting the VM through a native FUSE mount would do this too, versus using libgfapi in QEMU.

Carl

On Aug. 23, 2019 6:53 p.m., Joe Julian &lt;joe@julianfamily.org&gt; wrote:
Or fix your shutdown order such that the bricks are shut down before the network so they never have to wait for ping-timeout.

Ping-timeout only comes into play when an open TCP connection stops responding. If the connection is closed &lt;http://www.tcpipguide.com/free/t_TCPConnectionTermination-2.htm&gt; then the client will never wait for it to respond.
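
A minimal sketch of one way to get that shutdown ordering with systemd, not a tested recipe. systemd stops units in the reverse of their start order, so a unit ordered After= the network is stopped while the network is still up. The unit name gluster-bricks-stop.service and the /usr/bin/pkill path are assumptions for illustration, and sending SIGTERM to glusterfsd is only one possible way to stop the bricks.

```ini
# /etc/systemd/system/gluster-bricks-stop.service (hypothetical unit name)
[Unit]
Description=Stop Gluster brick processes before the network goes down
# Started after the network and glusterd, therefore stopped before them at shutdown.
After=network-online.target glusterd.service
Wants=network-online.target

[Service]
Type=oneshot
RemainAfterExit=yes
# Nothing to do at boot; the unit only has to be active so ExecStop runs at shutdown.
ExecStart=/bin/true
# SIGTERM lets each brick process (glusterfsd) exit and close its TCP connections.
# The leading "-" ignores the non-zero exit status when no brick is running.
ExecStop=-/usr/bin/pkill -TERM -x glusterfsd

[Install]
WantedBy=multi-user.target
```

Under these assumptions the unit would be enabled with "systemctl enable --now gluster-bricks-stop.service". Whether this ordering is sufficient depends on how the network is managed on the host, so it is worth checking on a test node first.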

On 8/23/19 3:06 PM, Carl Sirotic wrote:
&gt;
&gt; Okay,
&gt;
&gt; so it means, at least I am not getting the expected behavior and there is hope.
&gt;
&gt; I applied the quorum settings that I was given a couple of emails ago.
&gt;
&gt; After applying virt group, they are
&gt;
&gt; cluster.quorum-type auto
&gt; cluster.quorum-count (null)
&gt; cluster.server-quorum-type server
&gt; cluster.server-quorum-ratio 0
&gt; cluster.quorum-reads no
&gt;
&gt; Also,
&gt;
&gt; I have just set the ping timeout to 5 seconds.
&gt;
&gt;
&gt; Carl
&gt;
&gt; On 2019-08-23 5:45 p.m., Ingo Fischer wrote:
&gt;&gt; Hi Carl,
&gt;&gt;
&gt;&gt; In my understanding and experience (I have a replica 3 system running too) this should not happen. Can you tell us your client and server quorum settings?
&gt;&gt;
&gt;&gt; Ingo
&gt;&gt;
&gt;&gt; On 23.08.2019 at 15:53, Carl Sirotic &lt;csirotic@evoqarchitecture.com&gt; wrote:
&gt;&gt;
&gt;&gt;&gt; However,
&gt;&gt;&gt;
&gt;&gt;&gt; I must have misunderstood the whole concept of gluster.
&gt;&gt;&gt;
&gt;&gt;&gt; In a replica 3, for me, it's completely unacceptable, regardless of the options, that all my VMs go down when I reboot one node.
&gt;&gt;&gt;
&gt;&gt;&gt; The whole purpose of having three full copies of my data on the fly is supposed to be this.
&gt;&gt;&gt;
&gt;&gt;&gt; I am in the process of sharding every file.
&gt;&gt;&gt;
&gt;&gt;&gt; But even if the healing time would be longer, I would still expect a non-sharded replica 3 brick with VM boot disks not to go down if I reboot one of its copies.
&gt;&gt;&gt;
&gt;&gt;&gt;
&gt;&gt;&gt; I am not very impressed by gluster so far.
&gt;&gt;&gt;
&gt;&gt;&gt; Carl
&gt;&gt;&gt;
&gt;&gt;&gt; On 2019-08-19 4:15 p.m., Darrell Budic wrote:
&gt;&gt;&gt;&gt; /var/lib/glusterd/groups/virt is a good start for ideas, notably some thread settings and choose-local=off to improve read performance. If you don’t have at least 10 cores on your servers, you may want to lower the recommended shd-max-threads=8 to no more than half your CPU cores to keep healing from swamping out regular work.
&gt;&gt;&gt;&gt;
&gt;&gt;&gt;&gt; It’s also starting to depend on what your backing store and networking setup are, so you’re going to want to test changes and find what works best for your setup.
&gt;&gt;&gt;&gt;
&gt;&gt;&gt;&gt; In addition to the virt group settings, I use these on most of my volumes, SSD or HDD backed, with the default 64M shard size:
&gt;&gt;&gt;&gt;
&gt;&gt;&gt;&gt; performance.io-thread-count: 32 # seemed good for my system, particularly a ZFS backed volume with lots of spindles
&gt;&gt;&gt;&gt; client.event-threads: 8
&gt;&gt;&gt;&gt; cluster.data-self-heal-algorithm: full # 10G networking, uses more net/less cpu to heal. probably don’t use this for 1G networking?
&gt;&gt;&gt;&gt; performance.stat-prefetch: on
&gt;&gt;&gt;&gt; cluster.read-hash-mode: 3 # distribute reads to least loaded server (by read queue depth)
&gt;&gt;&gt;&gt;
&gt;&gt;&gt;&gt; and these two only on my HDD backed volume:
&gt;&gt;&gt;&gt;
&gt;&gt;&gt;&gt; performance.cache-size: 1G
&gt;&gt;&gt;&gt; performance.write-behind-window-size: 64MB
&gt;&gt;&gt;&gt;
&gt;&gt;&gt;&gt; but I suspect these two need another round or six of tuning to tell if they are making a difference.
&gt;&gt;&gt;&gt;
&gt;&gt;&gt;&gt; I use the throughput-performance tuned profile on my servers, so you should be in good shape there.
&gt;&gt;&gt;&gt;
&gt;&gt;&gt;&gt;&gt; On Aug 19, 2019, at 12:22 PM, Guy Boisvert &lt;guy.boisvert@ingtegration.com&gt; wrote:
&gt;&gt;&gt;&gt;&gt;
&gt;&gt;&gt;&gt;&gt; On 2019-08-19 12:08 p.m., Darrell Budic wrote:
&gt;&gt;&gt;&gt;&gt;&gt; You also need to make sure your volume is set up properly for best performance. Did you apply the gluster virt group to your volumes, or at least features.shard = on on your VM volume?
&gt;&gt;&gt;&gt;&gt;
&gt;&gt;&gt;&gt;&gt; That's what we did here:
&gt;&gt;&gt;&gt;&gt;
&gt;&gt;&gt;&gt;&gt;
&gt;&gt;&gt;&gt;&gt; gluster volume set W2K16_Rhenium cluster.quorum-type auto
&gt;&gt;&gt;&gt;&gt; gluster volume set W2K16_Rhenium network.ping-timeout 10
&gt;&gt;&gt;&gt;&gt; gluster volume set W2K16_Rhenium auth.allow \*
&gt;&gt;&gt;&gt;&gt; gluster volume set W2K16_Rhenium group virt
&gt;&gt;&gt;&gt;&gt; gluster volume set W2K16_Rhenium storage.owner-uid 36
&gt;&gt;&gt;&gt;&gt; gluster volume set W2K16_Rhenium storage.owner-gid 36
&gt;&gt;&gt;&gt;&gt; gluster volume set W2K16_Rhenium features.shard on
&gt;&gt;&gt;&gt;&gt; gluster volume set W2K16_Rhenium features.shard-block-size 256MB
&gt;&gt;&gt;&gt;&gt; gluster volume set W2K16_Rhenium cluster.data-self-heal-algorithm full
&gt;&gt;&gt;&gt;&gt; gluster volume set W2K16_Rhenium performance.low-prio-threads 32
&gt;&gt;&gt;&gt;&gt;
&gt;&gt;&gt;&gt;&gt; tuned-adm profile random-io        (a profile I added in CentOS 7)
&gt;&gt;&gt;&gt;&gt;
&gt;&gt;&gt;&gt;&gt;
&gt;&gt;&gt;&gt;&gt; cat /usr/lib/tuned/random-io/tuned.conf
&gt;&gt;&gt;&gt;&gt; ===========================================
&gt;&gt;&gt;&gt;&gt; [main]
&gt;&gt;&gt;&gt;&gt; summary=Optimize for Gluster virtual machine storage
&gt;&gt;&gt;&gt;&gt; include=throughput-performance
&gt;&gt;&gt;&gt;&gt;
&gt;&gt;&gt;&gt;&gt; [sysctl]
&gt;&gt;&gt;&gt;&gt;
&gt;&gt;&gt;&gt;&gt; vm.dirty_ratio = 5
&gt;&gt;&gt;&gt;&gt; vm.dirty_background_ratio = 2
&gt;&gt;&gt;&gt;&gt;
&gt;&gt;&gt;&gt;&gt;
&gt;&gt;&gt;&gt;&gt; Any more optimization to add to this?
&gt;&gt;&gt;&gt;&gt;
&gt;&gt;&gt;&gt;&gt;
&gt;&gt;&gt;&gt;&gt; Guy
&gt;&gt;&gt;&gt;&gt;
&gt;&gt;&gt;&gt;
&gt;&gt;&gt; _______________________________________________
&gt;&gt;&gt; Gluster-users mailing list
&gt;&gt;&gt; Gluster-users@gluster.org
&gt;&gt;&gt; https://lists.gluster.org/mailman/listinfo/gluster-users
&gt;

]