<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">Patrick-<div class=""><br class=""></div><div class="">What did you upgrade to? I’m probably missing something, but there wasn’t really a 3.13 version, and it isn’t listed on&nbsp;<a href="https://www.gluster.org/release-schedule/" class="">https://www.gluster.org/release-schedule/</a></div><div class=""><br class=""></div><div class="">Sorry about the confusion between dispersed and distribute-replicate, you’re absolutely correct that you need the normal shd max-threads settings there.</div><div class=""><br class=""></div><div class="">Any improvements over time?</div><div class=""><br class=""></div><div class="">Did you make sure each client or VM host can see all the servers? I’ve had an issue where a client was only talking to one of the servers, which forced the servers to heal everything all the time and had a big performance impact. That probably doesn’t apply to an NFS mount, but it may apply to your FUSE mounts. Along those lines, are there any errors on the switches connecting the servers to the clients? That could explain why one server is slow and the other isn’t, if one is seeing a lot of errors on the network.</div><div class=""><br class=""></div><div class="">&nbsp; -Darrell<br class=""><div><br class=""><blockquote type="cite" class=""><div class="">On Apr 23, 2019, at 5:07 AM, Patrick Rennie &lt;<a href="mailto:patrickmrennie@gmail.com" class="">patrickmrennie@gmail.com</a>&gt; wrote:</div><br class="Apple-interchange-newline"><div class=""><div dir="ltr" class=""><div dir="ltr" class="">Hi Darrell,&nbsp;<div class=""><br class=""></div><div class="">Thanks again for your advice. I tried to take yesterday off and just not think about it, and I'm back at it again today. Still no real progress; however, my colleague upgraded our version to 3.13 yesterday, which has broken NFS and caused some other issues for us now. 
It did add the 'gluster volume heal &lt;vol&gt; info summary' command, so I can use that to try and keep an eye on how many files seem to need healing. If it's accurate, it's possibly fewer than I thought.&nbsp;</div><div class=""><br class=""></div><div class="">We are in the process of moving this data to new storage, but it does take a long time to move so much data around, and more keeps coming in each day.&nbsp;</div><div class=""><br class=""></div><div class="">We do have 3 cache SSDs for each brick, so performance on the bricks themselves is generally quite quick. I can dd a 10GB file at ~1.7-2GB/s directly on a brick, so I think the performance of each brick is actually OK.&nbsp;</div><div class=""><br class=""></div><div class="">It's a distribute/replicate volume, not dispersed, so I can't change disperse.shd-max-threads.&nbsp;</div><div class=""><br class=""></div><div class="">I have checked the basics, like all peers being connected and no scrubs being in progress.&nbsp;&nbsp;<br class=""></div><div class=""><br class=""></div><div class="">Will keep working away at this, and will start to read through some of your performance tuning suggestions. 
Really appreciate your advice.&nbsp;</div><div class=""><br class=""></div><div class="">Cheers,</div><div class=""><br class=""></div><div class="">-Patrick</div><div class=""><br class=""></div><div class=""><br class=""></div></div></div><br class=""><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Apr 22, 2019 at 12:43 AM Darrell Budic &lt;<a href="mailto:budic@onholyground.com" class="">budic@onholyground.com</a>&gt; wrote:<br class=""></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div style="overflow-wrap: break-word;" class="">Patrick-<div class=""><br class=""></div><div class="">Specifically re:</div><blockquote type="cite" class=""><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr" class=""><div dir="ltr" class=""><div dir="ltr" class=""><div dir="ltr" class=""><div dir="ltr" class=""><div class="">Thanks again for your advice, I've left it for a while but unfortunately it's still just as slow and causing more problems for our operations now. I will need to try and take some steps to at least bring performance back to normal while continuing to investigate the issue longer term. 
I can definitely see one node with heavier CPU than the other, almost double, which I am OK with, but I think the heal process is going to take forever. Checking "gluster volume heal info" shows thousands and thousands of files which may need healing; I have no idea how many in total, as the command is still running after hours, so I am not sure what has gone so wrong to cause this.&nbsp;</div><div class="">...</div><div class="">I have no idea how long the healing is going to take on this cluster, we have around 560TB of data on here, but I don't think I can wait that long to try and restore performance to normal.&nbsp;</div></div></div></div></div></div></blockquote></div></blockquote></div></blockquote><div class=""><br class=""></div><div class="">You’re in a bind, I know, but it’s just going to take some time to recover. You have a lot of data, and even at the best speeds your disks and networks can muster, it’s going to take a while. Until your cluster is fully healed, anything else you try may not have the full effect it would on a fully operational cluster. Your predecessor may have made things worse by not having proper POSIX attributes on the ZFS file system. You may have made things worse by killing brick processes in your distributed-replicated setup, creating an additional need for healing and possibly compounding the overall performance issues. I’m not trying to blame you or make you feel bad, but I do want to point out that there’s a problem here, and there is unlikely to be a silver bullet that will resolve the issue instantly. 
You’re going to have to give it time to get back into a “normal” condition, which seems to be what your setup was configured and tested for in the first place.</div><div class=""><br class=""></div><div class="">Those things said, rather than trying to move things from this cluster to different storage, what about having your VMs mount different storage in the first place and move the write load off of this cluster while it recovers?</div><div class=""><br class=""></div><div class="">Looking at the profile you posted for Strahil, your bricks are spending a lot of time doing LOOKUPs, and some are slower than others by a significant margin. If you haven’t already, check the ZFS pools on those and make sure they don’t have any failed disks that might be slowing them down. Consider whether you can speed them up with a ZIL or SLOG if they are spinning disks (although your previous server descriptions sound like you don’t need a SLOG, ZILs may help if they are HDDs). Just saw your additional comments that one server is faster than the other; it’s possible that it’s got the actual data and the other one is doing heals every time it gets accessed, or it’s just got fuller and slower volumes. It may make sense to try forcing all your VM mounts to the faster server for a while, even if it’s the one with higher load (serving will get preference over healing, but don’t push the shd-max-threads too high, they can squash performance). Given it’s a dispersed volume, make sure you’ve got disperse.shd-max-threads at 4 or 8, and raise disperse.shd-wait-qlength to 4096 or so.</div><div class=""><br class=""></div><div class="">You’re getting into things best tested with everything working, but desperate times call for accelerated testing, right?</div><div class=""><br class=""></div><div class="">You could experiment with different values of performance.io-thread-count, try 48. 
But if your CPU load is already near max, you’re getting everything you can out of your CPU already, so don’t spend too much time on it.</div><div class=""><br class=""></div><div class="">Check out&nbsp;<a href="https://github.com/gluster/glusterfs/blob/release-3.11/extras/group-nl-cache" target="_blank" class="">https://github.com/gluster/glusterfs/blob/release-3.11/extras/group-nl-cache</a>&nbsp;and try applying these to your gluster volume. Without knowing more about your workload, these may help if you’re doing a lot of directory listing and file lookups or tests for the (non)existence of a file from your VMs. If those help, search the mailing list for info on the mount option ’negative_cache=1’ and a thread titled '<span style="font-family: &quot;Helvetica Neue&quot;;" class="">[Gluster-users] Gluster native mount is really slow compared to nfs</span><span style="font-family: &quot;Helvetica Neue&quot;;" class="">’</span><span style="background-color:rgba(255,255,255,0)" class="">, it may have some client-side mount options that could give you further&nbsp;<span class="">benefits</span>.</span></div><div class=""><font face="Helvetica Neue" class=""><span style="" class=""><br class=""></span></font></div><div class=""><span style="background-color:rgba(255,255,255,0)" class="">Have a look at&nbsp;<a href="https://docs.gluster.org/en/v3/Administrator%20Guide/Managing%20Volumes/#tuning-options" target="_blank" class="">https://docs.gluster.org/en/v3/Administrator%20Guide/Managing%20Volumes/#tuning-options</a><span class="">, setting cluster.data-self-heal-algorithm to full may help things heal faster for you. performance.flush-behind &amp; related options may improve write response to the clients, but use caution unless you&nbsp;have UPSs &amp; battery-backed RAID, etc. 
If you have&nbsp;stats on network traffic on/between your two&nbsp;“real” node servers, you can use that as a proxy value for healing performance.</span></span></div><div class=""><span style="background-color:rgba(255,255,255,0)" class=""><br class=""></span></div><div class=""><span style="background-color:rgba(255,255,255,0)" class="">I looked up the performance.stat-prefetch bug for you, it was fixed back in 3.8, so it should be safe to enable on your 3.12.x system even with servers at .15 &amp; .14.</span></div><div class=""><span style="background-color:rgba(255,255,255,0)" class=""><br class=""></span></div><div class=""><span style="background-color:rgba(255,255,255,0)" class="">You<span class="">’</span>ll probably have to&nbsp;<span class="">wait</span>&nbsp;for devs to get anything else out of&nbsp;<span class="">those</span>&nbsp;logs, but make sure your servers can all see&nbsp;<span class="">each</span>&nbsp;other (gluster peer status, everything should be&nbsp;<span class="">“</span>Peer in Cluster (Connected)<span class="">”</span>&nbsp;on all servers), and all 3 see all the bricks in the&nbsp;<span class="">‘</span>gluster vol status<span class="">’</span>.&nbsp; Maybe check for split brain files on those you keep seeing in the logs?&nbsp;</span></div><div class=""><span style="background-color:rgba(255,255,255,0)" class=""><br class=""></span></div><div class=""><span style="background-color:rgba(255,255,255,0)" class=""><span class="">Good</span>&nbsp;<span class="">luck</span>, have patience, and&nbsp;<span class="">remember</span>&nbsp;(&amp;&nbsp;<span class="">remind&nbsp;others) that things are not in their normal state at this moment, and look for things outside of the gluster server cluster to try to help (</span></span><span class=""><a href="https://joejulian.name/post/optimizing-web-performance-with-glusterfs/" target="_blank" class="">https://joejulian.name/post/optimizing-web-performance-with-glusterfs/</a></span><span 
style="background-color:rgba(255,255,255,0)" class="">) get&nbsp;through the healing as well.</span></div><div class=""><font face="Helvetica Neue" class=""><span style="" class=""><br class=""></span></font></div><div class=""><font face="Helvetica Neue" class=""><span style="" class="">&nbsp; &nbsp;-Darrell</span></font></div><div class=""><br class=""><div class=""><blockquote type="cite" class=""><div class="">On Apr 21, 2019, at 4:41 AM, Patrick Rennie &lt;<a href="mailto:patrickmrennie@gmail.com" target="_blank" class="">patrickmrennie@gmail.com</a>&gt; wrote:</div><br class="gmail-m_3603348053678985885Apple-interchange-newline"><div class=""><div dir="ltr" class=""><div dir="ltr" class=""><div dir="ltr" class="">Another small update from me, I have been keeping an eye on the glustershd.log file to see what is going on and I keep seeing the same file names come up in there every 10 minutes, but not a lot of other activity. Logs below.&nbsp;<div class="">How can I be sure my heal is progressing through the files which actually need to be healed? I thought it would show up in these logs.&nbsp;</div><div class="">I also increased the "cluster.shd-max-threads" from 4 to 8 to try and speed things up too.&nbsp;</div><div class=""><br class=""></div><div class="">Any ideas here?&nbsp;</div><div class=""><br class=""></div><div class="">Thanks,</div><div class=""><br class=""></div><div class="">- Patrick</div><div class=""><br class=""></div><div class=""><div class="">On 01-B</div><div class="">-------</div><div class="">[2019-04-21 09:12:54.575689] I [MSGID: 108026] [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 0-gvAA01-replicate-6: performing metadata selfheal on 5354c112-2e58-451d-a6f7-6bfcc1c9d904</div><div class="">[2019-04-21 09:12:54.733601] I [MSGID: 108026] [afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6: Completed metadata selfheal on 5354c112-2e58-451d-a6f7-6bfcc1c9d904. 
sources=[0] 2&nbsp; sinks=1</div><div class="">[2019-04-21 09:13:12.028509] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-5: performing entry selfheal on 30547ab6-1fbd-422e-9c81-2009f9ff7ebe</div><div class="">[2019-04-21 09:13:12.047470] W [MSGID: 108015] [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-5: expunging file 30547ab6-1fbd-422e-9c81-2009f9ff7ebe/XXXXXXXX.vbm_346744_tmp (00000000-0000-0000-0000-000000000000) on gvAA01-client-17</div><div class=""><br class=""></div><div class="">[2019-04-21 09:23:13.044377] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-5: performing entry selfheal on 30547ab6-1fbd-422e-9c81-2009f9ff7ebe</div><div class="">[2019-04-21 09:23:13.051479] W [MSGID: 108015] [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-5: expunging file 30547ab6-1fbd-422e-9c81-2009f9ff7ebe/XXXXXXXX.vbm_346744_tmp (00000000-0000-0000-0000-000000000000) on gvAA01-client-17</div><div class=""><br class=""></div><div class="">[2019-04-21 09:33:07.400369] I [MSGID: 108026] [afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6: Completed data selfheal on 2fd9899f-192b-49cb-ae9c-df35d3f004fa. 
sources=[0] 2&nbsp; sinks=1</div><div class="">[2019-04-21 09:33:11.825449] I [MSGID: 108026] [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 0-gvAA01-replicate-6: performing metadata selfheal on 2fd9899f-192b-49cb-ae9c-df35d3f004fa</div><div class="">[2019-04-21 09:33:14.029837] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-5: performing entry selfheal on 30547ab6-1fbd-422e-9c81-2009f9ff7ebe</div><div class="">[2019-04-21 09:33:14.037436] W [MSGID: 108015] [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-5: expunging file 30547ab6-1fbd-422e-9c81-2009f9ff7ebe/XXXXXXXX.vbm_346744_tmp (00000000-0000-0000-0000-000000000000) on gvAA01-client-17</div><div class="">[2019-04-21 09:33:23.913882] I [MSGID: 108026] [afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6: Completed metadata selfheal on 2fd9899f-192b-49cb-ae9c-df35d3f004fa. sources=[0] 2&nbsp; sinks=1</div><div class="">[2019-04-21 09:33:43.874201] I [MSGID: 108026] [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 0-gvAA01-replicate-6: performing metadata selfheal on c25b80fd-f7df-4c6d-92bd-db930e89a0b1</div><div class="">[2019-04-21 09:34:02.273898] I [MSGID: 108026] [afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6: Completed metadata selfheal on c25b80fd-f7df-4c6d-92bd-db930e89a0b1. sources=[0] 2&nbsp; sinks=1</div><div class="">[2019-04-21 09:35:12.282045] I [MSGID: 108026] [afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6: Completed data selfheal on 94027f22-a7d7-4827-be0d-09cf5ddda885. 
sources=[0] 2&nbsp; sinks=1</div><div class="">[2019-04-21 09:35:15.146252] I [MSGID: 108026] [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 0-gvAA01-replicate-6: performing metadata selfheal on 94027f22-a7d7-4827-be0d-09cf5ddda885</div><div class="">[2019-04-21 09:35:15.254538] I [MSGID: 108026] [afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6: Completed metadata selfheal on 94027f22-a7d7-4827-be0d-09cf5ddda885. sources=[0] 2&nbsp; sinks=1</div><div class="">[2019-04-21 09:35:22.900803] I [MSGID: 108026] [afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6: Completed data selfheal on 84c93069-cfd8-441b-a6e8-958bed535b45. sources=[0] 2&nbsp; sinks=1</div><div class="">[2019-04-21 09:35:27.150963] I [MSGID: 108026] [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 0-gvAA01-replicate-6: performing metadata selfheal on 84c93069-cfd8-441b-a6e8-958bed535b45</div><div class="">[2019-04-21 09:35:29.186295] I [MSGID: 108026] [afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6: Completed metadata selfheal on 84c93069-cfd8-441b-a6e8-958bed535b45. sources=[0] 2&nbsp; sinks=1</div><div class="">[2019-04-21 09:35:35.967451] I [MSGID: 108026] [afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6: Completed data selfheal on e747c32e-4353-4173-9024-855c69cdf9b9. sources=[0] 2&nbsp; sinks=1</div><div class="">[2019-04-21 09:35:40.733444] I [MSGID: 108026] [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 0-gvAA01-replicate-6: performing metadata selfheal on e747c32e-4353-4173-9024-855c69cdf9b9</div><div class="">[2019-04-21 09:35:58.707593] I [MSGID: 108026] [afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6: Completed metadata selfheal on e747c32e-4353-4173-9024-855c69cdf9b9. 
sources=[0] 2&nbsp; sinks=1</div><div class="">[2019-04-21 09:36:25.554260] I [MSGID: 108026] [afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6: Completed data selfheal on 4758d581-9de0-403b-af8b-bfd3d71d020d. sources=[0] 2&nbsp; sinks=1</div><div class="">[2019-04-21 09:36:26.031422] I [MSGID: 108026] [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 0-gvAA01-replicate-6: performing metadata selfheal on 4758d581-9de0-403b-af8b-bfd3d71d020d</div><div class="">[2019-04-21 09:36:26.083982] I [MSGID: 108026] [afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6: Completed metadata selfheal on 4758d581-9de0-403b-af8b-bfd3d71d020d. sources=[0] 2&nbsp; sinks=1</div><div class=""><br class=""></div><div class="">On 02-B</div><div class="">-------</div><div class="">[2019-04-21 09:03:15.815250] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-4: performing entry selfheal on 8fa9513e-82fd-4a6e-8ac9-e1f1cd8afb01</div><div class="">[2019-04-21 09:03:15.863153] W [MSGID: 108015] [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-4: expunging file 8fa9513e-82fd-4a6e-8ac9-e1f1cd8afb01/C_VOL-b003-i174-cd.md5.tmp (00000000-0000-0000-0000-000000000000) on gvAA01-client-14</div><div class="">[2019-04-21 09:03:15.867432] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-4: performing entry selfheal on 65f6d9fb-a441-4e47-b91a-0936d11a8c8f</div><div class="">[2019-04-21 09:03:15.875134] W [MSGID: 108015] [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-4: expunging file 65f6d9fb-a441-4e47-b91a-0936d11a8c8f/C_VOL-b001-i14937-cd.md5.tmp (00000000-0000-0000-0000-000000000000) on gvAA01-client-14</div><div class="">[2019-04-21 09:03:39.020198] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-5: performing entry selfheal on 30547ab6-1fbd-422e-9c81-2009f9ff7ebe</div><div class="">[2019-04-21 
09:03:39.027345] W [MSGID: 108015] [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-5: expunging file 30547ab6-1fbd-422e-9c81-2009f9ff7ebe/XXXXXXXX.vbm_346744_tmp (00000000-0000-0000-0000-000000000000) on gvAA01-client-17</div><div class=""><br class=""></div><div class="">[2019-04-21 09:13:18.524874] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-4: performing entry selfheal on 8fa9513e-82fd-4a6e-8ac9-e1f1cd8afb01</div><div class="">[2019-04-21 09:13:20.070172] W [MSGID: 108015] [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-4: expunging file 8fa9513e-82fd-4a6e-8ac9-e1f1cd8afb01/C_VOL-b003-i174-cd.md5.tmp (00000000-0000-0000-0000-000000000000) on gvAA01-client-14</div><div class="">[2019-04-21 09:13:20.074977] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-4: performing entry selfheal on 65f6d9fb-a441-4e47-b91a-0936d11a8c8f</div><div class="">[2019-04-21 09:13:20.080827] W [MSGID: 108015] [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-4: expunging file 65f6d9fb-a441-4e47-b91a-0936d11a8c8f/C_VOL-b001-i14937-cd.md5.tmp (00000000-0000-0000-0000-000000000000) on gvAA01-client-14</div><div class="">[2019-04-21 09:13:40.015763] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-5: performing entry selfheal on 30547ab6-1fbd-422e-9c81-2009f9ff7ebe</div><div class="">[2019-04-21 09:13:40.021805] W [MSGID: 108015] [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-5: expunging file 30547ab6-1fbd-422e-9c81-2009f9ff7ebe/XXXXXXXX.vbm_346744_tmp (00000000-0000-0000-0000-000000000000) on gvAA01-client-17</div><div class=""><br class=""></div><div class="">[2019-04-21 09:23:21.991032] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-4: performing entry selfheal on 8fa9513e-82fd-4a6e-8ac9-e1f1cd8afb01</div><div class="">[2019-04-21 
09:23:22.054565] W [MSGID: 108015] [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-4: expunging file 8fa9513e-82fd-4a6e-8ac9-e1f1cd8afb01/C_VOL-b003-i174-cd.md5.tmp (00000000-0000-0000-0000-000000000000) on gvAA01-client-14</div><div class="">[2019-04-21 09:23:22.059225] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-4: performing entry selfheal on 65f6d9fb-a441-4e47-b91a-0936d11a8c8f</div><div class="">[2019-04-21 09:23:22.066266] W [MSGID: 108015] [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-4: expunging file 65f6d9fb-a441-4e47-b91a-0936d11a8c8f/C_VOL-b001-i14937-cd.md5.tmp (00000000-0000-0000-0000-000000000000) on gvAA01-client-14</div><div class="">[2019-04-21 09:23:41.129962] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-5: performing entry selfheal on 30547ab6-1fbd-422e-9c81-2009f9ff7ebe</div><div class="">[2019-04-21 09:23:41.135919] W [MSGID: 108015] [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-5: expunging file 30547ab6-1fbd-422e-9c81-2009f9ff7ebe/XXXXXXXX.vbm_346744_tmp (00000000-0000-0000-0000-000000000000) on gvAA01-client-17</div><div class=""><br class=""></div><div class="">[2019-04-21 09:33:24.015223] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-4: performing entry selfheal on 8fa9513e-82fd-4a6e-8ac9-e1f1cd8afb01</div><div class="">[2019-04-21 09:33:24.069686] W [MSGID: 108015] [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-4: expunging file 8fa9513e-82fd-4a6e-8ac9-e1f1cd8afb01/C_VOL-b003-i174-cd.md5.tmp (00000000-0000-0000-0000-000000000000) on gvAA01-client-14</div><div class="">[2019-04-21 09:33:24.074341] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-4: performing entry selfheal on 65f6d9fb-a441-4e47-b91a-0936d11a8c8f</div><div class="">[2019-04-21 09:33:24.080065] W [MSGID: 108015] 
[afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-4: expunging file 65f6d9fb-a441-4e47-b91a-0936d11a8c8f/C_VOL-b001-i14937-cd.md5.tmp (00000000-0000-0000-0000-000000000000) on gvAA01-client-14</div><div class="">[2019-04-21 09:33:42.099515] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-5: performing entry selfheal on 30547ab6-1fbd-422e-9c81-2009f9ff7ebe</div><div class="">[2019-04-21 09:33:42.107481] W [MSGID: 108015] [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-5: expunging file 30547ab6-1fbd-422e-9c81-2009f9ff7ebe/XXXXXXXX.vbm_346744_tmp (00000000-0000-0000-0000-000000000000) on gvAA01-client-17</div></div><div class=""><br class=""></div></div></div></div><br class=""><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sun, Apr 21, 2019 at 3:55 PM Patrick Rennie &lt;<a href="mailto:patrickmrennie@gmail.com" target="_blank" class="">patrickmrennie@gmail.com</a>&gt; wrote:<br class=""></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr" class=""><div dir="ltr" class="">Just another small update, I'm continuing to watch my brick logs and I just saw these errors come up in the recent events too. I am going to continue to post any errors I see in the hope of finding the right one to try and fix..&nbsp;<div class="">This is from the logs on brick1, seems to be occurring on both nodes on brick1, although at different times. 
I'm not sure what this means, can anyone shed any light?&nbsp;</div><div class="">I guess I am looking for some kind of specific error which may indicate something is broken or stuck and locking up and causing the extreme latency I'm seeing in the cluster.&nbsp;<br class=""></div><div class=""><br class=""></div><div class=""><div class="">[2019-04-21 07:25:55.064497] E [rpcsvc.c:1364:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x7c700c, Program: GlusterFS 3.3, ProgVers: 330, Proc: 29) to rpc-transport (tcp.gvAA01-server)</div><div class="">[2019-04-21 07:25:55.064612] E [server.c:195:server_submit_reply] (--&gt;/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e58a) [0x7f3b3e93158a] --&gt;/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17d45) [0x7f3b3e4c5d45] --&gt;/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) [0x7f3b3e4b72cd] ) 0-: Reply submission failed</div><div class="">[2019-04-21 07:25:55.064675] E [rpcsvc.c:1364:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x7c70af, Program: GlusterFS 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server)</div><div class="">[2019-04-21 07:25:55.064705] E [server.c:195:server_submit_reply] (--&gt;/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) [0x7f3b3e9318fa] --&gt;/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) [0x7f3b3e4c5f35] --&gt;/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) [0x7f3b3e4b72cd] ) 0-: Reply submission failed</div><div class="">[2019-04-21 07:25:55.064742] E [rpcsvc.c:1364:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x7c723c, Program: GlusterFS 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server)</div><div class="">[2019-04-21 07:25:55.064768] E [server.c:195:server_submit_reply] 
(--&gt;/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) [0x7f3b3e9318fa] --&gt;/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) [0x7f3b3e4c5f35] --&gt;/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) [0x7f3b3e4b72cd] ) 0-: Reply submission failed</div><div class="">[2019-04-21 07:25:55.064812] E [rpcsvc.c:1364:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x7c72b4, Program: GlusterFS 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server)</div><div class="">[2019-04-21 07:25:55.064837] E [server.c:195:server_submit_reply] (--&gt;/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) [0x7f3b3e9318fa] --&gt;/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) [0x7f3b3e4c5f35] --&gt;/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) [0x7f3b3e4b72cd] ) 0-: Reply submission failed</div><div class="">[2019-04-21 07:25:55.064880] E [rpcsvc.c:1364:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x7c740b, Program: GlusterFS 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server)</div><div class="">[2019-04-21 07:25:55.064905] E [server.c:195:server_submit_reply] (--&gt;/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) [0x7f3b3e9318fa] --&gt;/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) [0x7f3b3e4c5f35] --&gt;/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) [0x7f3b3e4b72cd] ) 0-: Reply submission failed</div><div class="">[2019-04-21 07:25:55.064939] E [rpcsvc.c:1364:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x7c7441, Program: GlusterFS 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server)</div><div class="">[2019-04-21 07:25:55.064962] E [server.c:195:server_submit_reply] 
(--&gt;/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) [0x7f3b3e9318fa] --&gt;/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) [0x7f3b3e4c5f35] --&gt;/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) [0x7f3b3e4b72cd] ) 0-: Reply submission failed</div><div class="">[2019-04-21 07:25:55.064996] E [rpcsvc.c:1364:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x7c74d5, Program: GlusterFS 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server)</div><div class="">[2019-04-21 07:25:55.065020] E [server.c:195:server_submit_reply] (--&gt;/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) [0x7f3b3e9318fa] --&gt;/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) [0x7f3b3e4c5f35] --&gt;/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) [0x7f3b3e4b72cd] ) 0-: Reply submission failed</div><div class="">[2019-04-21 07:25:55.065052] E [rpcsvc.c:1364:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x7c7551, Program: GlusterFS 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server)</div><div class="">[2019-04-21 07:25:55.065076] E [server.c:195:server_submit_reply] (--&gt;/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) [0x7f3b3e9318fa] --&gt;/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) [0x7f3b3e4c5f35] --&gt;/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) [0x7f3b3e4b72cd] ) 0-: Reply submission failed</div><div class="">[2019-04-21 07:25:55.065110] E [rpcsvc.c:1364:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x7c76d1, Program: GlusterFS 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server)</div><div class="">[2019-04-21 07:25:55.065133] E [server.c:195:server_submit_reply] 
(--&gt;/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) [0x7f3b3e9318fa] --&gt;/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) [0x7f3b3e4c5f35] --&gt;/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) [0x7f3b3e4b72cd] ) 0-: Reply submission failed</div></div><div class=""><br class=""></div><div class="">Thanks again,<br class=""></div><div class=""><br class=""></div><div class="">-Patrick</div></div></div><br class=""><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sun, Apr 21, 2019 at 3:50 PM Patrick Rennie &lt;<a href="mailto:patrickmrennie@gmail.com" target="_blank" class="">patrickmrennie@gmail.com</a>&gt; wrote:<br class=""></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr" class=""><div dir="ltr" class=""><div dir="ltr" class=""><div dir="ltr" class=""><div dir="ltr" class="">Hi Darrell,&nbsp;<div class=""><br class=""></div><div class="">Thanks again for your advice, I've left it for a while but unfortunately it's still just as slow and causing more problems for our operations now. I will need to try and take some steps to at least bring performance back to normal while continuing to investigate the issue longer term. 
I can definitely see one node with heavier CPU than the other, almost double, which I am OK with, but I think the heal process is going to take forever. Checking "gluster volume heal info" shows thousands and thousands of files which may need healing; I have no idea how many in total, as the command is still running after hours, so I am not sure what has gone so wrong to cause this.&nbsp;</div><div class=""><br class=""></div><div class="">I've checked cluster.op-version and cluster.max-op-version and it looks like I'm on the latest version there.&nbsp;</div><div class=""><br class=""></div><div class="">I have no idea how long the healing is going to take on this cluster, we have around 560TB of data on here, but I don't think I can wait that long to try and restore performance to normal.&nbsp;</div><div class=""><br class=""></div><div class="">Can anyone think of anything else I can try in the meantime to work out what's causing the extreme latency?&nbsp;</div><div class=""><br class=""></div><div class="">I've been going through the cluster client logs of some of our VMs, and on some of our FTP servers I found this in the cluster mount log. I am not seeing it on any of our other servers, just our FTP servers.&nbsp;</div><div class=""><div class=""><div class=""><br class=""></div></div></div><div class=""><div class="">[2019-04-21 07:16:19.925388] E [MSGID: 101046] [dht-common.c:1904:dht_revalidate_cbk] 0-gvAA01-dht: dict is null</div><div class="">[2019-04-21 07:19:43.413834] W [MSGID: 114031] [client-rpc-fops.c:2203:client3_3_setattr_cbk] 0-gvAA01-client-19: remote operation failed [No such file or directory]</div><div class="">[2019-04-21 07:19:43.414153] W [MSGID: 114031] [client-rpc-fops.c:2203:client3_3_setattr_cbk] 0-gvAA01-client-20: remote operation failed [No such file or directory]</div><div class="">[2019-04-21 07:23:33.154717] E [MSGID: 101046] [dht-common.c:1904:dht_revalidate_cbk] 0-gvAA01-dht: dict is null</div><div 
class="">[2019-04-21 07:33:24.943913] E [MSGID: 101046] [dht-common.c:1904:dht_revalidate_cbk] 0-gvAA01-dht: dict is null</div></div><div class=""><br class=""></div><div class="">Any ideas what this could mean? I am basically just grasping at straws here.</div><div class=""><br class=""></div><div class="">I am going to hold off on the version upgrade until I know there are no files which need healing, which could be a while. From some reading I've done, there shouldn't be any issues with this, as both are on v3.12.x.&nbsp;</div><div class=""><br class=""></div><div class="">I've freed up a small amount of space, but I still need to work on this further.&nbsp;</div><div class=""><br class=""></div><div class="">I've read of a command "find .glusterfs -type f -links -2 -exec rm {} \;" which could be run on each brick and would potentially clean up any files which were deleted straight from the bricks, but not via the client. I have a feeling this could help me free up about 5-10TB per brick, from what I've been told about the history of this cluster. Can anyone confirm if this is actually safe to run?&nbsp;</div><div class=""><br class=""></div><div class="">At this stage, I'm open to any suggestions as to how to proceed, thanks again for any advice.&nbsp;</div><div class=""><br class=""></div><div class="">Cheers,&nbsp;</div><div class=""><br class=""></div><div class="">- Patrick</div></div></div></div></div></div><br class=""><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sun, Apr 21, 2019 at 1:22 AM Darrell Budic &lt;<a href="mailto:budic@onholyground.com" target="_blank" class="">budic@onholyground.com</a>&gt; wrote:<br class=""></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class="">Patrick,<div class=""><br class=""></div><div class="">Sounds like progress. Be aware that gluster is expected to max out the CPUs on at least one of your servers while healing. 
This is normal and won’t adversely affect overall performance (any more than having bricks in need of healing, at any rate) unless you’re overdoing it. shd threads &lt;= 4 should not do that on your hardware. Other tunings may have also increased overall performance, so you may see higher CPU than previously anyway. I’d recommend upping those thread counts and letting it heal as fast as possible, especially if these are dedicated Gluster storage servers (i.e., not also running VMs, etc). You should see “normal” CPU use once heals are completed. I see ~15-30% overall normally, 95-98% while healing (x my 20 cores). It’s also likely to be different between your servers: in a pure replica, one tends to max and one tends to be a little higher; in a distributed-replica, I’d expect more than one to run harder while healing.</div><div class=""><br class=""></div><div class="">Keep the differences between doing an ls on a brick and doing an ls on a gluster mount in mind. When you do an ls on a gluster volume, it isn’t just doing an ls on one brick, it’s effectively doing it on ALL of your bricks, and they all have to return data before the ls succeeds. In a distributed volume, it’s figuring out where on each volume things live and getting the stat() from each to assemble the whole thing. And if things are in need of healing, it will take even longer to decide which version is current and use it (shd triggers a heal anytime it encounters this). Any of these things being slow slows down the overall response.&nbsp;</div><div class=""><br class=""></div><div class="">At this point, I’d get some sleep too, and let your cluster heal while you do. I’d really want it fully healed before I did any updates anyway, so let it use CPU and get itself sorted out. 
Expect it to do a round of healing after you upgrade each machine too. This is normal, so don’t let the CPU spike surprise you; it’s just catching up from the downtime incurred by the update and/or reboot if you did one.</div><div class=""><br class=""></div><div class="">That reminds me, check your gluster cluster.op-version and cluster.max-op-version <span style="background-color:rgba(255,255,255,0)" class="">(gluster vol get all all | grep op-version)</span>. If op-version isn’t at the max-op-version, raise it to match so you’re taking advantage of the latest features available to your version.</div><div class=""><div class=""><br class=""></div><div class="">&nbsp; -Darrell</div><div class=""><br class=""><blockquote type="cite" class=""><div class="">On Apr 20, 2019, at 11:54 AM, Patrick Rennie &lt;<a href="mailto:patrickmrennie@gmail.com" target="_blank" class="">patrickmrennie@gmail.com</a>&gt; wrote:</div><br class="gmail-m_3603348053678985885gmail-m_163229844221691320gmail-m_-4427179427950227310gmail-m_-1219620317814815363Apple-interchange-newline"><div class=""><div dir="ltr" class="">Hi Darrell,&nbsp;<div class=""><br class=""></div><div class="">Thanks again for your advice, I've applied the acltype=posixacl on my zpools and I think that has reduced some of the noise from my brick logs.&nbsp;</div><div class="">I also bumped up some of the thread counts you suggested but my CPU load skyrocketed, so I dropped them back down to something slightly lower, but still higher than before, and will see how that goes for a while.&nbsp;</div><div class=""><br class=""></div><div class="">Although low space is a definite issue, if I run an ls anywhere on my bricks directly it's instant, &lt;1 second, and still takes several minutes via gluster, so there is still a problem in my gluster configuration somewhere. 
We don't have any snapshots, but I am trying to work out if any data on there is safe to delete, or if there is any way I can safely find and delete data which has been removed directly from the bricks in the past. I also have lz4 compression already enabled on each zpool, which does help a bit; we get between 1.05 and 1.08x compression on this data.&nbsp;</div><div class="">I've tried to go through each client and check its cluster mount logs and also my brick logs, looking for errors. So far nothing is jumping out at me, but there are some warnings and errors here and there, and I am trying to work out what they mean.&nbsp;</div><div class=""><br class=""></div><div class="">It's already 1 am here and unfortunately, I'm still awake working on this issue, but I think that I will have to leave the version upgrades until tomorrow.&nbsp;</div><div class=""><br class=""></div><div class="">Thanks again for your advice so far. If anyone has any ideas on where I can look for errors other than brick logs or the cluster mount logs to help resolve this issue, it would be much appreciated.&nbsp;</div><div class=""><br class=""></div><div class="">Cheers,</div><div class=""><br class=""></div><div class="">- Patrick</div></div><br class=""><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sat, Apr 20, 2019 at 11:57 PM Darrell Budic &lt;<a href="mailto:budic@onholyground.com" target="_blank" class="">budic@onholyground.com</a>&gt; wrote:<br class=""></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class="">See inline:<br class=""><div class=""><br class=""><blockquote type="cite" class=""><div class="">On Apr 20, 2019, at 10:09 AM, Patrick Rennie &lt;<a href="mailto:patrickmrennie@gmail.com" target="_blank" class="">patrickmrennie@gmail.com</a>&gt; wrote:</div><br 
class="gmail-m_3603348053678985885gmail-m_163229844221691320gmail-m_-4427179427950227310gmail-m_-1219620317814815363gmail-m_-3666594629493743861Apple-interchange-newline"><div class=""><div dir="ltr" class=""><div dir="ltr" class="">Hi Darrell,&nbsp;<div class=""><br class=""></div><div class="">Thanks for your reply, this issue seems to be getting worse over the last few days, really has me tearing my hair out. I will do as you have suggested and get started on upgrading from 3.12.14 to 3.12.15.&nbsp;</div><div class="">I've checked the zfs properties and all bricks have "xattr=sa" set, but none of them has "acltype=posixacl" set, currently the acltype property shows "off", if I make these changes will it apply retroactively to the existing data? I'm unfamiliar with what this will change so I may need to look into that before I proceed.&nbsp;</div></div></div></div></blockquote><div class=""><br class=""></div><div class="">It is safe to apply that now, any new set/get calls will then use it if new posixacls exist, and use older if not. ZFS is good that way. It should clear up your posix_acl and posix errors over time.</div><br class=""><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><div dir="ltr" class=""><div class="">I understand performance is going to slow down as the bricks get full, I am currently trying to free space and migrate data to some newer storage, I have fresh several hundred TB storage I just setup recently but with these performance issues it's really slow. I also believe there is significant data which has been deleted directly from the bricks in the past, so if I can reclaim this space in a safe manner then I will have at least around 10-15% free space.&nbsp;</div></div></div></div></blockquote><div class=""><br class=""></div><div class="">Full ZFS volumes will have a much larger impact on performance than you’d think, I’d prioritize this. 
If you have been taking zfs snapshots, consider deleting them to get the overall volume free space back up. And just to be sure it’s been said, delete from within the mounted volumes, don’t delete directly from the bricks (gluster will just try and heal it later, compounding your issues). Does not apply to deleting other data from the ZFS volume if it’s not part of the brick directory, of course.</div><br class=""><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><div dir="ltr" class=""><div class="">These servers have dual 8 core Xeon (E5-2620v4) and 512GB of RAM so generally they have plenty of resources available, currently only using around 330/512GB of memory.</div><div class=""><br class=""></div><div class="">I will look into what your suggested settings will change, and then will probably go ahead with your recommendations, for our specs as stated above, what would you suggest for <a href="http://performance.io/" target="_blank" class="">performance.io</a>-thread-count ?</div></div></div></div></blockquote><div class=""><br class=""></div>I run single 2630v4s on my servers, which have a smaller storage footprint than yours. I’d go with 32 for <a href="http://performance.io/" target="_blank" class="">performance.io</a>-thread-count. I’d try 4 for the shd thread settings on that gear. Your memory use sounds fine, so no worries there.<br class=""><br class=""><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><div dir="ltr" class=""><div class="">Our workload is nothing too extreme, we have a few VMs which write backup data to this storage nightly for our clients, our VMs don't live on this cluster, but just write to it.&nbsp;</div></div></div></div></blockquote><div class=""><br class=""></div><div class="">If they are writing compressible data, you’ll get immediate benefit by setting compression=lz4 on your ZFS volumes. It won’t help any old data, of course, but it will compress new data going forward. 
This is another one that’s safe to enable on the fly.</div><br class=""><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><div dir="ltr" class=""><div class="">I've been going through all of the logs I can, below are some slightly sanitized errors I've come across, but I'm not sure what to make of them. The main error I am seeing is the first one below, across several of my bricks, but possibly only for specific folders on the cluster, I'm not 100% about that yet though.&nbsp;</div><div class=""><br class=""></div><div class=""><div class="">[2019-04-20 05:56:59.512649] E [MSGID: 113001] [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on /brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default&nbsp; [Operation not supported]</div><div class="">[2019-04-20 05:59:06.084333] E [MSGID: 113001] [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on /brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default&nbsp; [Operation not supported]</div><div class="">[2019-04-20 05:59:43.289030] E [MSGID: 113001] [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on /brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default&nbsp; [Operation not supported]</div><div class="">[2019-04-20 05:59:50.582257] E [MSGID: 113001] [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on /brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default&nbsp; [Operation not supported]</div><div class="">[2019-04-20 06:01:42.501701] E [MSGID: 113001] [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on /brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default&nbsp; [Operation not supported]</div><div class="">[2019-04-20 06:01:51.665354] W [posix.c:4929:posix_getxattr] 0-gvAA01-posix: Extended attributes not supported (try remounting brick with 'user_xattr' flag)</div><div class=""><br class=""></div><div class=""><br class=""></div><div class="">[2019-04-20 13:12:36.131856] E [MSGID: 113002] [posix-helpers.c:893:posix_gfid_set] 
0-gvAA01-posix: gfid is null for /xxxxxxxxxxxxxxxxxxxx [Invalid argument]</div><div class="">[2019-04-20 13:12:36.131959] E [MSGID: 113002] [posix.c:362:posix_lookup] 0-gvAA01-posix: buf-&gt;ia_gfid is null for /brick2/xxxxxxxxxxxxxxxxxxxx_62906_tmp [No data available]</div><div class="">[2019-04-20 13:12:36.132016] E [MSGID: 115050] [server-rpc-fops.c:175:server_lookup_cbk] 0-gvAA01-server: 24274759: LOOKUP /xxxxxxxxxxxxxxxxxxxx (a7c9b4a0-b7ee-4d01-a79e-576013c8ac87/Cloud Backup_clone1.vbm_62906_tmp), client: 00-A-16217-2019/04/08-21:23:03:692424-gvAA01-client-4-0-3, error-xlator: gvAA01-posix [No data available]</div><div class="">[2019-04-20 13:12:38.093719] E [MSGID: 115050] [server-rpc-fops.c:175:server_lookup_cbk] 0-gvAA01-server: 24276491: LOOKUP /xxxxxxxxxxxxxxxxxxxx (a7c9b4a0-b7ee-4d01-a79e-576013c8ac87/Cloud Backup_clone1.vbm_62906_tmp), client: 00-A-16217-2019/04/08-21:23:03:692424-gvAA01-client-4-0-3, error-xlator: gvAA01-posix [No data available]</div><div class="">[2019-04-20 13:12:38.093660] E [MSGID: 113002] [posix-helpers.c:893:posix_gfid_set] 0-gvAA01-posix: gfid is null for /xxxxxxxxxxxxxxxxxxxx [Invalid argument]</div><div class="">[2019-04-20 13:12:38.093696] E [MSGID: 113002] [posix.c:362:posix_lookup] 0-gvAA01-posix: buf-&gt;ia_gfid is null for /brick2/xxxxxxxxxxxxxxxxxxxx [No data available]</div><div class=""><br class=""></div></div></div></div></div></blockquote><div class=""><br class=""></div><div class="">posixacls should clear those up, as mentioned.</div><br class=""><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><div dir="ltr" class=""><div class=""><div class=""><br class=""></div><div class="">[2019-04-20 14:25:59.654576] E [inodelk.c:404:__inode_unlock_lock] 0-gvAA01-locks:&nbsp; Matching lock not found for unlock 0-9223372036854775807, by 980fdbbd367f0000 on 0x7fc4f0161440</div><div class="">[2019-04-20 14:25:59.654668] E [MSGID: 115053] [server-rpc-fops.c:295:server_inodelk_cbk] 0-gvAA01-server: 6092928: 
INODELK /xxxxxxxxxxxxxxxxxxxx.cdr$ (25b14631-a179-4274-8243-6e272d4f2ad8), client: cb-per-worker18-53637-2019/04/19-14:25:37:927673-gvAA01-client-1-0-4, error-xlator: gvAA01-locks [Invalid argument]</div><div class=""><br class=""></div><div class=""><br class=""></div><div class="">[2019-04-20 13:35:07.495495] E [rpcsvc.c:1364:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x247c644, Program: GlusterFS 3.3, ProgVers: 330, Proc: 27) to rpc-transport (tcp.gvAA01-server)</div><div class="">[2019-04-20 13:35:07.495619] E [server.c:195:server_submit_reply] (--&gt;/usr/lib/x86_64-linux-gnu/glusterfs/3.12.14/xlator/debug/io-stats.so(+0x1696a) [0x7ff4ae6f796a] --&gt;/usr/lib/x86_64-linux-gnu/glusterfs/3.12.14/xlator/protocol/server.so(+0x2d6e8) [0x7ff4ae2a96e8] --&gt;/usr/lib/x86_64-linux-gnu/glusterfs/3.12.14/xlator/protocol/server.so(+0x928d) [0x7ff4ae28528d] ) 0-: Reply submission failed</div><div class=""><br class=""></div></div></div></div></div></blockquote><div class=""><br class=""></div><div class="">Fix the posix acls and see if these clear up over time as well, I’m unclear on what the overall effect of running without the posix acls will be to total gluster health. Your biggest problem sounds like you need to free up space on the volumes and get the overall volume health back up to par and see if that doesn’t resolve the symptoms you’re seeing.</div><div class=""><br class=""></div><br class=""><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><div dir="ltr" class=""><div class=""><div class=""><br class=""></div><div class="">Thank you again for your assistance. 
It is greatly appreciated.&nbsp;</div><div class=""><br class=""></div><div class="">- Patrick</div><div class=""><br class=""></div><div class=""><br class=""></div></div></div></div><br class=""><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sat, Apr 20, 2019 at 10:50 PM Darrell Budic &lt;<a href="mailto:budic@onholyground.com" target="_blank" class="">budic@onholyground.com</a>&gt; wrote:<br class=""></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class="">Patrick,<div class=""><br class=""></div><div class="">I would definitely upgrade your two nodes from 3.12.14 to 3.12.15. You also mention ZFS, and that error you show makes me think you need to check to be sure you have “xattr=sa” and “acltype=posixacl” set on your ZFS volumes.</div><div class=""><br class=""></div><div class="">You also observed your bricks are crossing the 95% full line, ZFS performance will degrade significantly the closer you get to full. In my experience, this starts somewhere between 10% and 5% free space remaining, so you’re in that realm.&nbsp;</div><div class=""><br class=""></div><div class="">How’s your free memory on the servers doing? Do you have your zfs arc cache limited to something less than all the RAM? It shares pretty well, but I’ve encountered situations where other things won’t try and take ram back properly if they think it’s in use, so ZFS never gets the opportunity to give it up.</div><div class=""><br class=""></div><div class="">Since your volume is a disperse-replica, you might try tuning disperse.shd-max-threads, default is 1, I’d try it at 2, 4, or even more if the CPUs are beefy enough. And setting server.event-threads to 4 and client.event-threads to 8 has proven helpful in many cases. After you get upgraded to 3.12.15, enabling performance.stat-prefetch may help as well. 
I don’t know if it matters, but I’d also recommend resetting performance.least-prio-threads to the default of 1 (or try 2 or 4) and/or also setting <a href="http://performance.io/" target="_blank" class="">performance.io</a>-thread-count to 32 if those have beefy CPUs.</div><div class=""><br class=""></div><div class="">Beyond those general ideas, more info about your hardware (CPU and RAM) and workload (VMs, direct storage for web servers or enders, etc) may net you some more ideas. Then you’re going to have to do more digging into brick logs looking for errors and/or warnings to see what’s going on.</div><div class=""><br class=""></div><div class="">&nbsp; -Darrell</div><div class=""><br class=""></div><div class=""><br class=""><div class=""><blockquote type="cite" class=""><div class="">On Apr 20, 2019, at 8:22 AM, Patrick Rennie &lt;<a href="mailto:patrickmrennie@gmail.com" target="_blank" class="">patrickmrennie@gmail.com</a>&gt; wrote:</div><br class="gmail-m_3603348053678985885gmail-m_163229844221691320gmail-m_-4427179427950227310gmail-m_-1219620317814815363gmail-m_-3666594629493743861gmail-m_-8629894811503422375Apple-interchange-newline"><div class=""><div dir="ltr" class="">Hello Gluster Users,&nbsp;<div class=""><br class=""></div><div class="">I am hoping someone can help me with resolving an ongoing issue I've been having, I'm new to mailing lists so forgive me if I have gotten anything wrong. We have noticed our performance deteriorating over the last few weeks, easily measured by trying to do an ls on one of our top-level folders, and timing it, which usually would take 2-5 seconds, and now takes up to 20 minutes, which obviously renders our cluster basically unusable. This has been intermittent in the past but is now almost constant and I am not sure how to work out the exact cause. 
We have noticed some errors in the brick logs, and have noticed that if we kill the right brick process, performance instantly returns back to normal, this is not always the same brick, but it indicates to me something in the brick processes or background tasks may be causing extreme latency. Due to this ability to fix it by killing the right brick process off, I think it's a specific file, or folder, or operation which may be hanging and causing the increased latency, but I am not sure how to work it out. One last thing to add is that our bricks are getting quite full (~95% full), we are trying to migrate data off to new storage but that is going slowly, not helped by this issue. I am currently trying to run a full heal as there appear to be many files needing healing, and I have all brick processes running so they have an opportunity to heal, but this means performance is very poor. It currently takes over 15-20 minutes to do an ls of one of our top-level folders, which just contains 60-80 other folders, this should take 2-5 seconds. This is all being checked by FUSE mount locally on the storage node itself, but it is the same for other clients and VMs accessing the cluster. Initially, it seemed our NFS mounts were not affected and operated at normal speed, but testing over the last day has shown that our NFS clients are also extremely slow, so it doesn't seem specific to FUSE as I first thought it might be.&nbsp;</div><div class=""><br class=""></div><div class="">I am not sure how to proceed from here, I am fairly new to gluster having inherited this setup from my predecessor and trying to keep it going. I have included some info below to try and help with diagnosis, please let me know if any further info would be helpful. I would really appreciate any advice on what I could try to work out the cause. 
Thank you in advance for reading this, and any suggestions you might be able to offer.&nbsp;</div><div class=""><br class=""></div><div class="">- Patrick</div><div class=""><br class=""></div><div class="">This is an example of the main error I see in our brick logs, there have been others, I can post them when I see them again too:</div><div class="">[2019-04-20 04:54:43.055680] E [MSGID: 113001] [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on /brick1/&lt;filename&gt; library: system.posix_acl_default&nbsp; [Operation not supported]<br class=""></div><div class=""><div class="">[2019-04-20 05:01:29.476313] W [posix.c:4929:posix_getxattr] 0-gvAA01-posix: Extended attributes not supported (try remounting brick with 'user_xattr' flag)<br class=""></div></div><div class=""><br class=""></div><div class="">Our setup consists of 2 storage nodes and an arbiter node. I have noticed our nodes are on slightly different versions, I'm not sure if this could be an issue. We have 9 bricks on each node, made up of ZFS RAIDZ2 pools - total capacity is around 560TB.&nbsp;</div><div class="">We have bonded 10Gbps NICs on each node, and I have tested bandwidth with iperf and found that it's what would be expected from this config.&nbsp;</div><div class="">Individual brick performance seems ok, I've tested several bricks using dd and can write a 10GB file at 1.7GB/s.&nbsp;</div><div class=""><div class=""><br class=""></div><div class=""># dd if=/dev/zero of=/brick1/test/test.file bs=1M count=10000</div><div class="">10000+0 records in</div><div class="">10000+0 records out</div><div class="">10485760000 bytes (10 GB, 9.8 GiB) copied, 6.20303 s, 1.7 GB/s</div></div><div class=""><br class=""></div><div class="">Node 1:<br class=""></div><div class=""><div class=""># glusterfs --version</div><div class="">glusterfs 3.12.15</div></div><div class=""><br class=""></div><div class="">Node 2:</div><div class=""><div class=""># glusterfs --version</div><div 
class="">glusterfs 3.12.14</div></div><div class=""><br class=""></div><div class="">Arbiter:</div><div class=""><div class=""># glusterfs --version</div><div class="">glusterfs 3.12.14</div></div><div class=""><br class=""></div><div class="">Here is our gluster volume status:</div><div class=""><br class=""></div><div class=""><div class=""># gluster volume status</div><div class="">Status of volume: gvAA01</div><div class="">Gluster process&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;TCP Port&nbsp; RDMA Port&nbsp; Online&nbsp; Pid</div><div class="">------------------------------------------------------------------------------</div><div class="">Brick 01-B:/brick1/gvAA01/brick&nbsp; &nbsp; 49152&nbsp; &nbsp; &nbsp;0&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Y&nbsp; &nbsp; &nbsp; &nbsp;7219</div><div class="">Brick 02-B:/brick1/gvAA01/brick&nbsp; &nbsp; 49152&nbsp; &nbsp; &nbsp;0&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Y&nbsp; &nbsp; &nbsp; &nbsp;21845</div><div class="">Brick 00-A:/arbiterAA01/gvAA01/bri</div><div class="">ck1&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;49152&nbsp; &nbsp; &nbsp;0&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Y&nbsp; &nbsp; &nbsp; &nbsp;6931</div><div class="">Brick 01-B:/brick2/gvAA01/brick&nbsp; &nbsp; 49153&nbsp; &nbsp; &nbsp;0&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Y&nbsp; &nbsp; &nbsp; &nbsp;7239</div><div class="">Brick 02-B:/brick2/gvAA01/brick&nbsp; &nbsp; 49153&nbsp; &nbsp; &nbsp;0&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Y&nbsp; &nbsp; &nbsp; &nbsp;9916</div><div class="">Brick 00-A:/arbiterAA01/gvAA01/bri</div><div class="">ck2&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;49153&nbsp; &nbsp; &nbsp;0&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Y&nbsp; &nbsp; &nbsp; &nbsp;6939</div><div class="">Brick 
01-B:/brick3/gvAA01/brick&nbsp; &nbsp; 49154&nbsp; &nbsp; &nbsp;0&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Y&nbsp; &nbsp; &nbsp; &nbsp;7235</div><div class="">Brick 02-B:/brick3/gvAA01/brick&nbsp; &nbsp; 49154&nbsp; &nbsp; &nbsp;0&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Y&nbsp; &nbsp; &nbsp; &nbsp;21858</div><div class="">Brick 00-A:/arbiterAA01/gvAA01/bri</div><div class="">ck3&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;49154&nbsp; &nbsp; &nbsp;0&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Y&nbsp; &nbsp; &nbsp; &nbsp;6947</div><div class="">Brick 01-B:/brick4/gvAA01/brick&nbsp; &nbsp; 49155&nbsp; &nbsp; &nbsp;0&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Y&nbsp; &nbsp; &nbsp; &nbsp;31840</div><div class="">Brick 02-B:/brick4/gvAA01/brick&nbsp; &nbsp; 49155&nbsp; &nbsp; &nbsp;0&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Y&nbsp; &nbsp; &nbsp; &nbsp;9933</div><div class="">Brick 00-A:/arbiterAA01/gvAA01/bri</div><div class="">ck4&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;49155&nbsp; &nbsp; &nbsp;0&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Y&nbsp; &nbsp; &nbsp; &nbsp;6956</div><div class="">Brick 01-B:/brick5/gvAA01/brick&nbsp; &nbsp; 49156&nbsp; &nbsp; &nbsp;0&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Y&nbsp; &nbsp; &nbsp; &nbsp;7233</div><div class="">Brick 02-B:/brick5/gvAA01/brick&nbsp; &nbsp; 49156&nbsp; &nbsp; &nbsp;0&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Y&nbsp; &nbsp; &nbsp; &nbsp;9942</div><div class="">Brick 00-A:/arbiterAA01/gvAA01/bri</div><div class="">ck5&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;49156&nbsp; &nbsp; &nbsp;0&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Y&nbsp; &nbsp; &nbsp; &nbsp;6964</div><div class="">Brick 01-B:/brick6/gvAA01/brick&nbsp; &nbsp; 49157&nbsp; &nbsp; &nbsp;0&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 
Y&nbsp; &nbsp; &nbsp; &nbsp;7234</div><div class="">Brick 02-B:/brick6/gvAA01/brick&nbsp; &nbsp; 49157&nbsp; &nbsp; &nbsp;0&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Y&nbsp; &nbsp; &nbsp; &nbsp;9952</div><div class="">Brick 00-A:/arbiterAA01/gvAA01/bri</div><div class="">ck6&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;49157&nbsp; &nbsp; &nbsp;0&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Y&nbsp; &nbsp; &nbsp; &nbsp;6974</div><div class="">Brick 01-B:/brick7/gvAA01/brick&nbsp; &nbsp; 49158&nbsp; &nbsp; &nbsp;0&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Y&nbsp; &nbsp; &nbsp; &nbsp;7248</div><div class="">Brick 02-B:/brick7/gvAA01/brick&nbsp; &nbsp; 49158&nbsp; &nbsp; &nbsp;0&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Y&nbsp; &nbsp; &nbsp; &nbsp;9960</div><div class="">Brick 00-A:/arbiterAA01/gvAA01/bri</div><div class="">ck7&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;49158&nbsp; &nbsp; &nbsp;0&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Y&nbsp; &nbsp; &nbsp; &nbsp;6984</div><div class="">Brick 01-B:/brick8/gvAA01/brick&nbsp; &nbsp; 49159&nbsp; &nbsp; &nbsp;0&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Y&nbsp; &nbsp; &nbsp; &nbsp;7253</div><div class="">Brick 02-B:/brick8/gvAA01/brick&nbsp; &nbsp; 49159&nbsp; &nbsp; &nbsp;0&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Y&nbsp; &nbsp; &nbsp; &nbsp;9970</div><div class="">Brick 00-A:/arbiterAA01/gvAA01/bri</div><div class="">ck8&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;49159&nbsp; &nbsp; &nbsp;0&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Y&nbsp; &nbsp; &nbsp; &nbsp;6993</div><div class="">Brick 01-B:/brick9/gvAA01/brick&nbsp; &nbsp; 49160&nbsp; &nbsp; &nbsp;0&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Y&nbsp; &nbsp; &nbsp; &nbsp;7245</div><div class="">Brick 02-B:/brick9/gvAA01/brick&nbsp; &nbsp; 
49160&nbsp; &nbsp; &nbsp;0&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Y&nbsp; &nbsp; &nbsp; &nbsp;9984</div><div class="">Brick 00-A:/arbiterAA01/gvAA01/bri</div><div class="">ck9&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;49160&nbsp; &nbsp; &nbsp;0&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Y&nbsp; &nbsp; &nbsp; &nbsp;7001</div><div class="">NFS Server on localhost&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;2049&nbsp; &nbsp; &nbsp; 0&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Y&nbsp; &nbsp; &nbsp; &nbsp;17276</div><div class="">Self-heal Daemon on localhost&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;N/A&nbsp; &nbsp; &nbsp; &nbsp;N/A&nbsp; &nbsp; &nbsp; &nbsp; Y&nbsp; &nbsp; &nbsp; &nbsp;25245</div><div class="">NFS Server on 02-B&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;2049&nbsp; &nbsp; &nbsp; 0&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Y&nbsp; &nbsp; &nbsp; &nbsp;9089</div><div class="">Self-heal Daemon on 02-B&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;N/A&nbsp; &nbsp; &nbsp; &nbsp;N/A&nbsp; &nbsp; &nbsp; &nbsp; Y&nbsp; &nbsp; &nbsp; &nbsp;17838</div><div class="">NFS Server on 00-a&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;2049&nbsp; &nbsp; &nbsp; 0&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Y&nbsp; &nbsp; &nbsp; &nbsp;15660</div><div class="">Self-heal Daemon on 00-a&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;N/A&nbsp; &nbsp; &nbsp; &nbsp;N/A&nbsp; &nbsp; &nbsp; &nbsp; Y&nbsp; &nbsp; &nbsp; &nbsp;16218</div><div class=""><br class=""></div><div class="">Task Status of Volume gvAA01</div><div class="">------------------------------------------------------------------------------</div><div class="">There are no active volume tasks</div></div><div class=""><br class=""></div><div class="">And gluster volume info:&nbsp;</div><div class=""><br class=""></div><div class=""><div class=""># gluster volume info</div><div class=""><br 
class=""></div><div class="">Volume Name: gvAA01</div><div class="">Type: Distributed-Replicate</div><div class="">Volume ID: ca4ece2c-13fe-414b-856c-2878196d6118</div><div class="">Status: Started</div><div class="">Snapshot Count: 0</div><div class="">Number of Bricks: 9 x (2 + 1) = 27</div><div class="">Transport-type: tcp</div><div class="">Bricks:</div><div class="">Brick1: 01-B:/brick1/gvAA01/brick</div><div class="">Brick2: 02-B:/brick1/gvAA01/brick</div><div class="">Brick3: 00-A:/arbiterAA01/gvAA01/brick1 (arbiter)</div><div class="">Brick4: 01-B:/brick2/gvAA01/brick</div><div class="">Brick5: 02-B:/brick2/gvAA01/brick</div><div class="">Brick6: 00-A:/arbiterAA01/gvAA01/brick2 (arbiter)</div><div class="">Brick7: 01-B:/brick3/gvAA01/brick</div><div class="">Brick8: 02-B:/brick3/gvAA01/brick</div><div class="">Brick9: 00-A:/arbiterAA01/gvAA01/brick3 (arbiter)</div><div class="">Brick10: 01-B:/brick4/gvAA01/brick</div><div class="">Brick11: 02-B:/brick4/gvAA01/brick</div><div class="">Brick12: 00-A:/arbiterAA01/gvAA01/brick4 (arbiter)</div><div class="">Brick13: 01-B:/brick5/gvAA01/brick</div><div class="">Brick14: 02-B:/brick5/gvAA01/brick</div><div class="">Brick15: 00-A:/arbiterAA01/gvAA01/brick5 (arbiter)</div><div class="">Brick16: 01-B:/brick6/gvAA01/brick</div><div class="">Brick17: 02-B:/brick6/gvAA01/brick</div><div class="">Brick18: 00-A:/arbiterAA01/gvAA01/brick6 (arbiter)</div><div class="">Brick19: 01-B:/brick7/gvAA01/brick</div><div class="">Brick20: 02-B:/brick7/gvAA01/brick</div><div class="">Brick21: 00-A:/arbiterAA01/gvAA01/brick7 (arbiter)</div><div class="">Brick22: 01-B:/brick8/gvAA01/brick</div><div class="">Brick23: 02-B:/brick8/gvAA01/brick</div><div class="">Brick24: 00-A:/arbiterAA01/gvAA01/brick8 (arbiter)</div><div class="">Brick25: 01-B:/brick9/gvAA01/brick</div><div class="">Brick26: 02-B:/brick9/gvAA01/brick</div><div class="">Brick27: 00-A:/arbiterAA01/gvAA01/brick9 (arbiter)</div><div class="">Options Reconfigured:</div><div 
class="">cluster.shd-max-threads: 4</div><div class="">performance.least-prio-threads: 16</div><div class="">cluster.readdir-optimize: on</div><div class="">performance.quick-read: off</div><div class="">performance.stat-prefetch: off</div><div class="">cluster.data-self-heal: on</div><div class="">cluster.lookup-unhashed: auto</div><div class="">cluster.lookup-optimize: on</div><div class="">cluster.favorite-child-policy: mtime</div><div class="">server.allow-insecure: on</div><div class="">transport.address-family: inet</div><div class="">client.bind-insecure: on</div><div class="">cluster.entry-self-heal: off</div><div class="">cluster.metadata-self-heal: off</div><div class="">performance.md-cache-timeout: 600</div><div class="">cluster.self-heal-daemon: enable</div><div class="">performance.readdir-ahead: on</div><div class="">diagnostics.brick-log-level: INFO</div><div class="">nfs.disable: off</div></div><br class="gmail-m_3603348053678985885gmail-m_163229844221691320gmail-m_-4427179427950227310gmail-m_-1219620317814815363gmail-m_-3666594629493743861gmail-m_-8629894811503422375gmail-Apple-interchange-newline"><div class="">Thank you for any assistance.&nbsp;</div><div class=""><br class=""></div><div class="">- Patrick</div></div>

_______________________________________________<br class="">Gluster-users mailing list<br class=""><a href="mailto:Gluster-users@gluster.org" target="_blank" class="">Gluster-users@gluster.org</a><br class=""><a href="https://lists.gluster.org/mailman/listinfo/gluster-users" target="_blank" class="">https://lists.gluster.org/mailman/listinfo/gluster-users</a></div></blockquote></div><br class=""></div></div></blockquote></div>

</div></blockquote></div><br class=""></div></blockquote></div>

</div></blockquote></div><br class=""></div></div></blockquote></div>

</blockquote></div>

</blockquote></div>

</div></blockquote></div><br class=""></div></div></blockquote></div>

</div></blockquote></div><br class=""></div></body></html>