<div dir="ltr"><div>Dear Vlad,</div><div><br></div><div>I&#39;m sorry, I don&#39;t want to test this again on my system just yet! It caused too much instability for my users and I don&#39;t have enough resources for a development environment. The only other variable that changed before the crashes was the metadata-cache group[0], which I enabled the same day as the parallel-readdir and readdir-ahead options:</div><div><br></div><div>$ gluster volume set homes group metadata-cache</div><div><br></div><div>I&#39;m hoping Atin or Poornima can shed some light and squash this bug.<br></div><div><br></div><div>[0] <a href="https://github.com/gluster/glusterfs/blob/release-3.11/doc/release-notes/3.11.0.md">https://github.com/gluster/glusterfs/blob/release-3.11/doc/release-notes/3.11.0.md</a></div><div><br></div><div>Regards,<br></div></div><br><div class="gmail_quote"><div dir="ltr">On Fri, Jan 26, 2018 at 6:10 AM Vlad Kopylov &lt;<a href="mailto:vladkopy@gmail.com">vladkopy@gmail.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Can you please test whether parallel-readdir or readdir-ahead causes the<br>
disconnects, so we know which one to disable?<br>
<br>
parallel-readdir does magic, judging by this PDF from last year:<br>
<a href="https://events.static.linuxfound.org/sites/events/files/slides/Gluster_DirPerf_Vault2017_0.pdf" rel="noreferrer" target="_blank">https://events.static.linuxfound.org/sites/events/files/slides/Gluster_DirPerf_Vault2017_0.pdf</a><br>
<br>
-v<br>
<br>
On Thu, Jan 25, 2018 at 8:20 AM, Alan Orth &lt;<a href="mailto:alan.orth@gmail.com" target="_blank">alan.orth@gmail.com</a>&gt; wrote:<br>
&gt; By the way, on a slightly related note, I&#39;m pretty sure either<br>
&gt; parallel-readdir or readdir-ahead has a regression in GlusterFS 3.12.x. We<br>
&gt; are running CentOS 7 with kernel-3.10.0-693.11.6.el7.x86_64.<br>
&gt;<br>
&gt; I updated my servers and clients to 3.12.4 and enabled these two options<br>
&gt; after reading about them in the 3.10.0 and 3.11.0 release notes. In the days<br>
&gt; after enabling these two options all of my clients kept getting disconnected<br>
&gt; from the volume. The error upon attempting to list a directory or read a<br>
&gt; file was &quot;Transport endpoint is not connected&quot;, after which I would force<br>
&gt; unmount the volume with `umount -fl /home` and remount it, only to have it<br>
&gt; get disconnected again a few hours later.<br>
&gt;<br>
&gt; Every time the volume disconnected I looked in the client mount log and only<br>
&gt; found information such as:<br>
&gt;<br>
&gt; [2018-01-24 05:52:27.695225] I [MSGID: 108026]<br>
&gt; [afr-self-heal-common.c:1656:afr_log_selfheal] 2-homes-replicate-1:<br>
&gt; Completed metadata selfheal on ed3fbafc-734b-41ca-ab30-216399fb9168.<br>
&gt; sources=[0]  sinks=1<br>
&gt; [2018-01-24 05:52:27.700611] I [MSGID: 108026]<br>
&gt; [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do]<br>
&gt; 2-homes-replicate-1: performing metadata selfheal on<br>
&gt; b6a53629-a831-4ee3-a35e-f47c04297aaa<br>
&gt; [2018-01-24 05:52:27.703021] I [MSGID: 108026]<br>
&gt; [afr-self-heal-common.c:1656:afr_log_selfheal] 2-homes-replicate-1:<br>
&gt; Completed metadata selfheal on b6a53629-a831-4ee3-a35e-f47c04297aaa.<br>
&gt; sources=[0]  sinks=1<br>
&gt;<br>
&gt; I enabled debug logging for that volume&#39;s client mount with `gluster volume<br>
&gt; set homes diagnostics.client-log-level DEBUG` and then I saw this in the<br>
&gt; client mount log the next time it disconnected:<br>
&gt;<br>
&gt; [2018-01-24 08:55:19.138810] D [MSGID: 0] [io-threads.c:358:iot_schedule]<br>
&gt; 0-homes-io-threads: LOOKUP scheduled as fast fop<br>
&gt; [2018-01-24 08:55:19.138849] D [MSGID: 0] [dht-common.c:2711:dht_lookup]<br>
&gt; 0-homes-dht: Calling fresh lookup for<br>
&gt; /vchebii/revtrans/Hircus-XM_018067032.1.pep.align.fas on<br>
&gt; homes-readdir-ahead-1<br>
&gt; [2018-01-24 08:55:19.138928] D [MSGID: 0] [io-threads.c:358:iot_schedule]<br>
&gt; 0-homes-io-threads: FSTAT scheduled as fast fop<br>
&gt; [2018-01-24 08:55:19.138958] D [MSGID: 0] [afr-read-txn.c:220:afr_read_txn]<br>
&gt; 0-homes-replicate-1: e6ee0427-b17d-4464-a738-e8ea70d77d95: generation now vs<br>
&gt; cached: 2, 2<br>
&gt; [2018-01-24 08:55:19.139187] D [MSGID: 0] [dht-common.c:2294:dht_lookup_cbk]<br>
&gt; 0-homes-dht: fresh_lookup returned for<br>
&gt; /vchebii/revtrans/Hircus-XM_018067032.1.pep.align.fas with op_ret 0<br>
&gt; [2018-01-24 08:55:19.139200] D [MSGID: 0]<br>
&gt; [dht-layout.c:873:dht_layout_preset] 0-homes-dht: file =<br>
&gt; 00000000-0000-0000-0000-000000000000, subvol = homes-readdir-ahead-1<br>
&gt; [2018-01-24 08:55:19.139257] D [MSGID: 0] [io-threads.c:358:iot_schedule]<br>
&gt; 0-homes-io-threads: READDIRP scheduled as fast fop<br>
&gt;<br>
&gt; On a hunch I disabled both parallel-readdir and readdir-ahead, which I had<br>
&gt; only enabled a few days before, and now all of the clients are much more<br>
&gt; stable, with zero disconnections in the days since I disabled those two<br>
&gt; volume options.<br>
&gt;<br>
&gt; Please take a look! Thanks,<br>
&gt;<br>
&gt; On Wed, Jan 24, 2018 at 5:59 AM Atin Mukherjee &lt;<a href="mailto:amukherj@redhat.com" target="_blank">amukherj@redhat.com</a>&gt; wrote:<br>
&gt;&gt;<br>
&gt;&gt; Adding Poornima to take a look at it and comment.<br>
&gt;&gt;<br>
&gt;&gt; On Tue, Jan 23, 2018 at 10:39 PM, Alan Orth &lt;<a href="mailto:alan.orth@gmail.com" target="_blank">alan.orth@gmail.com</a>&gt; wrote:<br>
&gt;&gt;&gt;<br>
&gt;&gt;&gt; Hello,<br>
&gt;&gt;&gt;<br>
&gt;&gt;&gt; I saw that parallel-readdir was an experimental feature in GlusterFS<br>
&gt;&gt;&gt; version 3.10.0, became stable in version 3.11.0, and is now recommended for<br>
&gt;&gt;&gt; small file workloads in the Red Hat Gluster Storage Server documentation[2].<br>
&gt;&gt;&gt; I&#39;ve successfully enabled this on one of my volumes but I notice the<br>
&gt;&gt;&gt; following in the client mount log:<br>
&gt;&gt;&gt;<br>
&gt;&gt;&gt; [2018-01-23 10:24:24.048055] W [MSGID: 101174]<br>
&gt;&gt;&gt; [graph.c:363:_log_if_unknown_option] 0-homes-readdir-ahead-1: option<br>
&gt;&gt;&gt; &#39;parallel-readdir&#39; is not recognized<br>
&gt;&gt;&gt; [2018-01-23 10:24:24.048072] W [MSGID: 101174]<br>
&gt;&gt;&gt; [graph.c:363:_log_if_unknown_option] 0-homes-readdir-ahead-0: option<br>
&gt;&gt;&gt; &#39;parallel-readdir&#39; is not recognized<br>
&gt;&gt;&gt;<br>
&gt;&gt;&gt; The GlusterFS version on the client and server is 3.12.4. What is going<br>
&gt;&gt;&gt; on?<br>
&gt;&gt;&gt;<br>
&gt;&gt;&gt; [0]<br>
&gt;&gt;&gt; <a href="https://github.com/gluster/glusterfs/blob/release-3.10/doc/release-notes/3.10.0.md" rel="noreferrer" target="_blank">https://github.com/gluster/glusterfs/blob/release-3.10/doc/release-notes/3.10.0.md</a><br>
&gt;&gt;&gt; [1]<br>
&gt;&gt;&gt; <a href="https://github.com/gluster/glusterfs/blob/release-3.11/doc/release-notes/3.11.0.md" rel="noreferrer" target="_blank">https://github.com/gluster/glusterfs/blob/release-3.11/doc/release-notes/3.11.0.md</a><br>
&gt;&gt;&gt; [2]<br>
&gt;&gt;&gt; <a href="https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.3/html/administration_guide/small_file_performance_enhancements" rel="noreferrer" target="_blank">https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.3/html/administration_guide/small_file_performance_enhancements</a><br>
&gt;&gt;&gt;<br>
&gt;&gt;&gt; Thank you,<br>
&gt;&gt;&gt;<br>
&gt;&gt;&gt;<br>
&gt;&gt;&gt; --<br>
&gt;&gt;&gt;<br>
&gt;&gt;&gt; Alan Orth<br>
&gt;&gt;&gt; <a href="mailto:alan.orth@gmail.com" target="_blank">alan.orth@gmail.com</a><br>
&gt;&gt;&gt; <a href="https://picturingjordan.com" rel="noreferrer" target="_blank">https://picturingjordan.com</a><br>
&gt;&gt;&gt; <a href="https://englishbulgaria.net" rel="noreferrer" target="_blank">https://englishbulgaria.net</a><br>
&gt;&gt;&gt; <a href="https://mjanja.ch" rel="noreferrer" target="_blank">https://mjanja.ch</a><br>
&gt;&gt;&gt;<br>
&gt;&gt;&gt;<br>
&gt;&gt;&gt; _______________________________________________<br>
&gt;&gt;&gt; Gluster-users mailing list<br>
&gt;&gt;&gt; <a href="mailto:Gluster-users@gluster.org" target="_blank">Gluster-users@gluster.org</a><br>
&gt;&gt;&gt; <a href="http://lists.gluster.org/mailman/listinfo/gluster-users" rel="noreferrer" target="_blank">http://lists.gluster.org/mailman/listinfo/gluster-users</a><br>
&gt;&gt;<br>
&gt;&gt;<br>
</blockquote></div>-- <br><div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature"><p dir="ltr">Alan Orth<br>
<a href="mailto:alan.orth@gmail.com">alan.orth@gmail.com</a><br>
<a href="https://picturingjordan.com">https://picturingjordan.com</a><br>
<a href="https://englishbulgaria.net">https://englishbulgaria.net</a><br>
<a href="https://mjanja.ch">https://mjanja.ch</a></p>
</div>
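<div>The client mount log entries quoted in this thread appear to share one layout: timestamp, level letter, MSGID, source location, translator name, message. As a rough aid for comparing logs taken with the two options on and off, here is a minimal Python sketch that parses that layout and counts entries per level and translator. The regular expression is inferred only from the excerpts above and is an assumption, not an official description of the GlusterFS log format. The names parse_line, count_by_level and SAMPLE are hypothetical. It also assumes one entry per line, as in the real log file rather than the wrapped e-mail text.</div>

```python
import re
from collections import Counter

# Layout assumed from the excerpts in this thread:
# [timestamp] LEVEL [MSGID: n] [file.c:line:function] translator: message
LOG_LINE = re.compile(
    r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\] "  # timestamp
    r"([A-Z]) "                                           # level letter (I, W, D, ...)
    r"\[MSGID: (\d+)\] "                                  # numeric message id
    r"\[([^:\]]+):(\d+):([^\]]+)\] "                      # file:line:function
    r"(\S+): "                                            # translator name
    r"(.*)$"                                              # free-text message
)


def parse_line(line):
    """Return a dict for one log entry, or None if the line does not match."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    timestamp, level, msgid, source, lineno, function, translator, message = match.groups()
    return {
        "timestamp": timestamp,
        "level": level,
        "msgid": int(msgid),
        "source": source,
        "lineno": int(lineno),
        "function": function,
        "translator": translator,
        "message": message,
    }


def count_by_level(lines):
    """Count parsed entries per (level, translator) pair, skipping other lines."""
    counts = Counter()
    for line in lines:
        entry = parse_line(line)
        if entry is not None:
            counts[(entry["level"], entry["translator"])] += 1
    return counts


# Two entries copied from this thread, joined back onto single lines.
SAMPLE = [
    "[2018-01-23 10:24:24.048055] W [MSGID: 101174] "
    "[graph.c:363:_log_if_unknown_option] 0-homes-readdir-ahead-1: "
    "option 'parallel-readdir' is not recognized",
    "[2018-01-24 08:55:19.138810] D [MSGID: 0] "
    "[io-threads.c:358:iot_schedule] 0-homes-io-threads: "
    "LOOKUP scheduled as fast fop",
]

if __name__ == "__main__":
    for key, total in sorted(count_by_level(SAMPLE).items()):
        print(key, total)
```

<div>To use it on a real log, pass an open file object to count_by_level; lines that do not fit the assumed layout are skipped rather than raising an error.</div>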