<div dir="ltr">Hi,<div><br></div><div>Does the gluster team have any feedback about this? Resolving the "Found anomalies" issues may be key to resolving dir list speed issues.<br clear="all"><div><div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><br>Sincerely,<br>Artem<br><br>--<br>Founder, <a href="http://www.androidpolice.com" target="_blank">Android Police</a>, <a href="http://www.apkmirror.com/" style="font-size:12.8px" target="_blank">APK Mirror</a><span style="font-size:12.8px">, Illogical Robot LLC</span></div><div dir="ltr"><a href="http://beerpla.net/" target="_blank">beerpla.net</a> | <a href="http://twitter.com/ArtemR" target="_blank">@ArtemR</a><br></div></div></div></div></div></div></div></div></div></div></div></div></div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Apr 30, 2020 at 10:36 PM Strahil Nikolov <<a href="mailto:hunter86_bg@yahoo.com">hunter86_bg@yahoo.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On April 30, 2020 9:05:19 PM GMT+03:00, Artem Russakovskii <<a href="mailto:archon810@gmail.com" target="_blank">archon810@gmail.com</a>> wrote:<br>
>I did this on the same prod instance just now.
>
>'find' on a fuse gluster dir with 40k+ files:
>1st run: 3m56.261s
>2nd run: 0m24.970s
>3rd run: 0m24.099s
>
>At this point, I killed all gluster services on one of the 4 servers
>and verified that that brick went offline.
>
>1st run: 0m38.131s
>2nd run: 0m19.369s
>3rd run: 0m23.576s
>
>Nothing conclusive really IMO.
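For comparison's sake, a minimal way to keep such runs cold-vs-cold (a sketch only; the volume name is taken from the volume info further down, and the fuse mount point and test directory here are just placeholders):

gluster volume status SNIP_data1                 # check the Online column to confirm the killed brick really shows as down
sync; echo 3 > /proc/sys/vm/drop_caches          # on the client, as root: drop kernel page/dentry caches
time find /mnt/SNIP_data1_fuse/somedir | wc -l   # cold-cache listing over FUSE

drop_caches only clears kernel caches; unmounting and remounting the fuse client is the surest way to also reset gluster's own client-side md-cache between runs.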
>
>Sincerely,
>Artem
>
>--
>Founder, Android Police <http://www.androidpolice.com>, APK Mirror
><http://www.apkmirror.com/>, Illogical Robot LLC
>beerpla.net | @ArtemR <http://twitter.com/ArtemR>
>
>
>On Thu, Apr 30, 2020 at 9:55 AM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
>
>> On April 30, 2020 6:27:10 PM GMT+03:00, Artem Russakovskii <archon810@gmail.com> wrote:
>> >Hi Strahil, in the original email I included both the times for the first
>> >and subsequent reads on the fuse-mounted gluster volume as well as the xfs
>> >filesystem the gluster data resides on (this is the brick, right?).
>> >
>> >On Thu, Apr 30, 2020, 7:44 AM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
>> >
>> >> On April 30, 2020 4:24:23 AM GMT+03:00, Artem Russakovskii <archon810@gmail.com> wrote:
>> >> >Hi all,
>> >> >
>> >> >We have 500GB and 10TB replica 4 (1 x 4) xfs-based gluster volumes, and
>> >> >the 10TB one especially is extremely slow for certain operations (and
>> >> >has been since gluster 3.x, when we started). We're currently on 5.13.
>> >> >
>> >> >The number of files isn't even what I'd consider that large - under
>> >> >100k per dir.
>> >> >
>> >> >Here are some numbers to look at:
>> >> >
>> >> >On the gluster volume, in a dir of 45k files:
>> >> >The first time:
>> >> >
>> >> >time find | wc -l
>> >> >45423
>> >> >real 8m44.819s
>> >> >user 0m0.459s
>> >> >sys 0m0.998s
>> >> >
>> >> >And again:
>> >> >
>> >> >time find | wc -l
>> >> >45423
>> >> >real 0m34.677s
>> >> >user 0m0.291s
>> >> >sys 0m0.754s
>> >> >
>> >> >
>> >> >If I run the same operation directly on the brick's xfs filesystem:
>> >> >The first time:
>> >> >
>> >> >time find | wc -l
>> >> >45423
>> >> >real 0m13.514s
>> >> >user 0m0.144s
>> >> >sys 0m0.501s
>> >> >
>> >> >And again:
>> >> >
>> >> >time find | wc -l
>> >> >45423
>> >> >real 0m0.197s
>> >> >user 0m0.088s
>> >> >sys 0m0.106s
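For concreteness, the two paths being compared are presumably something like the following (a sketch; the fuse mount point is a placeholder, the brick path is taken from the volume info below):

time find /mnt/SNIP_data1_fuse/somedir | wc -l          # through the gluster FUSE client
time find /mnt/SNIP_block1/SNIP_data1/somedir | wc -l   # directly on the brick's xfs filesystem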
>> >> >
>> >> >I'd expect some performance difference here, but just as it was several
>> >> >years ago when we started with gluster, the gap is still huge, and
>> >> >simple file listings are incredibly slow.
>> >> >
>> >> >At the time, the team was looking into some optimizations, but I'm not
>> >> >sure that ever happened.
>> >> >
>> >> >What can we do to try to improve performance?
>> >> >
>> >> >Thank you.
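One way to see where the time goes during such a listing is gluster's built-in profiler (a sketch, using the volume name from the info below; the directory path is a placeholder, and profiling adds a little overhead, so stop it afterwards):

gluster volume profile SNIP_data1 start
time find /mnt/SNIP_data1_fuse/somedir | wc -l   # the slow listing
gluster volume profile SNIP_data1 info           # per-brick FOP call counts and latencies since start
gluster volume profile SNIP_data1 stop

If LOOKUP or OPENDIR/READDIRP latencies dominate on one brick, that would point at where to dig further.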
>> >> >
>> >> >
>> >> >Some setup values follow.
>> >> >
>> >> >xfs_info /mnt/SNIP_block1
>> >> >meta-data=/dev/sdc          isize=512    agcount=103, agsize=26214400 blks
>> >> >         =                  sectsz=512   attr=2, projid32bit=1
>> >> >         =                  crc=1        finobt=1, sparse=0, rmapbt=0
>> >> >         =                  reflink=0
>> >> >data     =                  bsize=4096   blocks=2684354560, imaxpct=25
>> >> >         =                  sunit=0      swidth=0 blks
>> >> >naming   =version 2         bsize=4096   ascii-ci=0, ftype=1
>> >> >log      =internal log      bsize=4096   blocks=51200, version=2
>> >> >         =                  sectsz=512   sunit=0 blks, lazy-count=1
>> >> >realtime =none              extsz=4096   blocks=0, rtextents=0
>> >> >
>> >> >Volume Name: SNIP_data1
>> >> >Type: Replicate
>> >> >Volume ID: SNIP
>> >> >Status: Started
>> >> >Snapshot Count: 0
>> >> >Number of Bricks: 1 x 4 = 4
>> >> >Transport-type: tcp
>> >> >Bricks:
>> >> >Brick1: nexus2:/mnt/SNIP_block1/SNIP_data1
>> >> >Brick2: forge:/mnt/SNIP_block1/SNIP_data1
>> >> >Brick3: hive:/mnt/SNIP_block1/SNIP_data1
>> >> >Brick4: citadel:/mnt/SNIP_block1/SNIP_data1
>> >> >Options Reconfigured:
>> >> >cluster.quorum-count: 1
>> >> >cluster.quorum-type: fixed
>> >> >network.ping-timeout: 5
>> >> >network.remote-dio: enable
>> >> >performance.rda-cache-limit: 256MB
>> >> >performance.readdir-ahead: on
>> >> >performance.parallel-readdir: on
>> >> >network.inode-lru-limit: 500000
>> >> >performance.md-cache-timeout: 600
>> >> >performance.cache-invalidation: on
>> >> >performance.stat-prefetch: on
>> >> >features.cache-invalidation-timeout: 600
>> >> >features.cache-invalidation: on
>> >> >cluster.readdir-optimize: on
>> >> >performance.io-thread-count: 32
>> >> >server.event-threads: 4
>> >> >client.event-threads: 4
>> >> >performance.read-ahead: off
>> >> >cluster.lookup-optimize: on
>> >> >performance.cache-size: 1GB
>> >> >cluster.self-heal-daemon: enable
>> >> >transport.address-family: inet
>> >> >nfs.disable: on
>> >> >performance.client-io-threads: on
>> >> >cluster.granular-entry-heal: enable
>> >> >cluster.data-self-heal-algorithm: full
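For cross-checking, the effective values of the listing-related options can be dumped with gluster volume get (a sketch; the grep filter is only an example):

gluster volume get SNIP_data1 all | grep -E 'readdir|md-cache|nl-cache|stat-prefetch'   # show effective readdir/metadata caching settings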
>> >> >
>> >> >Sincerely,
>> >> >Artem
>> >> >
>> >> >--
>> >> >Founder, Android Police <http://www.androidpolice.com>, APK Mirror
>> >> ><http://www.apkmirror.com/>, Illogical Robot LLC
>> >> >beerpla.net | @ArtemR <http://twitter.com/ArtemR>
>> >>
>> >> Hi Artem,
>> >>
>> >> Have you checked the same on brick level? How big is the difference?
>> >>
>> >> Best Regards,
>> >> Strahil Nikolov
>> >>
>>
>> Hi Artem,
>>
>> My bad, I missed the 'xfs' word... Still, the difference is huge.
>>
>> May I ask you to do a test again (pure curiosity) as follows:
>> 1. Repeat the test from before.
>> 2. Stop 1 brick and test again.
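For step 2, one common way to stop a single brick without touching the others is to kill just that brick's process (a sketch; the volume name comes from the info above, and the PID is whatever gluster volume status reports for that brick):

gluster volume status SNIP_data1        # note the PID of the brick process to stop
kill <brick-pid>                        # stops only that glusterfsd brick process
# ...run the find tests...
gluster volume start SNIP_data1 force   # restarts the killed brick process and brings it back online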
>>
>>
>> P.S.: You can try it on the test cluster.
>>
>> Best Regards,
>> Strahil Nikolov
>>

Hi Artem,

I was wondering whether the 4th replica adds additional overhead (another directory to check), but the test is not very conclusive.


Actually, the 'Found anomalies' log entries in your pool could be a symptom of another problem (just like the long listing time).
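To collect those entries, the messages should show up in the client and brick logs; something along these lines (a sketch, assuming the default log location; run it on both the clients and the brick servers):

grep -rn "Found anomalies" /var/log/glusterfs/   # the affected paths and gfids are included in each message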

I will try to reproduce your setup (smaller scale: 1 brick, 50k files) and then try again with 3 bricks.
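Creating the test set is straightforward on a fuse mount; a sketch along these lines (mount point and directory name are placeholders):

mkdir /mnt/testvol/listdir
cd /mnt/testvol/listdir
for i in $(seq 1 50000); do : > "f$i"; done   # 50k empty files; slow over FUSE, but representative
time find . | wc -l                           # first (cold) run, then repeat for the warm-cache number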


Best Regards,
Strahil Nikolov