<div dir="ltr"><div><div><div><div><div><div>Hi Gencer,<br><br></div>I just checked the volume-profile attachments.<br><br></div>Things that seem really odd to me as far as the sharded volume is concerned:<br><br></div>1. Only the replica pairs made up of bricks 5 and 6 on nodes 09 and 10 seem to have witnessed all the IO. No other bricks witnessed any write operations. This is unacceptable for a volume that has 8 other replica sets. Why didn't the shards get distributed across all of these sets?<br></div></div></div><div><br></div><div>2. For the replica set consisting of bricks 5 and 6 of node 09, I see that brick 5 is spending 99% of its time in the FINODELK fop, when the fop that should have dominated its profile is in fact WRITE.<br><br>Could you shed some more light on your setup from the gluster standpoint?<br>*
For instance, are you using two different gluster volumes to gather
these numbers - one distributed-replicated-striped and another
distributed-replicated-sharded? Or are you merely converting a single
volume from one type to another?<br><div><br></div><div>* And if there are indeed two volumes, could you share both their `volume info` outputs to eliminate any confusion?<br><br></div><div>* If there's just one volume, are you taking care to remove all data from the mount point of this volume before converting it?<br><br></div>* What size did the test file grow to?<br><br></div><div>* Are the attached profiles from the dd runs or from the file download test?<br></div><div><br></div><div>-Krutika<br></div><div><br><br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Jul 3, 2017 at 8:42 PM, <span dir="ltr"><<a href="mailto:gencer@gencgiyen.com" target="_blank">gencer@gencgiyen.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div link="blue" vlink="purple" lang="TR"><div class="m_3380879455611854782WordSection1"><p class="MsoNormal"><span>Hi Krutika,<u></u><u></u></span></p><p class="MsoNormal"><span><u></u> <u></u></span></p><p class="MsoNormal"><span>Have you been able to look at my profiles? 
Do you have any clue, idea or suggestion?<u></u><u></u></span></p><p class="MsoNormal"><span><u></u> <u></u></span></p><p class="MsoNormal"><span>Thanks,<u></u><u></u></span></p><p class="MsoNormal"><span>-Gencer<u></u><u></u></span></p><p class="MsoNormal"><span><u></u> <u></u></span></p><p class="MsoNormal"><b><span lang="EN-US">From:</span></b><span lang="EN-US"> Krutika Dhananjay [mailto:<a href="mailto:kdhananj@redhat.com" target="_blank">kdhananj@redhat.com</a>] <br><b>Sent:</b> Friday, June 30, 2017 3:50 PM</span></p><div><div class="h5"><br><b>To:</b> <a href="mailto:gencer@gencgiyen.com" target="_blank">gencer@gencgiyen.com</a><br><b>Cc:</b> gluster-user <<a href="mailto:gluster-users@gluster.org" target="_blank">gluster-users@gluster.org</a>><br><b>Subject:</b> Re: [Gluster-users] Very slow performance on Sharded GlusterFS<u></u><u></u></div></div><p></p><div><div class="h5"><p class="MsoNormal"><u></u> <u></u></p><div><div><div><div><div><div><div><div><div><div><div><div><p class="MsoNormal" style="margin-bottom:12.0pt">Just noticed that the way you have configured your brick order during volume-create makes both replicas of every set reside on the same machine.<u></u><u></u></p></div><p class="MsoNormal" style="margin-bottom:12.0pt">That apart, do you see any difference if you change shard-block-size to 512MB? Could you try that?<u></u><u></u></p></div><p class="MsoNormal" style="margin-bottom:12.0pt">If it doesn't help, could you share the volume-profile output for both the tests (separate)?<u></u><u></u></p></div><p class="MsoNormal">Here's what you do:<u></u><u></u></p></div><p class="MsoNormal">1. Start profile before starting your test - it could be dd or it could be file download.<u></u><u></u></p></div><p class="MsoNormal" style="margin-bottom:12.0pt"># gluster volume profile <VOL> start<u></u><u></u></p></div><p class="MsoNormal" style="margin-bottom:12.0pt">2. 
Run your test - again either dd or file-download.<u></u><u></u></p></div><p class="MsoNormal" style="margin-bottom:12.0pt">3. Once the test has completed, run `gluster volume profile <VOL> info` and redirect its output to a tmp file.<u></u><u></u></p></div><p class="MsoNormal">4. Stop profile<u></u><u></u></p></div><p class="MsoNormal" style="margin-bottom:12.0pt"># gluster volume profile <VOL> stop<u></u><u></u></p></div><p class="MsoNormal" style="margin-bottom:12.0pt">And attach the volume-profile output file that you saved at a temporary location in step 3.<u></u><u></u></p></div><p class="MsoNormal" style="margin-bottom:12.0pt">-Krutika<u></u><u></u></p></div><div><p class="MsoNormal"><u></u> <u></u></p><div><p class="MsoNormal">On Fri, Jun 30, 2017 at 5:33 PM, <<a href="mailto:gencer@gencgiyen.com" target="_blank">gencer@gencgiyen.com</a>> wrote:<u></u><u></u></p><blockquote style="border:none;border-left:solid #cccccc 1.0pt;padding:0cm 0cm 0cm 6.0pt;margin-left:4.8pt;margin-right:0cm"><div><div><p class="MsoNormal">Hi Krutika,<u></u><u></u></p><p class="MsoNormal"> <u></u><u></u></p><p class="MsoNormal">Sure, here is volume info:<u></u><u></u></p><p class="MsoNormal"> <u></u><u></u></p><p class="MsoNormal">root@sr-09-loc-50-14-18:/# gluster volume info testvol<u></u><u></u></p><p class="MsoNormal"> <u></u><u></u></p><p class="MsoNormal">Volume Name: testvol<u></u><u></u></p><p class="MsoNormal">Type: Distributed-Replicate<u></u><u></u></p><p class="MsoNormal">Volume ID: 30426017-59d5-4091-b6bc-<wbr>279a905b704a<u></u><u></u></p><p class="MsoNormal">Status: Started<u></u><u></u></p><p class="MsoNormal">Snapshot Count: 0<u></u><u></u></p><p class="MsoNormal">Number of Bricks: 10 x 2 = 20<u></u><u></u></p><p class="MsoNormal">Transport-type: tcp<u></u><u></u></p><p class="MsoNormal">Bricks:<u></u><u></u></p><p class="MsoNormal">Brick1: sr-09-loc-50-14-18:/bricks/<wbr>brick1<u></u><u></u></p><p class="MsoNormal">Brick2: 
sr-09-loc-50-14-18:/bricks/<wbr>brick2<u></u><u></u></p><p class="MsoNormal">Brick3: sr-09-loc-50-14-18:/bricks/<wbr>brick3<u></u><u></u></p><p class="MsoNormal">Brick4: sr-09-loc-50-14-18:/bricks/<wbr>brick4<u></u><u></u></p><p class="MsoNormal">Brick5: sr-09-loc-50-14-18:/bricks/<wbr>brick5<u></u><u></u></p><p class="MsoNormal">Brick6: sr-09-loc-50-14-18:/bricks/<wbr>brick6<u></u><u></u></p><p class="MsoNormal">Brick7: sr-09-loc-50-14-18:/bricks/<wbr>brick7<u></u><u></u></p><p class="MsoNormal">Brick8: sr-09-loc-50-14-18:/bricks/<wbr>brick8<u></u><u></u></p><p class="MsoNormal">Brick9: sr-09-loc-50-14-18:/bricks/<wbr>brick9<u></u><u></u></p><p class="MsoNormal">Brick10: sr-09-loc-50-14-18:/bricks/<wbr>brick10<u></u><u></u></p><p class="MsoNormal">Brick11: sr-10-loc-50-14-18:/bricks/<wbr>brick1<u></u><u></u></p><p class="MsoNormal">Brick12: sr-10-loc-50-14-18:/bricks/<wbr>brick2<u></u><u></u></p><p class="MsoNormal">Brick13: sr-10-loc-50-14-18:/bricks/<wbr>brick3<u></u><u></u></p><p class="MsoNormal">Brick14: sr-10-loc-50-14-18:/bricks/<wbr>brick4<u></u><u></u></p><p class="MsoNormal">Brick15: sr-10-loc-50-14-18:/bricks/<wbr>brick5<u></u><u></u></p><p class="MsoNormal">Brick16: sr-10-loc-50-14-18:/bricks/<wbr>brick6<u></u><u></u></p><p class="MsoNormal">Brick17: sr-10-loc-50-14-18:/bricks/<wbr>brick7<u></u><u></u></p><p class="MsoNormal">Brick18: sr-10-loc-50-14-18:/bricks/<wbr>brick8<u></u><u></u></p><p class="MsoNormal">Brick19: sr-10-loc-50-14-18:/bricks/<wbr>brick9<u></u><u></u></p><p class="MsoNormal">Brick20: sr-10-loc-50-14-18:/bricks/<wbr>brick10<u></u><u></u></p><p class="MsoNormal">Options Reconfigured:<u></u><u></u></p><p class="MsoNormal">features.shard-block-size: 32MB<u></u><u></u></p><p class="MsoNormal">features.shard: on<u></u><u></u></p><p class="MsoNormal">transport.address-family: inet<u></u><u></u></p><p class="MsoNormal">nfs.disable: on<u></u><u></u></p><p class="MsoNormal"> <u></u><u></u></p><p class="MsoNormal">-Gencer.<u></u><u></u></p><p 
class="MsoNormal"> <u></u><u></u></p><p class="MsoNormal"><b><span lang="EN-US">From:</span></b><span lang="EN-US"> Krutika Dhananjay [mailto:<a href="mailto:kdhananj@redhat.com" target="_blank">kdhananj@redhat.com</a>] <br><b>Sent:</b> Friday, June 30, 2017 2:50 PM<br><b>To:</b> <a href="mailto:gencer@gencgiyen.com" target="_blank">gencer@gencgiyen.com</a><br><b>Cc:</b> gluster-user <<a href="mailto:gluster-users@gluster.org" target="_blank">gluster-users@gluster.org</a>><br><b>Subject:</b> Re: [Gluster-users] Very slow performance on Sharded GlusterFS</span><u></u><u></u></p><div><div><p class="MsoNormal"> <u></u><u></u></p><div><div><p class="MsoNormal" style="margin-bottom:12.0pt">Could you please provide the volume-info output?<u></u><u></u></p></div><p class="MsoNormal">-Krutika<u></u><u></u></p></div><div><p class="MsoNormal"> <u></u><u></u></p><div><p class="MsoNormal">On Fri, Jun 30, 2017 at 4:23 PM, <<a href="mailto:gencer@gencgiyen.com" target="_blank">gencer@gencgiyen.com</a>> wrote:<u></u><u></u></p><blockquote style="border:none;border-left:solid #cccccc 1.0pt;padding:0cm 0cm 0cm 6.0pt;margin-left:4.8pt;margin-top:5.0pt;margin-right:0cm;margin-bottom:5.0pt"><div><div><div><div><p class="MsoNormal">Hi,<u></u><u></u></p><p class="MsoNormal"> <u></u><u></u></p><p class="MsoNormal">I have 2 nodes with 20 bricks in total (10+10).<u></u><u></u></p><p class="MsoNormal"> <u></u><u></u></p><p class="MsoNormal">First test: <u></u><u></u></p><p class="MsoNormal"> <u></u><u></u></p><p class="MsoNormal">2 Nodes with Distributed – Striped – Replicated (2 x 2)<u></u><u></u></p><p class="MsoNormal">10GbE Speed between nodes<u></u><u></u></p><p class="MsoNormal"> <u></u><u></u></p><p class="MsoNormal">“dd” performance: 400mb/s and higher<u></u><u></u></p><p class="MsoNormal">Downloading a large file from the internet directly to gluster: 250-300mb/s<u></u><u></u></p><p class="MsoNormal"> <u></u><u></u></p><p class="MsoNormal">Now the same test without stripe but 
with sharding. The results are the same whether I set the shard size to 4MB or 32MB. (Again 2x replica here.)<u></u><u></u></p><p class="MsoNormal"> <u></u><u></u></p><p class="MsoNormal">dd performance: 70mb/s<u></u><u></u></p><p class="MsoNormal">Download directly to gluster: 60mb/s<u></u><u></u></p><p class="MsoNormal"> <u></u><u></u></p><p class="MsoNormal">Now, if we run this test twice at the same time (two dd runs or two downloads), each drops to 25mb/s or lower.<u></u><u></u></p><p class="MsoNormal"> <u></u><u></u></p><p class="MsoNormal">I expected sharding to be at least equal or a little slower (maybe?), but these results are terribly slow.<u></u><u></u></p><p class="MsoNormal"> <u></u><u></u></p><p class="MsoNormal">I tried tuning (cache, window-size, etc.). Nothing helps.<u></u><u></u></p><p class="MsoNormal"> <u></u><u></u></p><p class="MsoNormal">GlusterFS 3.11 on Debian 9. The kernel is also tuned. Disks are “xfs”, 4TB each.<u></u><u></u></p><p class="MsoNormal"> <u></u><u></u></p><p class="MsoNormal">Is there any tweak/tuning out there to make it fast?<u></u><u></u></p><p class="MsoNormal"> <u></u><u></u></p><p class="MsoNormal">Or is this expected behavior? If it is, it is unacceptable. I cannot use this in production as it is terribly slow. 
<u></u><u></u></p><p class="MsoNormal"> <u></u><u></u></p><p class="MsoNormal">The reason I use shard instead of stripe is that I would like to be able to store files that are bigger than the brick size.<u></u><u></u></p><p class="MsoNormal"> <u></u><u></u></p><p class="MsoNormal">Thanks,<u></u><u></u></p><p class="MsoNormal">Gencer.<u></u><u></u></p></div></div></div></div><p class="MsoNormal"><br>______________________________<wbr>_________________<br>Gluster-users mailing list<br><a href="mailto:Gluster-users@gluster.org" target="_blank">Gluster-users@gluster.org</a><br><a href="http://lists.gluster.org/mailman/listinfo/gluster-users" target="_blank">http://lists.gluster.org/<wbr>mailman/listinfo/gluster-users</a><u></u><u></u></p></blockquote></div><p class="MsoNormal"> <u></u><u></u></p></div></div></div></div></div></blockquote></div><p class="MsoNormal"><u></u> <u></u></p></div></div></div></div></div></blockquote></div><br></div>
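<div><br>On the brick-order point raised earlier in this thread: with replica 2, gluster groups consecutive bricks on the volume-create command line into a replica set, so listing all ten bricks of one host before those of the other puts both copies of every set on the same machine. Below is a minimal sketch of an interleaved order, assuming the host names and /bricks/brickN paths from the volume info quoted above. The helper name interleave_bricks is made up for illustration, and the script only prints the command for review; it does not run gluster.<br><br></div>

```shell
#!/bin/sh
# Hypothetical helper (not part of gluster): print bricks so that every
# consecutive pair spans both hosts. Host names and the /bricks/brickN
# layout are assumptions taken from the volume info in this thread.
interleave_bricks() {
    host_a=$1
    host_b=$2
    count=$3
    i=1
    while [ "$i" -le "$count" ]; do
        # Brick i of host A is followed directly by brick i of host B,
        # so each replica set gets one copy per machine.
        printf '%s:/bricks/brick%s %s:/bricks/brick%s ' "$host_a" "$i" "$host_b" "$i"
        i=$((i + 1))
    done
}

# Only print the resulting command; nothing is created here.
echo "gluster volume create testvol replica 2 $(interleave_bricks sr-09-loc-50-14-18 sr-10-loc-50-14-18 10)"
```

<div>With this order, Brick1 and Brick2 would be brick1 on node 09 and brick1 on node 10, and so on for the remaining pairs.<br></div>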