[Gluster-users] Very slow performance on Sharded GlusterFS

Krutika Dhananjay kdhananj at redhat.com
Thu Jul 6 00:30:24 UTC 2017


What if you disable eager lock and run your test again on the sharded
configuration, and share the profile output along with it?

# gluster volume set <VOL> cluster.eager-lock off
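
For instance, the full re-run could look roughly like this (just a sketch;
reuse whatever dd invocation and mount point you used for the earlier runs,
and substitute your volume name for <VOL>):

# gluster volume set <VOL> cluster.eager-lock off
# gluster volume profile <VOL> start
# dd if=/dev/zero of=/mnt/testfile bs=1G count=5
# gluster volume profile <VOL> info > /tmp/profile-eager-lock-off.txt
# gluster volume profile <VOL> stop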

-Krutika

On Tue, Jul 4, 2017 at 9:03 PM, Krutika Dhananjay <kdhananj at redhat.com>
wrote:

> Thanks. I think reusing the same volume was the cause of the lack of IO
> distribution.
> The latest profile output looks much more realistic and in line with what
> I would expect.
>
> Let me analyse the numbers a bit and get back.
>
> -Krutika
>
> On Tue, Jul 4, 2017 at 12:55 PM, <gencer at gencgiyen.com> wrote:
>
>> Hi Krutika,
>>
>>
>>
>> Thank you so much for your reply. Let me answer them all:
>>
>>
>>
>>    1. I have no idea why it did not get distributed over all bricks.
>>    2. Hm.. This is really weird.
>>
>>
>>
>> And the others:
>>
>>
>>
>> No. I use only one volume. When I tested the sharded and striped volumes, I
>> manually stopped the volume, deleted it, purged the data (the data inside the
>> bricks/disks) and re-created it using this command:
>>
>>
>>
>> sudo gluster volume create testvol replica 2
>> sr-09-loc-50-14-18:/bricks/brick1 sr-10-loc-50-14-18:/bricks/brick1
>> sr-09-loc-50-14-18:/bricks/brick2 sr-10-loc-50-14-18:/bricks/brick2
>> sr-09-loc-50-14-18:/bricks/brick3 sr-10-loc-50-14-18:/bricks/brick3
>> sr-09-loc-50-14-18:/bricks/brick4 sr-10-loc-50-14-18:/bricks/brick4
>> sr-09-loc-50-14-18:/bricks/brick5 sr-10-loc-50-14-18:/bricks/brick5
>> sr-09-loc-50-14-18:/bricks/brick6 sr-10-loc-50-14-18:/bricks/brick6
>> sr-09-loc-50-14-18:/bricks/brick7 sr-10-loc-50-14-18:/bricks/brick7
>> sr-09-loc-50-14-18:/bricks/brick8 sr-10-loc-50-14-18:/bricks/brick8
>> sr-09-loc-50-14-18:/bricks/brick9 sr-10-loc-50-14-18:/bricks/brick9
>> sr-09-loc-50-14-18:/bricks/brick10 sr-10-loc-50-14-18:/bricks/brick10
>> force
>>
>>
>>
>> and of course after that I executed volume start. If sharding is used, I
>> enable that feature BEFORE I start the sharded volume and then mount it.
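>>
>> Roughly, that step looks like this on my side (illustrative; testvol, the
>> hostnames and /mnt are as in my setup, and the block size shown is the 32MB
>> this volume currently uses):
>>
>> sudo gluster volume set testvol features.shard on
>> sudo gluster volume set testvol features.shard-block-size 32MB
>> sudo gluster volume start testvol
>> sudo mount -t glusterfs sr-09-loc-50-14-18:/testvol /mnt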
>>
>>
>>
>> I tried converting from one to the other, but then I saw that the
>> documentation says a clean volume should be better. So I tried the clean
>> method. Still the same performance.
>>
>>
>>
>> The test file grows from 1GB to 5GB, and the tests are dd runs. See this
>> example:
>>
>>
>>
>> dd if=/dev/zero of=/mnt/testfile bs=1G count=5
>>
>> 5+0 records in
>>
>> 5+0 records out
>>
>> 5368709120 bytes (5.4 GB, 5.0 GiB) copied, 66.7978 s, 80.4 MB/s
>>
>>
>>
>>
>>
>> dd if=/dev/zero of=/mnt/testfile bs=5G count=1
>>
>> This also gives the same result (bs and count reversed).
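>>
>> (If it helps to rule out page-cache effects on these numbers, I can also
>> repeat the runs with something like conv=fdatasync added, e.g.:
>>
>> dd if=/dev/zero of=/mnt/testfile bs=1G count=5 conv=fdatasync
>>
>> but the figures above are from the plain invocations shown.)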
>>
>>
>>
>>
>>
>> This run also generated a profile, which I have attached to this
>> e-mail.
>>
>>
>>
>> Is there anything that I can try? I am open to all kinds of suggestions.
>>
>>
>>
>> Thanks,
>>
>> Gencer.
>>
>>
>>
>> *From:* Krutika Dhananjay [mailto:kdhananj at redhat.com]
>> *Sent:* Tuesday, July 4, 2017 9:39 AM
>>
>> *To:* gencer at gencgiyen.com
>> *Cc:* gluster-user <gluster-users at gluster.org>
>> *Subject:* Re: [Gluster-users] Very slow performance on Sharded GlusterFS
>>
>>
>>
>> Hi Gencer,
>>
>> I just checked the volume-profile attachments.
>>
>> Things that seem really odd to me as far as the sharded volume is
>> concerned:
>>
>> 1. Only the replica pairs consisting of bricks 5 and 6 (on both nodes 09 and
>> 10) seem to have witnessed all the IO. No other bricks witnessed any write
>> operations. This is unacceptable for a volume that has 8 other replica
>> sets. Why didn't the shards get distributed across all of these sets?
>>
>>
>>
>> 2. For the replica set consisting of bricks 5 and 6 of node 09, I see that
>> brick 5 is spending 99% of its time in the FINODELK fop, when the fop that
>> should have dominated its profile is in fact WRITE.
>>
>> Could you throw some more light on your setup from a gluster standpoint?
>> * For instance, are you using two different gluster volumes to gather
>> these numbers - one distributed-replicated-striped and another
>> distributed-replicated-sharded? Or are you merely converting a single
>> volume from one type to another?
>>
>>
>>
>> * And if there are indeed two volumes, could you share both their `volume
>> info` outputs to eliminate any confusion?
>>
>> * If there's just one volume, are you taking care to remove all data from
>> the mount point of this volume before converting it?
>>
>> * What is the size the test file grew to?
>>
>> * These attached profiles are against dd runs? Or the file download test?
>>
>>
>>
>> -Krutika
>>
>>
>>
>>
>>
>> On Mon, Jul 3, 2017 at 8:42 PM, <gencer at gencgiyen.com> wrote:
>>
>> Hi Krutika,
>>
>>
>>
>> Have you been able to look at my profiles? Do you have any clue, idea or
>> suggestion?
>>
>>
>>
>> Thanks,
>>
>> -Gencer
>>
>>
>>
>> *From:* Krutika Dhananjay [mailto:kdhananj at redhat.com]
>> *Sent:* Friday, June 30, 2017 3:50 PM
>>
>>
>> *To:* gencer at gencgiyen.com
>> *Cc:* gluster-user <gluster-users at gluster.org>
>> *Subject:* Re: [Gluster-users] Very slow performance on Sharded GlusterFS
>>
>>
>>
>> Just noticed that the way you have configured your brick order during
>> volume-create makes both replicas of every set reside on the same machine.
>>
>> That apart, do you see any difference if you change shard-block-size to
>> 512MB? Could you try that?
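>>
>> That would be something along the lines of the following (note: as far as I
>> know, the new block size only applies to files created after the change, so
>> please re-create the test file afterwards):
>>
>> # gluster volume set <VOL> features.shard-block-size 512MB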
>>
>> If it doesn't help, could you share the volume-profile output for both
>> of the tests (separately)?
>>
>> Here's what you do:
>>
>> 1. Start profile before starting your test - it could be dd or it could
>> be file download.
>>
>> # gluster volume profile <VOL> start
>>
>> 2. Run your test - again either dd or file-download.
>>
>> 3. Once the test has completed, run `gluster volume profile <VOL> info`
>> and redirect its output to a tmp file.
>>
>> 4. Stop profile
>>
>> # gluster volume profile <VOL> stop
>>
>> And attach the volume-profile output file that you saved at a temporary
>> location in step 3.
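>>
>> Put together, the whole sequence would look roughly like this (the /tmp file
>> name is just an example):
>>
>> # gluster volume profile <VOL> start
>> # dd if=/dev/zero of=/mnt/testfile bs=1G count=5
>> # gluster volume profile <VOL> info > /tmp/profile-sharded.txt
>> # gluster volume profile <VOL> stop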
>>
>> -Krutika
>>
>>
>>
>> On Fri, Jun 30, 2017 at 5:33 PM, <gencer at gencgiyen.com> wrote:
>>
>> Hi Krutika,
>>
>>
>>
>> Sure, here is volume info:
>>
>>
>>
>> root at sr-09-loc-50-14-18:/# gluster volume info testvol
>>
>>
>>
>> Volume Name: testvol
>>
>> Type: Distributed-Replicate
>>
>> Volume ID: 30426017-59d5-4091-b6bc-279a905b704a
>>
>> Status: Started
>>
>> Snapshot Count: 0
>>
>> Number of Bricks: 10 x 2 = 20
>>
>> Transport-type: tcp
>>
>> Bricks:
>>
>> Brick1: sr-09-loc-50-14-18:/bricks/brick1
>>
>> Brick2: sr-09-loc-50-14-18:/bricks/brick2
>>
>> Brick3: sr-09-loc-50-14-18:/bricks/brick3
>>
>> Brick4: sr-09-loc-50-14-18:/bricks/brick4
>>
>> Brick5: sr-09-loc-50-14-18:/bricks/brick5
>>
>> Brick6: sr-09-loc-50-14-18:/bricks/brick6
>>
>> Brick7: sr-09-loc-50-14-18:/bricks/brick7
>>
>> Brick8: sr-09-loc-50-14-18:/bricks/brick8
>>
>> Brick9: sr-09-loc-50-14-18:/bricks/brick9
>>
>> Brick10: sr-09-loc-50-14-18:/bricks/brick10
>>
>> Brick11: sr-10-loc-50-14-18:/bricks/brick1
>>
>> Brick12: sr-10-loc-50-14-18:/bricks/brick2
>>
>> Brick13: sr-10-loc-50-14-18:/bricks/brick3
>>
>> Brick14: sr-10-loc-50-14-18:/bricks/brick4
>>
>> Brick15: sr-10-loc-50-14-18:/bricks/brick5
>>
>> Brick16: sr-10-loc-50-14-18:/bricks/brick6
>>
>> Brick17: sr-10-loc-50-14-18:/bricks/brick7
>>
>> Brick18: sr-10-loc-50-14-18:/bricks/brick8
>>
>> Brick19: sr-10-loc-50-14-18:/bricks/brick9
>>
>> Brick20: sr-10-loc-50-14-18:/bricks/brick10
>>
>> Options Reconfigured:
>>
>> features.shard-block-size: 32MB
>>
>> features.shard: on
>>
>> transport.address-family: inet
>>
>> nfs.disable: on
>>
>>
>>
>> -Gencer.
>>
>>
>>
>> *From:* Krutika Dhananjay [mailto:kdhananj at redhat.com]
>> *Sent:* Friday, June 30, 2017 2:50 PM
>> *To:* gencer at gencgiyen.com
>> *Cc:* gluster-user <gluster-users at gluster.org>
>> *Subject:* Re: [Gluster-users] Very slow performance on Sharded GlusterFS
>>
>>
>>
>> Could you please provide the volume-info output?
>>
>> -Krutika
>>
>>
>>
>> On Fri, Jun 30, 2017 at 4:23 PM, <gencer at gencgiyen.com> wrote:
>>
>> Hi,
>>
>>
>>
>> I have 2 nodes with 20 bricks in total (10+10).
>>
>>
>>
>> First test:
>>
>>
>>
>> 2 Nodes with Distributed – Striped – Replicated (2 x 2)
>>
>> 10GbE Speed between nodes
>>
>>
>>
>> “dd” performance: 400 MB/s and higher
>>
>> Downloading a large file from the internet directly to the gluster mount:
>> 250-300 MB/s
>>
>>
>>
>> Now the same test without striping but with sharding. The results are the
>> same whether I set the shard size to 4MB or 32MB. (Again 2x replica here.)
>>
>>
>>
>> dd performance: 70 MB/s
>>
>> Download directly to the gluster mount: 60 MB/s
>>
>>
>>
>> Now, if we run this test twice at the same time (two dd runs or two
>> downloads at the same time), it drops below 25 MB/s each, or slower.
>>
>>
>>
>> I thought sharding would be at least equal or maybe a little slower, but
>> these results are terribly slow.
>>
>>
>>
>> I tried tuning (cache, window-size, etc.). Nothing helps.
>>
>>
>>
>> GlusterFS 3.11 and Debian 9 are used. The kernel is also tuned. The disks
>> are “xfs” and 4TB each.
>>
>>
>>
>> Is there any tweak/tuning out there to make it faster?
>>
>>
>>
>> Or is this expected behavior? If it is, it is unacceptable. It is so slow
>> that I cannot use this in production.
>>
>>
>>
>> The reason I use shard instead of stripe is that I would like to handle
>> files that are bigger than the brick size.
>>
>>
>>
>> Thanks,
>>
>> Gencer.
>>
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>
>>
>>
>>
>>
>>
>>
>
>