[Gluster-users] Very slow performance on Sharded GlusterFS

Krutika Dhananjay kdhananj at redhat.com
Tue Jul 4 06:39:20 UTC 2017


Hi Gencer,

I just checked the volume-profile attachments.

Things that seem really odd to me as far as the sharded volume is concerned:

1. Only the replica pair having bricks 5 and 6 on both nodes 09 and 10
seems to have witnessed all the IO. No other bricks witnessed any write
operations. This is unacceptable for a volume that has 8 other replica
sets. Why didn't the shards get distributed across all of these sets?

2. For the replica set consisting of bricks 5 and 6 of node 09, I see that
brick 5 is spending 99% of its time in the FINODELK fop, when the fop that
should have dominated its profile is WRITE.
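
One way to check the first observation directly on the servers is to count the shard files each brick holds. This is only a sketch: it assumes the usual sharding layout, where the pieces of a file beyond its first block live in a hidden `.shard` directory at the brick root. The function name `count_shards` is made up for illustration, and the brick paths in the usage note come from the volume info quoted further down.

```shell
# count_shards BRICK...: print "<brick path> <number of shard files>" per brick.
# Assumption: shards are stored under <brick>/.shard when features.shard is on.
# A brick without that directory is reported as 0.
count_shards() {
    for brick in "$@"; do
        if [ -d "$brick/.shard" ]; then
            # tr strips the padding some wc implementations add
            n=$(find "$brick/.shard" -maxdepth 1 -type f | wc -l | tr -d ' ')
        else
            n=0
        fi
        echo "$brick $n"
    done
}
```

Run on each node as `count_shards /bricks/brick*`. If only brick5 and brick6 show non-zero counts, that would match what the profiles suggest.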

Could you shed some more light on your setup from a gluster standpoint?
* For instance, are you using two different gluster volumes to gather these
numbers - one distributed-replicated-striped and another
distributed-replicated-sharded? Or are you merely converting a single
volume from one type to another?

* And if there are indeed two volumes, could you share both their `volume
info` outputs to eliminate any confusion?

* If there's just one volume, are you taking care to remove all data from
the mount point of this volume before converting it?

* What is the size the test file grew to?

* Are the attached profiles from the dd runs, or from the file download test?
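
The file-size question matters because the number of pieces a file is split into depends on its size relative to the shard block size. As a rough sketch, the count can be estimated with a ceiling division. The helper `shard_pieces` is hypothetical, the 1024 MB file size is an assumed example (the real test file size is not known yet), and the model simply assumes the file is cut into fixed-size blocks.

```shell
# shard_pieces SIZE_MB BLOCK_MB: number of blocks a file of SIZE_MB is cut
# into when the shard block size is BLOCK_MB (ceiling division).
shard_pieces() {
    size_mb=$1
    block_mb=$2
    echo $(( (size_mb + block_mb - 1) / block_mb ))
}

# Assumed example: a 1024 MB test file at the block sizes mentioned in this thread.
for block in 4 32 512; do
    echo "block=${block}MB pieces=$(shard_pieces 1024 "$block")"
done
```

For the assumed 1024 MB file this gives 256, 32 and 2 pieces respectively. A file no larger than one block is not split at all, so in that case seeing all writes on a single replica set would be expected.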

-Krutika



On Mon, Jul 3, 2017 at 8:42 PM, <gencer at gencgiyen.com> wrote:

> Hi Krutika,
>
>
>
> Have you been able to look at my profiles? Do you have any clue, idea or
> suggestion?
>
>
>
> Thanks,
>
> -Gencer
>
>
>
> *From:* Krutika Dhananjay [mailto:kdhananj at redhat.com]
> *Sent:* Friday, June 30, 2017 3:50 PM
>
> *To:* gencer at gencgiyen.com
> *Cc:* gluster-user <gluster-users at gluster.org>
> *Subject:* Re: [Gluster-users] Very slow performance on Sharded GlusterFS
>
>
>
> Just noticed that the way you have configured your brick order during
> volume-create makes both replicas of every set reside on the same machine.
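
With `replica 2`, each consecutive pair of bricks on the `volume create` command line forms one replica set, so listing all of one node's bricks first pairs each brick with a neighbour on the same machine. The sketch below shows one way to build an interleaved brick list instead. The helper `interleaved_bricks` is made up for illustration, the hostnames and brick paths are taken from the volume info in this thread, and the command is only printed, not executed.

```shell
# interleaved_bricks COUNT HOST_A HOST_B: print a brick list in which each
# consecutive pair (one replica set for "replica 2") spans both hosts.
interleaved_bricks() {
    count=$1
    host_a=$2
    host_b=$3
    i=1
    list=""
    while [ "$i" -le "$count" ]; do
        list="$list $host_a:/bricks/brick$i $host_b:/bricks/brick$i"
        i=$((i + 1))
    done
    # Drop the leading space
    echo "${list# }"
}

# Print the create command instead of running it, so it can be reviewed first.
echo gluster volume create testvol replica 2 \
    $(interleaved_bricks 10 sr-09-loc-50-14-18 sr-10-loc-50-14-18)
```

In the printed command, brick1 of node 09 is followed by brick1 of node 10, and so on, so every replica set has one copy on each machine.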
>
> That apart, do you see any difference if you change shard-block-size to
> 512MB? Could you try that?
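
A minimal sketch of changing the block size and reading it back follows. The wrapper `set_shard_block_size` and the `GLUSTER` override variable are illustrative only; the option name `features.shard-block-size` is the one shown in the volume info in this thread.

```shell
# set_shard_block_size VOL SIZE: set the shard block size, then read it back.
# GLUSTER can be overridden (e.g. GLUSTER=echo) to print the commands
# instead of running them.
set_shard_block_size() {
    gluster_cmd=${GLUSTER:-gluster}
    $gluster_cmd volume set "$1" features.shard-block-size "$2"
    $gluster_cmd volume get "$1" features.shard-block-size
}
```

Usage would be `set_shard_block_size testvol 512MB`. As far as I know the new size applies only to files created after the change, so the test file should be re-created before measuring again.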
>
> If it doesn't help, could you share the volume-profile output for both the
> tests (separate)?
>
> Here's what you do:
>
> 1. Start profile before starting your test - it could be dd or it could be
> file download.
>
> # gluster volume profile <VOL> start
>
> 2. Run your test - again either dd or file-download.
>
> 3. Once the test has completed, run `gluster volume profile <VOL> info`
> and redirect its output to a tmp file.
>
> 4. Stop profile
>
> # gluster volume profile <VOL> stop
>
> And attach the volume-profile output file that you saved at a temporary
> location in step 3.
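
The four steps above can be wrapped into one function so that each test gets its own profile file. This is just a sketch: `profile_run` and the `GLUSTER` override variable are made-up names, and the output path and mount point in the usage note are placeholders.

```shell
# profile_run VOL OUTFILE CMD...: profile a single test run on volume VOL.
# GLUSTER can be overridden (e.g. GLUSTER=echo) for a dry run.
profile_run() {
    vol=$1
    out=$2
    shift 2
    gluster_cmd=${GLUSTER:-gluster}
    $gluster_cmd volume profile "$vol" start           # step 1: start profiling
    "$@"                                               # step 2: run the test
    $gluster_cmd volume profile "$vol" info > "$out"   # step 3: save the output
    $gluster_cmd volume profile "$vol" stop            # step 4: stop profiling
}
```

For example, `profile_run testvol /tmp/testvol-dd.profile dd if=/dev/zero of=/mnt/testvol/ddfile bs=1M count=1024`, where `/mnt/testvol` stands for wherever the volume is mounted. Run it once per test so the dd and download profiles stay separate.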
>
> -Krutika
>
>
>
> On Fri, Jun 30, 2017 at 5:33 PM, <gencer at gencgiyen.com> wrote:
>
> Hi Krutika,
>
>
>
> Sure, here is volume info:
>
>
>
> root@sr-09-loc-50-14-18:/# gluster volume info testvol
>
> Volume Name: testvol
> Type: Distributed-Replicate
> Volume ID: 30426017-59d5-4091-b6bc-279a905b704a
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 10 x 2 = 20
> Transport-type: tcp
> Bricks:
> Brick1: sr-09-loc-50-14-18:/bricks/brick1
> Brick2: sr-09-loc-50-14-18:/bricks/brick2
> Brick3: sr-09-loc-50-14-18:/bricks/brick3
> Brick4: sr-09-loc-50-14-18:/bricks/brick4
> Brick5: sr-09-loc-50-14-18:/bricks/brick5
> Brick6: sr-09-loc-50-14-18:/bricks/brick6
> Brick7: sr-09-loc-50-14-18:/bricks/brick7
> Brick8: sr-09-loc-50-14-18:/bricks/brick8
> Brick9: sr-09-loc-50-14-18:/bricks/brick9
> Brick10: sr-09-loc-50-14-18:/bricks/brick10
> Brick11: sr-10-loc-50-14-18:/bricks/brick1
> Brick12: sr-10-loc-50-14-18:/bricks/brick2
> Brick13: sr-10-loc-50-14-18:/bricks/brick3
> Brick14: sr-10-loc-50-14-18:/bricks/brick4
> Brick15: sr-10-loc-50-14-18:/bricks/brick5
> Brick16: sr-10-loc-50-14-18:/bricks/brick6
> Brick17: sr-10-loc-50-14-18:/bricks/brick7
> Brick18: sr-10-loc-50-14-18:/bricks/brick8
> Brick19: sr-10-loc-50-14-18:/bricks/brick9
> Brick20: sr-10-loc-50-14-18:/bricks/brick10
> Options Reconfigured:
> features.shard-block-size: 32MB
> features.shard: on
> transport.address-family: inet
> nfs.disable: on
>
>
>
> -Gencer.
>
>
>
> *From:* Krutika Dhananjay [mailto:kdhananj at redhat.com]
> *Sent:* Friday, June 30, 2017 2:50 PM
> *To:* gencer at gencgiyen.com
> *Cc:* gluster-user <gluster-users at gluster.org>
> *Subject:* Re: [Gluster-users] Very slow performance on Sharded GlusterFS
>
>
>
> Could you please provide the volume-info output?
>
> -Krutika
>
>
>
> On Fri, Jun 30, 2017 at 4:23 PM, <gencer at gencgiyen.com> wrote:
>
> Hi,
>
>
>
> I have 2 nodes with 20 bricks in total (10+10).
>
>
>
> First test:
>
>
>
> 2 Nodes with Distributed – Striped – Replicated (2 x 2)
>
> 10GbE Speed between nodes
>
>
>
> “dd” performance: 400mb/s and higher
>
> Downloading a large file from the internet directly to the gluster volume:
> 250-300mb/s
>
>
>
> Now the same test without stripe but with sharding. The results are the same
> whether I set the shard size to 4MB or 32MB. (Again 2x replica here.)
>
>
>
> “dd” performance: 70mb/s
>
> Downloading directly to the gluster volume: 60mb/s
>
>
>
> Now, if we run this test twice at the same time (two dd runs or two downloads
> at the same time), each one drops to 25mb/s or slower.
>
>
>
> I thought sharding would be at least equal or maybe a little slower, but these
> results are terribly slow.
>
>
>
> I tried tuning (cache, window-size etc..). Nothing helps.
>
>
>
> GlusterFS 3.11 and Debian 9 used. Kernel also tuned. Disks are “xfs” and
> 4TB each.
>
>
>
> Is there any tweak/tuning out there to make it fast?
>
>
>
> Or is this expected behavior? If it is, it is unacceptable. So slow. I
> cannot use this in production as it is terribly slow.
>
>
>
> The reason I use shard instead of stripe is that I would like to
> eliminate the problem of files that are bigger than the brick size.
>
>
>
> Thanks,
>
> Gencer.
>
>