[Gluster-users] Very slow performance on Sharded GlusterFS

Krutika Dhananjay kdhananj at redhat.com
Tue Jul 4 15:33:18 UTC 2017


Thanks. I think reusing the same volume was the cause of the lack of IO
distribution.
The latest profile output looks much more realistic and in line with what I
would expect.

Let me analyse the numbers a bit and get back.

-Krutika

On Tue, Jul 4, 2017 at 12:55 PM, <gencer at gencgiyen.com> wrote:

> Hi Krutika,
>
>
>
> Thank you so much for your reply. Let me answer all:
>
>
>
>    1. I have no idea why it did not get distributed over all bricks.
>    2. Hm.. This is really weird.
>
>
>
> And others;
>
>
>
> No. I use only one volume. When I tested the sharded and striped volumes, I
> manually stopped the volume, deleted it, purged the data (the data inside the
> bricks/disks) and re-created it using this command:
>
>
>
> sudo gluster volume create testvol replica 2 sr-09-loc-50-14-18:/bricks/brick1
> sr-10-loc-50-14-18:/bricks/brick1 sr-09-loc-50-14-18:/bricks/brick2
> sr-10-loc-50-14-18:/bricks/brick2 sr-09-loc-50-14-18:/bricks/brick3
> sr-10-loc-50-14-18:/bricks/brick3 sr-09-loc-50-14-18:/bricks/brick4
> sr-10-loc-50-14-18:/bricks/brick4 sr-09-loc-50-14-18:/bricks/brick5
> sr-10-loc-50-14-18:/bricks/brick5 sr-09-loc-50-14-18:/bricks/brick6
> sr-10-loc-50-14-18:/bricks/brick6 sr-09-loc-50-14-18:/bricks/brick7
> sr-10-loc-50-14-18:/bricks/brick7 sr-09-loc-50-14-18:/bricks/brick8
> sr-10-loc-50-14-18:/bricks/brick8 sr-09-loc-50-14-18:/bricks/brick9
> sr-10-loc-50-14-18:/bricks/brick9 sr-09-loc-50-14-18:/bricks/brick10
> sr-10-loc-50-14-18:/bricks/brick10 force
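>
> (For reference, the stop/delete/purge part is roughly the following sketch, not an exact record of the commands; the stop and delete are run once from either node, while the purge is run on both nodes and includes the hidden .glusterfs directory:)
>
> sudo gluster volume stop testvol
> sudo gluster volume delete testvol
> sudo rm -rf /bricks/brick{1..10}/* /bricks/brick{1..10}/.glusterfs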
>
>
>
> and of course after that I executed volume start. If sharding is to be used, I
> enable that feature BEFORE I start the sharded volume and then mount it.
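>
> To be explicit, that sequence looks roughly like this (a sketch using the same testvol name and the /mnt mount point from the dd test below; the mount can point at either node):
>
> sudo gluster volume set testvol features.shard on
> sudo gluster volume set testvol features.shard-block-size 32MB
> sudo gluster volume start testvol
> sudo mount -t glusterfs sr-09-loc-50-14-18:/testvol /mnt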
>
>
>
> I tried converting from one to the other, but then I saw the documentation
> says a clean volume should be better. So I tried the clean method. Still the
> same performance.
>
>
>
> The test file grows from 1GB to 5GB, and the tests are dd. See this example:
>
>
>
> dd if=/dev/zero of=/mnt/testfile bs=1G count=5
>
> 5+0 records in
>
> 5+0 records out
>
> 5368709120 bytes (5.4 GB, 5.0 GiB) copied, 66.7978 s, 80.4 MB/s
>
>
>
>
>
> >> dd if=/dev/zero of=/mnt/testfile bs=5G count=1
>
> This also gives the same result (bs and count reversed).
>
>
>
>
>
> And this example generated a profile, which I have also attached to this
> e-mail.
>
>
>
> Is there anything that I can try? I am open to all kinds of suggestions.
>
>
>
> Thanks,
>
> Gencer.
>
>
>
> *From:* Krutika Dhananjay [mailto:kdhananj at redhat.com]
> *Sent:* Tuesday, July 4, 2017 9:39 AM
>
> *To:* gencer at gencgiyen.com
> *Cc:* gluster-user <gluster-users at gluster.org>
> *Subject:* Re: [Gluster-users] Very slow performance on Sharded GlusterFS
>
>
>
> Hi Gencer,
>
> I just checked the volume-profile attachments.
>
> Things that seem really odd to me as far as the sharded volume is
> concerned:
>
> 1. Only the replica pairs containing bricks 5 and 6 on nodes 09 and 10 seem
> to have witnessed all the IO. No other bricks witnessed any write
> operations. This is unacceptable for a volume that has 8 other replica
> sets. Why didn't the shards get distributed across all of these sets?
>
>
>
> 2. For the replica set consisting of bricks 5 and 6 of node 09, I see that
> brick 5 is spending 99% of its time in the FINODELK fop, when the fop that
> should have dominated its profile is in fact WRITE.
>
> Could you throw some more light on your setup from a gluster standpoint?
> * For instance, are you using two different gluster volumes to gather
> these numbers - one distributed-replicated-striped and another
> distributed-replicated-sharded? Or are you merely converting a single
> volume from one type to another?
>
>
>
> * And if there are indeed two volumes, could you share both their `volume
> info` outputs to eliminate any confusion?
>
> * If there's just one volume, are you taking care to remove all data from
> the mount point of this volume before converting it?
>
> * What size did the test file grow to?
>
> * These attached profiles are against dd runs? Or the file download test?
>
>
>
> -Krutika
>
>
>
>
>
> On Mon, Jul 3, 2017 at 8:42 PM, <gencer at gencgiyen.com> wrote:
>
> Hi Krutika,
>
>
>
> Have you been able to look at my profiles? Do you have any clue, idea or
> suggestion?
>
>
>
> Thanks,
>
> -Gencer
>
>
>
> *From:* Krutika Dhananjay [mailto:kdhananj at redhat.com]
> *Sent:* Friday, June 30, 2017 3:50 PM
>
>
> *To:* gencer at gencgiyen.com
> *Cc:* gluster-user <gluster-users at gluster.org>
> *Subject:* Re: [Gluster-users] Very slow performance on Sharded GlusterFS
>
>
>
> Just noticed that the way you have configured your brick order during
> volume-create makes both replicas of every set reside on the same machine.
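>
> To illustrate (using placeholder host names node-A and node-B rather than your actual ones): with replica 2, every two consecutive bricks in the create command form one replica pair, so a brick list like
>
> node-A:/bricks/brick1 node-A:/bricks/brick2 node-B:/bricks/brick1 node-B:/bricks/brick2 ...
>
> keeps both copies of a pair on the same machine, whereas alternating the hosts (node-A, node-B, node-A, node-B, ...) places each pair across the two machines.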
>
> That apart, do you see any difference if you change shard-block-size to
> 512MB? Could you try that?
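>
> (For reference, that change would be along these lines; since the shard block size only applies to files created after the change, set it before writing the new test file:)
>
> # gluster volume set <VOL> features.shard-block-size 512MB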
>
> If it doesn't help, could you share the volume-profile output for both the
> tests (separate)?
>
> Here's what you do:
>
> 1. Start profile before starting your test - it could be dd or it could be
> file download.
>
> # gluster volume profile <VOL> start
>
> 2. Run your test - again either dd or file-download.
>
> 3. Once the test has completed, run `gluster volume profile <VOL> info`
> and redirect its output to a tmp file.
>
> 4. Stop profile
>
> # gluster volume profile <VOL> stop
>
> And attach the volume-profile output file that you saved at a temporary
> location in step 3.
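>
> (For step 3, a one-liner along these lines is enough; the /tmp filename is just an example:)
>
> # gluster volume profile <VOL> info > /tmp/profile-output.txt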
>
> -Krutika
>
>
>
> On Fri, Jun 30, 2017 at 5:33 PM, <gencer at gencgiyen.com> wrote:
>
> Hi Krutika,
>
>
>
> Sure, here is volume info:
>
>
>
> root at sr-09-loc-50-14-18:/# gluster volume info testvol
>
>
>
> Volume Name: testvol
>
> Type: Distributed-Replicate
>
> Volume ID: 30426017-59d5-4091-b6bc-279a905b704a
>
> Status: Started
>
> Snapshot Count: 0
>
> Number of Bricks: 10 x 2 = 20
>
> Transport-type: tcp
>
> Bricks:
>
> Brick1: sr-09-loc-50-14-18:/bricks/brick1
>
> Brick2: sr-09-loc-50-14-18:/bricks/brick2
>
> Brick3: sr-09-loc-50-14-18:/bricks/brick3
>
> Brick4: sr-09-loc-50-14-18:/bricks/brick4
>
> Brick5: sr-09-loc-50-14-18:/bricks/brick5
>
> Brick6: sr-09-loc-50-14-18:/bricks/brick6
>
> Brick7: sr-09-loc-50-14-18:/bricks/brick7
>
> Brick8: sr-09-loc-50-14-18:/bricks/brick8
>
> Brick9: sr-09-loc-50-14-18:/bricks/brick9
>
> Brick10: sr-09-loc-50-14-18:/bricks/brick10
>
> Brick11: sr-10-loc-50-14-18:/bricks/brick1
>
> Brick12: sr-10-loc-50-14-18:/bricks/brick2
>
> Brick13: sr-10-loc-50-14-18:/bricks/brick3
>
> Brick14: sr-10-loc-50-14-18:/bricks/brick4
>
> Brick15: sr-10-loc-50-14-18:/bricks/brick5
>
> Brick16: sr-10-loc-50-14-18:/bricks/brick6
>
> Brick17: sr-10-loc-50-14-18:/bricks/brick7
>
> Brick18: sr-10-loc-50-14-18:/bricks/brick8
>
> Brick19: sr-10-loc-50-14-18:/bricks/brick9
>
> Brick20: sr-10-loc-50-14-18:/bricks/brick10
>
> Options Reconfigured:
>
> features.shard-block-size: 32MB
>
> features.shard: on
>
> transport.address-family: inet
>
> nfs.disable: on
>
>
>
> -Gencer.
>
>
>
> *From:* Krutika Dhananjay [mailto:kdhananj at redhat.com]
> *Sent:* Friday, June 30, 2017 2:50 PM
> *To:* gencer at gencgiyen.com
> *Cc:* gluster-user <gluster-users at gluster.org>
> *Subject:* Re: [Gluster-users] Very slow performance on Sharded GlusterFS
>
>
>
> Could you please provide the volume-info output?
>
> -Krutika
>
>
>
> On Fri, Jun 30, 2017 at 4:23 PM, <gencer at gencgiyen.com> wrote:
>
> Hi,
>
>
>
> I have 2 nodes with 20 bricks in total (10+10).
>
>
>
> First test:
>
>
>
> 2 Nodes with Distributed-Striped-Replicated (2 x 2)
>
> 10GbE Speed between nodes
>
>
>
> “dd” performance: 400 MB/s and higher
>
> Downloading a large file from the internet directly to the gluster mount:
> 250-300 MB/s
>
>
>
> Now the same test without stripe but with sharding. The results are the same
> whether I set the shard size to 4MB or 32MB. (Again 2x replica here.)
>
>
>
> dd performance: 70 MB/s
>
> Download directly to the gluster: 60 MB/s
>
>
>
> Now, if we do this test twice at the same time (two dd runs or two downloads
> at the same time), it goes below 25 MB/s each, or slower.
>
>
>
> I thought sharding would be at least equal, or maybe a little slower, but
> these results are terribly slow.
>
>
>
> I tried tuning (cache, window-size, etc.). Nothing helps.
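>
> (By "tuning" I mean volume options along these lines; the option names are real gluster options, but the values shown are only examples and not an exact record of what I set:)
>
> sudo gluster volume set testvol performance.cache-size 256MB   # example value
> sudo gluster volume set testvol performance.write-behind-window-size 1MB   # example value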
>
>
>
> GlusterFS 3.11 and Debian 9 are used. The kernel is also tuned. The disks are
> “xfs” and 4TB each.
>
>
>
> Is there any tweak/tuning out there to make it fast?
>
>
>
> Or is this expected behavior? If it is, it is unacceptable. It is so slow
> that I cannot use this in production.
>
>
>
> The reason I use shard instead of stripe is that I would like to eliminate
> files that are bigger than the brick size.
>
>
>
> Thanks,
>
> Gencer.
>
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>
>
>
>
>
>
>