[Gluster-users] Very slow performance on Sharded GlusterFS

Krutika Dhananjay kdhananj at redhat.com
Wed Jul 12 07:19:39 UTC 2017


Hi,

Sorry for the late response.
No, the eager-lock experiment was more to see if the implementation had any
new bugs. It doesn't look like it does. I think having it on would be the
right thing to do, as it will reduce the number of fops having to go over
the network.
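
For reference, here is a minimal sketch (assuming the volume is still named
testvol, as in your tests) of how eager-lock can be turned back on and the
setting verified:

# gluster volume set testvol cluster.eager-lock on
# gluster volume get testvol cluster.eager-lock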

Coming to the performance drop, I compared the volume profile output for
stripe and 32MB shard again. The only thing that is striking is the number
of xattrops and inodelks, which is only 2-4 for the striped volume, whereas
the number is much bigger in the case of the sharded volume. This is
unfortunately expected with sharding, because the eager-locking and delayed
post-op optimizations will now only apply on a per-shard basis. The larger
the shard size, the better, as a way to work around this issue.
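
As a sketch, the shard size can be raised like this (keeping the testvol
name; note that a new shard-block-size only applies to files written after
the change, so the test should be re-run on freshly written files):

# gluster volume set testvol features.shard-block-size 512MB
# gluster volume get testvol features.shard-block-size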

Meanwhile, let me think about how we can get this fixed in code.

-Krutika



On Mon, Jul 10, 2017 at 7:59 PM, <gencer at gencgiyen.com> wrote:

> Hi Krutika,
>
>
>
> May I kindly ping you and ask whether you have any idea yet, or have
> figured out what the issue may be?
>
>
>
> I am eagerly awaiting your reply :)
>
>
>
> Apologies for the ping :)
>
>
>
> -Gencer.
>
>
>
> *From:* gluster-users-bounces at gluster.org [mailto:gluster-users-bounces@
> gluster.org] *On Behalf Of *gencer at gencgiyen.com
> *Sent:* Thursday, July 6, 2017 11:06 AM
>
> *To:* 'Krutika Dhananjay' <kdhananj at redhat.com>
> *Cc:* 'gluster-user' <gluster-users at gluster.org>
> *Subject:* Re: [Gluster-users] Very slow performance on Sharded GlusterFS
>
>
>
> Hi Krutika,
>
>
>
> I also did one more test. I re-created another volume (a single volume; the
> old one was destroyed and deleted), then ran 2 dd tests, one for 1GB and the
> other for 2GB. Both use a 32MB shard size with eager-lock off.
>
>
>
> Samples:
>
>
>
> sr:~# gluster volume profile testvol start
>
> Starting volume profile on testvol has been successful
>
> sr:~# dd if=/dev/zero of=/testvol/dtestfil0xb bs=1G count=1
>
> 1+0 records in
>
> 1+0 records out
>
> 1073741824 bytes (1.1 GB, 1.0 GiB) copied, 12.2708 s, 87.5 MB/s
>
> sr:~# gluster volume profile testvol info > /32mb_shard_and_1gb_dd.log
>
> sr:~# gluster volume profile testvol stop
>
> Stopping volume profile on testvol has been successful
>
> sr:~# gluster volume profile testvol start
>
> Starting volume profile on testvol has been successful
>
> sr:~# dd if=/dev/zero of=/testvol/dtestfil0xb bs=1G count=2
>
> 2+0 records in
>
> 2+0 records out
>
> 2147483648 bytes (2.1 GB, 2.0 GiB) copied, 23.5457 s, 91.2 MB/s
>
> sr:~# gluster volume profile testvol info > /32mb_shard_and_2gb_dd.log
>
> sr:~# gluster volume profile testvol stop
>
> Stopping volume profile on testvol has been successful
>
>
>
> Also here is volume info:
>
>
>
> sr:~# gluster volume info testvol
>
>
>
> Volume Name: testvol
>
> Type: Distributed-Replicate
>
> Volume ID: 3cc06d95-06e9-41f8-8b26-e997886d7ba1
>
> Status: Started
>
> Snapshot Count: 0
>
> Number of Bricks: 10 x 2 = 20
>
> Transport-type: tcp
>
> Bricks:
>
> Brick1: sr-09-loc-50-14-18:/bricks/brick1
>
> Brick2: sr-10-loc-50-14-18:/bricks/brick1
>
> Brick3: sr-09-loc-50-14-18:/bricks/brick2
>
> Brick4: sr-10-loc-50-14-18:/bricks/brick2
>
> Brick5: sr-09-loc-50-14-18:/bricks/brick3
>
> Brick6: sr-10-loc-50-14-18:/bricks/brick3
>
> Brick7: sr-09-loc-50-14-18:/bricks/brick4
>
> Brick8: sr-10-loc-50-14-18:/bricks/brick4
>
> Brick9: sr-09-loc-50-14-18:/bricks/brick5
>
> Brick10: sr-10-loc-50-14-18:/bricks/brick5
>
> Brick11: sr-09-loc-50-14-18:/bricks/brick6
>
> Brick12: sr-10-loc-50-14-18:/bricks/brick6
>
> Brick13: sr-09-loc-50-14-18:/bricks/brick7
>
> Brick14: sr-10-loc-50-14-18:/bricks/brick7
>
> Brick15: sr-09-loc-50-14-18:/bricks/brick8
>
> Brick16: sr-10-loc-50-14-18:/bricks/brick8
>
> Brick17: sr-09-loc-50-14-18:/bricks/brick9
>
> Brick18: sr-10-loc-50-14-18:/bricks/brick9
>
> Brick19: sr-09-loc-50-14-18:/bricks/brick10
>
> Brick20: sr-10-loc-50-14-18:/bricks/brick10
>
> Options Reconfigured:
>
> cluster.eager-lock: off
>
> features.shard-block-size: 32MB
>
> features.shard: on
>
> transport.address-family: inet
>
> nfs.disable: on
>
>
>
> See attached results and sorry for the multiple e-mails. I just want to
> make sure that I provided correct results for the tests.
>
>
>
> Thanks,
>
> Gencer.
>
>
>
> *From:* gluster-users-bounces at gluster.org [mailto:gluster-users-bounces@
> gluster.org <gluster-users-bounces at gluster.org>] *On Behalf Of *
> gencer at gencgiyen.com
> *Sent:* Thursday, July 6, 2017 10:34 AM
> *To:* 'Krutika Dhananjay' <kdhananj at redhat.com>
> *Cc:* 'gluster-user' <gluster-users at gluster.org>
> *Subject:* Re: [Gluster-users] Very slow performance on Sharded GlusterFS
>
>
>
> Krutika, I’m sorry I forgot to add logs. I attached them now.
>
>
>
> Thanks,
>
> Gencer.
>
>
>
>
>
>
>
> *From:* gluster-users-bounces at gluster.org [mailto:gluster-users-bounces@
> gluster.org <gluster-users-bounces at gluster.org>] *On Behalf Of *
> gencer at gencgiyen.com
> *Sent:* Thursday, July 6, 2017 10:27 AM
> *To:* 'Krutika Dhananjay' <kdhananj at redhat.com>
> *Cc:* 'gluster-user' <gluster-users at gluster.org>
> *Subject:* Re: [Gluster-users] Very slow performance on Sharded GlusterFS
>
>
>
> Hi Krutika,
>
>
>
> After that setting:
>
>
>
> $ dd if=/dev/zero of=/mnt/ddfile bs=1G count=1
>
> 1+0 records in
>
> 1+0 records out
>
> 1073741824 bytes (1.1 GB, 1.0 GiB) copied, 11.7351 s, 91.5 MB/s
>
>
>
> $ dd if=/dev/zero of=/mnt/ddfile2 bs=2G count=1
>
> 0+1 records in
>
> 0+1 records out
>
> 2147479552 bytes (2.1 GB, 2.0 GiB) copied, 23.7351 s, 90.5 MB/s
>
>
>
> $ dd if=/dev/zero of=/mnt/ddfile3  bs=1G count=1
>
> 1+0 records in
>
> 1+0 records out
>
> 1073741824 bytes (1.1 GB, 1.0 GiB) copied, 12.1202 s, 88.6 MB/s
>
>
>
> $ dd if=/dev/zero of=/mnt/ddfile4 bs=1G count=2
>
> 2+0 records in
>
> 2+0 records out
>
> 2147483648 bytes (2.1 GB, 2.0 GiB) copied, 24.7695 s, 86.7 MB/s
>
>
>
> I see improvements (from 70-75 MB/s to 90-100 MB/s) after setting
> eager-lock off. Also, I am monitoring the bandwidth between the two nodes
> and I see up to 102 MB/s.
>
>
>
> Is there anything I can do to optimize further? Or is this the last stop?
>
>
>
> Note: I deleted all files again, reformatted, then re-created the volume
> with shard enabled and mounted it. I tried 16MB, 32MB and 512MB shard
> sizes. The results are the same.
>
>
>
> Thanks,
>
> Gencer.
>
>
>
> *From:* Krutika Dhananjay [mailto:kdhananj at redhat.com
> <kdhananj at redhat.com>]
> *Sent:* Thursday, July 6, 2017 3:30 AM
> *To:* gencer at gencgiyen.com
> *Cc:* gluster-user <gluster-users at gluster.org>
> *Subject:* Re: [Gluster-users] Very slow performance on Sharded GlusterFS
>
>
>
> What if you disable eager lock and run your test again on the sharded
> configuration, along with the profile output?
>
> # gluster volume set <VOL> cluster.eager-lock off
>
> -Krutika
>
>
>
> On Tue, Jul 4, 2017 at 9:03 PM, Krutika Dhananjay <kdhananj at redhat.com>
> wrote:
>
> Thanks. I think reusing the same volume was the cause of the lack of IO
> distribution.
>
> The latest profile output looks much more realistic and in line with what
> I would expect.
>
> Let me analyse the numbers a bit and get back.
>
>
>
> -Krutika
>
>
>
> On Tue, Jul 4, 2017 at 12:55 PM, <gencer at gencgiyen.com> wrote:
>
> Hi Krutika,
>
>
>
> Thank you so much for your reply. Let me answer all of your questions:
>
>
>
>    1. I have no idea why it did not get distributed over all bricks.
>    2. Hm.. This is really weird.
>
>
>
> And others;
>
>
>
> No, I use only one volume. When I tested the sharded and striped volumes, I
> manually stopped the volume, deleted it, purged the data (the data inside
> the bricks/disks) and re-created it using this command:
>
>
>
> sudo gluster volume create testvol replica 2 sr-09-loc-50-14-18:/bricks/brick1
> sr-10-loc-50-14-18:/bricks/brick1 sr-09-loc-50-14-18:/bricks/brick2
> sr-10-loc-50-14-18:/bricks/brick2 sr-09-loc-50-14-18:/bricks/brick3
> sr-10-loc-50-14-18:/bricks/brick3 sr-09-loc-50-14-18:/bricks/brick4
> sr-10-loc-50-14-18:/bricks/brick4 sr-09-loc-50-14-18:/bricks/brick5
> sr-10-loc-50-14-18:/bricks/brick5 sr-09-loc-50-14-18:/bricks/brick6
> sr-10-loc-50-14-18:/bricks/brick6 sr-09-loc-50-14-18:/bricks/brick7
> sr-10-loc-50-14-18:/bricks/brick7 sr-09-loc-50-14-18:/bricks/brick8
> sr-10-loc-50-14-18:/bricks/brick8 sr-09-loc-50-14-18:/bricks/brick9
> sr-10-loc-50-14-18:/bricks/brick9 sr-09-loc-50-14-18:/bricks/brick10
> sr-10-loc-50-14-18:/bricks/brick10 force
>
>
>
> and of course after that, volume start is executed. If shard is to be used,
> I enable that feature BEFORE I start the sharded volume and then mount it.
>
>
>
> I tried converting from one to the other, but then I saw the documentation
> says a clean volume should be better. So I tried the clean method. Still
> the same performance.
>
>
>
> The test file grows from 1GB to 5GB, and the tests are dd. See this example:
>
>
>
> dd if=/dev/zero of=/mnt/testfile bs=1G count=5
>
> 5+0 records in
>
> 5+0 records out
>
> 5368709120 bytes (5.4 GB, 5.0 GiB) copied, 66.7978 s, 80.4 MB/s
>
>
>
>
>
> >> dd if=/dev/zero of=/mnt/testfile bs=5G count=1
>
> This also gives the same result. (bs and count reversed)
>
>
>
>
>
> And this example generated a profile, which I have also attached to this
> e-mail.
>
>
>
> Is there anything that I can try? I am open to all kinds of suggestions.
>
>
>
> Thanks,
>
> Gencer.
>
>
>
> *From:* Krutika Dhananjay [mailto:kdhananj at redhat.com]
> *Sent:* Tuesday, July 4, 2017 9:39 AM
>
>
> *To:* gencer at gencgiyen.com
> *Cc:* gluster-user <gluster-users at gluster.org>
> *Subject:* Re: [Gluster-users] Very slow performance on Sharded GlusterFS
>
>
>
> Hi Gencer,
>
> I just checked the volume-profile attachments.
>
> Things that seem really odd to me as far as the sharded volume is
> concerned:
>
> 1. Only the replica pair having bricks 5 and 6 on both nodes 09 and 10
> seems to have witnessed all the IO. No other bricks witnessed any write
> operations. This is unacceptable for a volume that has 8 other replica
> sets. Why didn't the shards get distributed across all of these sets?
>
>
>
> 2. For the replica set consisting of bricks 5 and 6 of node 09, I see that
> brick 5 is spending 99% of its time in the FINODELK fop, whereas the fop
> that should have dominated its profile is in fact WRITE.
>
> Could you throw some more light on your setup from a gluster standpoint?
> * For instance, are you using two different gluster volumes to gather
> these numbers - one distributed-replicated-striped and another
> distributed-replicated-sharded? Or are you merely converting a single
> volume from one type to another?
>
>
>
> * And if there are indeed two volumes, could you share both their `volume
> info` outputs to eliminate any confusion?
>
> * If there's just one volume, are you taking care to remove all data from
> the mount point of this volume before converting it?
>
> * What size did the test file grow to?
>
> * These attached profiles are against dd runs? Or the file download test?
>
>
>
> -Krutika
>
>
>
>
>
> On Mon, Jul 3, 2017 at 8:42 PM, <gencer at gencgiyen.com> wrote:
>
> Hi Krutika,
>
>
>
> Have you been able to look at my profiles? Do you have any clue, idea or
> suggestion?
>
>
>
> Thanks,
>
> -Gencer
>
>
>
> *From:* Krutika Dhananjay [mailto:kdhananj at redhat.com]
> *Sent:* Friday, June 30, 2017 3:50 PM
>
>
> *To:* gencer at gencgiyen.com
> *Cc:* gluster-user <gluster-users at gluster.org>
> *Subject:* Re: [Gluster-users] Very slow performance on Sharded GlusterFS
>
>
>
> Just noticed that the way you have configured your brick order during
> volume-create makes both replicas of every set reside on the same machine.
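>
> For example (just a sketch, reusing your host and brick names), with
> replica 2 the bricks need to be listed in alternating node order so that
> each replica pair spans both machines:
>
> gluster volume create testvol replica 2 sr-09-loc-50-14-18:/bricks/brick1
> sr-10-loc-50-14-18:/bricks/brick1 sr-09-loc-50-14-18:/bricks/brick2
> sr-10-loc-50-14-18:/bricks/brick2 ... (and so on, alternating sr-09 and
> sr-10 for each remaining brick pair through brick10) force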
>
> That apart, do you see any difference if you change shard-block-size to
> 512MB? Could you try that?
>
> If it doesn't help, could you share the volume-profile output for both the
> tests (separate)?
>
> Here's what you do:
>
> 1. Start profile before starting your test - it could be dd or it could be
> file download.
>
> # gluster volume profile <VOL> start
>
> 2. Run your test - again either dd or file-download.
>
> 3. Once the test has completed, run `gluster volume profile <VOL> info`
> and redirect its output to a tmp file.
>
> 4. Stop profile
>
> # gluster volume profile <VOL> stop
>
> And attach the volume-profile output file that you saved at a temporary
> location in step 3.
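>
> Put together, the whole sequence would look roughly like this (just a
> sketch; /mnt/testfile and /tmp/profile.log are placeholders for your own
> test command and output path):
>
> # gluster volume profile <VOL> start
> # dd if=/dev/zero of=/mnt/testfile bs=1G count=5
> # gluster volume profile <VOL> info > /tmp/profile.log
> # gluster volume profile <VOL> stop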
>
> -Krutika
>
>
>
> On Fri, Jun 30, 2017 at 5:33 PM, <gencer at gencgiyen.com> wrote:
>
> Hi Krutika,
>
>
>
> Sure, here is volume info:
>
>
>
> root at sr-09-loc-50-14-18:/# gluster volume info testvol
>
>
>
> Volume Name: testvol
>
> Type: Distributed-Replicate
>
> Volume ID: 30426017-59d5-4091-b6bc-279a905b704a
>
> Status: Started
>
> Snapshot Count: 0
>
> Number of Bricks: 10 x 2 = 20
>
> Transport-type: tcp
>
> Bricks:
>
> Brick1: sr-09-loc-50-14-18:/bricks/brick1
>
> Brick2: sr-09-loc-50-14-18:/bricks/brick2
>
> Brick3: sr-09-loc-50-14-18:/bricks/brick3
>
> Brick4: sr-09-loc-50-14-18:/bricks/brick4
>
> Brick5: sr-09-loc-50-14-18:/bricks/brick5
>
> Brick6: sr-09-loc-50-14-18:/bricks/brick6
>
> Brick7: sr-09-loc-50-14-18:/bricks/brick7
>
> Brick8: sr-09-loc-50-14-18:/bricks/brick8
>
> Brick9: sr-09-loc-50-14-18:/bricks/brick9
>
> Brick10: sr-09-loc-50-14-18:/bricks/brick10
>
> Brick11: sr-10-loc-50-14-18:/bricks/brick1
>
> Brick12: sr-10-loc-50-14-18:/bricks/brick2
>
> Brick13: sr-10-loc-50-14-18:/bricks/brick3
>
> Brick14: sr-10-loc-50-14-18:/bricks/brick4
>
> Brick15: sr-10-loc-50-14-18:/bricks/brick5
>
> Brick16: sr-10-loc-50-14-18:/bricks/brick6
>
> Brick17: sr-10-loc-50-14-18:/bricks/brick7
>
> Brick18: sr-10-loc-50-14-18:/bricks/brick8
>
> Brick19: sr-10-loc-50-14-18:/bricks/brick9
>
> Brick20: sr-10-loc-50-14-18:/bricks/brick10
>
> Options Reconfigured:
>
> features.shard-block-size: 32MB
>
> features.shard: on
>
> transport.address-family: inet
>
> nfs.disable: on
>
>
>
> -Gencer.
>
>
>
> *From:* Krutika Dhananjay [mailto:kdhananj at redhat.com]
> *Sent:* Friday, June 30, 2017 2:50 PM
> *To:* gencer at gencgiyen.com
> *Cc:* gluster-user <gluster-users at gluster.org>
> *Subject:* Re: [Gluster-users] Very slow performance on Sharded GlusterFS
>
>
>
> Could you please provide the volume-info output?
>
> -Krutika
>
>
>
> On Fri, Jun 30, 2017 at 4:23 PM, <gencer at gencgiyen.com> wrote:
>
> Hi,
>
>
>
> I have 2 nodes with 20 bricks in total (10+10).
>
>
>
> First test:
>
>
>
> 2 Nodes with Distributed – Striped – Replicated (2 x 2)
>
> 10GbE Speed between nodes
>
>
>
> “dd” performance: 400 MB/s and higher
>
> Downloading a large file from the internet directly to the gluster volume:
> 250-300 MB/s
>
>
>
> Now the same test without Stripe but with sharding. The results are the same
> whether I set the shard size to 4MB or 32MB. (Again 2x replica here.)
>
>
>
> dd performance: 70 MB/s
>
> Download directly to the gluster performance: 60 MB/s
>
>
>
> Now, if we do this test twice at the same time (two dd runs or two downloads
> at the same time) it goes below 25 MB/s each or slower.
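>
> As a rough sketch of what I mean by "twice at the same time" (the file
> names are just examples):
>
> dd if=/dev/zero of=/mnt/testfile1 bs=1G count=5 &
> dd if=/dev/zero of=/mnt/testfile2 bs=1G count=5 &
> wait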
>
>
>
> I thought sharding would be at least equal or maybe a little slower, but
> these results are terribly slow.
>
>
>
> I tried tuning (cache, window-size, etc.). Nothing helps.
>
>
>
> GlusterFS 3.11 and Debian 9 are used. The kernel is also tuned. The disks
> are “xfs” and 4TB each.
>
>
>
> Is there any tweak/tuning out there to make it fast?
>
>
>
> Or is this expected behavior? If it is, it is unacceptable. It is so slow
> that I cannot use this in production.
>
>
>
> The reason I use shard instead of stripe is that I would like to eliminate
> the problem of files that are bigger than the brick size.
>
>
>
> Thanks,
>
> Gencer.
>
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>
>
>
>
>
>
>
>
>
>
>