[Gluster-users] Very slow performance on Sharded GlusterFS
    gencer at gencgiyen.com 
    Thu Jul  6 07:33:35 UTC 2017
    
    
  
Krutika, I’m sorry, I forgot to attach the logs. They are attached now.
 
Thanks,
Gencer.
 
 
 
From: gluster-users-bounces at gluster.org [mailto:gluster-users-bounces at gluster.org] On Behalf Of gencer at gencgiyen.com
Sent: Thursday, July 6, 2017 10:27 AM
To: 'Krutika Dhananjay' <kdhananj at redhat.com>
Cc: 'gluster-user' <gluster-users at gluster.org>
Subject: Re: [Gluster-users] Very slow performance on Sharded GlusterFS
 
Hi Krutika,
 
After that setting:
 
$ dd if=/dev/zero of=/mnt/ddfile bs=1G count=1
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 11.7351 s, 91.5 MB/s
 
$ dd if=/dev/zero of=/mnt/ddfile2 bs=2G count=1
0+1 records in
0+1 records out
2147479552 bytes (2.1 GB, 2.0 GiB) copied, 23.7351 s, 90.5 MB/s
 
$ dd if=/dev/zero of=/mnt/ddfile3  bs=1G count=1
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 12.1202 s, 88.6 MB/s
 
$ dd if=/dev/zero of=/mnt/ddfile4 bs=1G count=2
2+0 records in
2+0 records out
2147483648 bytes (2.1 GB, 2.0 GiB) copied, 24.7695 s, 86.7 MB/s
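
A side note on the second run above: the "0+1 records" line and the 2147479552 bytes (rather than 2147483648) are consistent with Linux capping a single read() or write() call at 0x7ffff000 bytes, so a bs=2G block is only partially transferred. The following is a minimal sketch of that arithmetic. The function names are made up for illustration, and the block only prints numbers; it does not run dd.

```shell
#!/bin/sh
# Largest byte count a single read()/write() call transfers on Linux
# (0x7ffff000, as documented in the write(2) man page).
max_rw_bytes() {
    echo $((0x7ffff000))
}

# Bytes missing from a requested block size when one call is capped.
# Hypothetical helper, for illustration only.
shortfall() {
    requested=$1
    cap=$(max_rw_bytes)
    if [ "$requested" -gt "$cap" ]; then
        echo $((requested - cap))
    else
        echo 0
    fi
}

echo "cap:       $(max_rw_bytes) bytes"
echo "shortfall: $(shortfall 2147483648) bytes for bs=2G"
```

Writing the same amount with a smaller block size, for example bs=1M count=2048, avoids the partial record.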
 
I see an improvement (from 70-75 MB/s to 90-100 MB/s) after setting eager-lock off. I am also monitoring the bandwidth between the two nodes, and I see up to 102 MB/s.
 
Is there anything else I can do to optimize further, or is this the limit?
 
Note: I deleted all files again, reformatted the bricks, re-created the volume with sharding enabled, and mounted it. I tried 16MB, 32MB and 512MB shard sizes. The results are the same.
 
Thanks,
Gencer.
 
From: Krutika Dhananjay [mailto:kdhananj at redhat.com] 
Sent: Thursday, July 6, 2017 3:30 AM
To: gencer at gencgiyen.com
Cc: gluster-user <gluster-users at gluster.org>
Subject: Re: [Gluster-users] Very slow performance on Sharded GlusterFS
 
What if you disable eager lock and run your test again on the sharded configuration, and share the profile output along with it?
# gluster volume set <VOL> cluster.eager-lock off
-Krutika
 
On Tue, Jul 4, 2017 at 9:03 PM, Krutika Dhananjay <kdhananj at redhat.com> wrote:
Thanks. I think reusing the same volume was the cause of the lack of IO distribution.
The latest profile output looks much more realistic and in line with what I would expect.
Let me analyse the numbers a bit and get back.
 
-Krutika
 
On Tue, Jul 4, 2017 at 12:55 PM, <gencer at gencgiyen.com> wrote:
Hi Krutika,
 
Thank you so much for your reply. Let me answer everything:
 
1.	I have no idea why it did not get distributed over all bricks.
2.	Hm.. This is really weird.
 
And for the others:
 
No, I use only one volume. When I tested the sharded and striped volumes, I manually stopped the volume, deleted it, purged the data (the data inside the bricks/disks) and re-created it using this command:
 
sudo gluster volume create testvol replica 2 sr-09-loc-50-14-18:/bricks/brick1 sr-10-loc-50-14-18:/bricks/brick1 sr-09-loc-50-14-18:/bricks/brick2 sr-10-loc-50-14-18:/bricks/brick2 sr-09-loc-50-14-18:/bricks/brick3 sr-10-loc-50-14-18:/bricks/brick3 sr-09-loc-50-14-18:/bricks/brick4 sr-10-loc-50-14-18:/bricks/brick4 sr-09-loc-50-14-18:/bricks/brick5 sr-10-loc-50-14-18:/bricks/brick5 sr-09-loc-50-14-18:/bricks/brick6 sr-10-loc-50-14-18:/bricks/brick6 sr-09-loc-50-14-18:/bricks/brick7 sr-10-loc-50-14-18:/bricks/brick7 sr-09-loc-50-14-18:/bricks/brick8 sr-10-loc-50-14-18:/bricks/brick8 sr-09-loc-50-14-18:/bricks/brick9 sr-10-loc-50-14-18:/bricks/brick9 sr-09-loc-50-14-18:/bricks/brick10 sr-10-loc-50-14-18:/bricks/brick10 force
 
and of course I ran volume start after that. If sharding is used, I enable that feature BEFORE I start the volume, and then I mount it.
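
The brick list in the create command above alternates between the two hosts, so each replica-2 pair spans both machines. The sketch below is one way to generate that list and the follow-up steps without typing them by hand. It is only an illustration: the brick_list helper is hypothetical, the 32MB shard size is just one of the sizes tried in this thread, mounting from sr-09 is an assumption, and the commands are printed rather than executed.

```shell
#!/bin/sh
# Build a brick list in which consecutive entries alternate between two
# hosts, so that each replica-2 pair spans both machines.
brick_list() {
    host_a=$1
    host_b=$2
    count=$3
    i=1
    list=""
    while [ "$i" -le "$count" ]; do
        list="$list $host_a:/bricks/brick$i $host_b:/bricks/brick$i"
        i=$((i + 1))
    done
    echo "${list# }"    # drop the leading space
}

VOL=testvol
BRICKS=$(brick_list sr-09-loc-50-14-18 sr-10-loc-50-14-18 10)

# Only print the commands so they can be reviewed before being run.
echo "gluster volume create $VOL replica 2 $BRICKS force"
echo "gluster volume set $VOL features.shard on"
echo "gluster volume set $VOL features.shard-block-size 32MB"
echo "gluster volume start $VOL"
# Assumption: the volume is mounted from the first node onto /mnt.
echo "mount -t glusterfs sr-09-loc-50-14-18:/$VOL /mnt"
```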
 
I tried converting from one type to the other, but then I saw that the documentation says a clean volume is better. So I tried the clean method. Still the same performance.
 
The test file grows from 1GB to 5GB, and the tests are dd runs. See this example:
 
dd if=/dev/zero of=/mnt/testfile bs=1G count=5
5+0 records in
5+0 records out
5368709120 bytes (5.4 GB, 5.0 GiB) copied, 66.7978 s, 80.4 MB/s
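
For comparing runs, the rate dd prints can be reproduced from its byte count and elapsed time: it uses decimal megabytes (10^6 bytes), not MiB. A small sketch, with helper names made up for illustration:

```shell
#!/bin/sh
# dd reports throughput in decimal megabytes: bytes / seconds / 10^6.
mb_per_s() {
    awk -v b="$1" -v s="$2" 'BEGIN { printf "%.1f\n", b / s / 1000000 }'
}

# The same rate in binary mebibytes (2^20 bytes), for comparison.
mib_per_s() {
    awk -v b="$1" -v s="$2" 'BEGIN { printf "%.1f\n", b / s / 1048576 }'
}

mb_per_s 5368709120 66.7978     # prints 80.4, matching the dd output above
mib_per_s 5368709120 66.7978
```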
 
 
>> dd if=/dev/zero of=/mnt/testfile bs=5G count=1
This also gives the same result (bs and count reversed).
 
This example also generated a profile, which I have attached to this e-mail.
 
Is there anything that I can try? I am open to all kinds of suggestions.
 
Thanks,
Gencer.
 
From: Krutika Dhananjay [mailto:kdhananj at redhat.com]
Sent: Tuesday, July 4, 2017 9:39 AM
To: gencer at gencgiyen.com
Cc: gluster-user <gluster-users at gluster.org>
Subject: Re: [Gluster-users] Very slow performance on Sharded GlusterFS
 
Hi Gencer,
I just checked the volume-profile attachments.
Things that seem really odd to me as far as the sharded volume is concerned:
1. Only the replica pair having bricks 5 and 6 on both nodes 09 and 10 seems to have witnessed all the IO. No other bricks witnessed any write operations. This is unacceptable for a volume that has 8 other replica sets. Why didn't the shards get distributed across all of these sets?
 
2. For the replica set consisting of bricks 5 and 6 of node 09, I see that brick 5 is spending 99% of its time in the FINODELK fop, when the fop that should have dominated its profile is WRITE.
Could you throw some more light on your setup from a gluster standpoint?
* For instance, are you using two different gluster volumes to gather these numbers - one distributed-replicated-striped and another distributed-replicated-sharded? Or are you merely converting a single volume from one type to another?
 
* And if there are indeed two volumes, could you share both their `volume info` outputs to eliminate any confusion?
* If there's just one volume, are you taking care to remove all data from the mount point of this volume before converting it?
* What is the size the test file grew to?
* Are the attached profiles from the dd runs or from the file download test?
 
-Krutika
 
 
On Mon, Jul 3, 2017 at 8:42 PM, <gencer at gencgiyen.com> wrote:
Hi Krutika,
 
Have you been able to look at my profiles? Do you have any clue, idea or suggestion?
 
Thanks,
-Gencer
 
From: Krutika Dhananjay [mailto:kdhananj at redhat.com]
Sent: Friday, June 30, 2017 3:50 PM
To: gencer at gencgiyen.com
Cc: gluster-user <gluster-users at gluster.org>
Subject: Re: [Gluster-users] Very slow performance on Sharded GlusterFS
 
Just noticed that the way you have configured your brick order during volume-create makes both replicas of every set reside on the same machine.
That apart, do you see any difference if you change shard-block-size to 512MB? Could you try that?
If it doesn't help, could you share the volume-profile output for both the tests (separate)?
Here's what you do:
1. Start profile before starting your test - it could be dd or it could be file download.
# gluster volume profile <VOL> start
2. Run your test - again either dd or file-download.
3. Once the test has completed, run `gluster volume profile <VOL> info` and redirect its output to a tmp file.
4. Stop profile
# gluster volume profile <VOL> stop
And attach the volume-profile output file that you saved at a temporary location in step 3.
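
The steps above can be sketched as a small wrapper script. This is an illustration, not part of gluster: PROFILE_RUN is a hypothetical switch (unset means the commands are only printed, PROFILE_RUN=1 executes them), and the output path /tmp/profile.txt is an arbitrary choice.

```shell
#!/bin/sh
# Print a command, or execute it when PROFILE_RUN=1 (hypothetical switch).
run() {
    if [ "${PROFILE_RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# profile_test VOLUME OUTPUT_FILE COMMAND [ARGS...]
# Start profiling, run the test command, save the profile, stop profiling.
profile_test() {
    vol=$1
    out=$2
    shift 2
    run gluster volume profile "$vol" start
    run "$@"
    if [ "${PROFILE_RUN:-0}" = "1" ]; then
        gluster volume profile "$vol" info > "$out"
    else
        echo "gluster volume profile $vol info > $out"
    fi
    run gluster volume profile "$vol" stop
}

profile_test testvol /tmp/profile.txt dd if=/dev/zero of=/mnt/testfile bs=1G count=1
```

Running one wrapper per test keeps the dd profile and the download profile in separate files, as requested.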
-Krutika
 
On Fri, Jun 30, 2017 at 5:33 PM, <gencer at gencgiyen.com> wrote:
Hi Krutika,
 
Sure, here is volume info:
 
root@sr-09-loc-50-14-18:/# gluster volume info testvol
 
Volume Name: testvol
Type: Distributed-Replicate
Volume ID: 30426017-59d5-4091-b6bc-279a905b704a
Status: Started
Snapshot Count: 0
Number of Bricks: 10 x 2 = 20
Transport-type: tcp
Bricks:
Brick1: sr-09-loc-50-14-18:/bricks/brick1
Brick2: sr-09-loc-50-14-18:/bricks/brick2
Brick3: sr-09-loc-50-14-18:/bricks/brick3
Brick4: sr-09-loc-50-14-18:/bricks/brick4
Brick5: sr-09-loc-50-14-18:/bricks/brick5
Brick6: sr-09-loc-50-14-18:/bricks/brick6
Brick7: sr-09-loc-50-14-18:/bricks/brick7
Brick8: sr-09-loc-50-14-18:/bricks/brick8
Brick9: sr-09-loc-50-14-18:/bricks/brick9
Brick10: sr-09-loc-50-14-18:/bricks/brick10
Brick11: sr-10-loc-50-14-18:/bricks/brick1
Brick12: sr-10-loc-50-14-18:/bricks/brick2
Brick13: sr-10-loc-50-14-18:/bricks/brick3
Brick14: sr-10-loc-50-14-18:/bricks/brick4
Brick15: sr-10-loc-50-14-18:/bricks/brick5
Brick16: sr-10-loc-50-14-18:/bricks/brick6
Brick17: sr-10-loc-50-14-18:/bricks/brick7
Brick18: sr-10-loc-50-14-18:/bricks/brick8
Brick19: sr-10-loc-50-14-18:/bricks/brick9
Brick20: sr-10-loc-50-14-18:/bricks/brick10
Options Reconfigured:
features.shard-block-size: 32MB
features.shard: on
transport.address-family: inet
nfs.disable: on
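
With replica 2, consecutive bricks in this listing form a replica set, so Brick1 and Brick2 above (both on sr-09) mirror each other on the same machine. A rough way to check this from the volume info output is sketched below. The check_replica_pairs function is a hypothetical helper that reads the "BrickN:" lines from standard input.

```shell
#!/bin/sh
# Read "BrickN: host:/path" lines on stdin and report, for each group of
# REPLICA consecutive bricks, whether all of them sit on the same host.
check_replica_pairs() {
    awk -v r="${1:-2}" '
        /^Brick[0-9]+:/ {
            split($2, parts, ":")          # parts[1] is the host name
            n++
            if ((n - 1) % r == 0) { first = parts[1]; same = 1 }
            else if (parts[1] != first) same = 0
            if (n % r == 0)
                printf "set %d: %s\n", n / r, (same ? "SAME HOST" : "ok")
        }'
}

# Sample input: the first two brick lines from the listing above.
check_replica_pairs 2 <<'EOF'
Brick1: sr-09-loc-50-14-18:/bricks/brick1
Brick2: sr-09-loc-50-14-18:/bricks/brick2
EOF
# prints: set 1: SAME HOST

# Against a live volume (not run here):
#   gluster volume info testvol | check_replica_pairs 2
```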
 
-Gencer.
 
From: Krutika Dhananjay [mailto:kdhananj at redhat.com]
Sent: Friday, June 30, 2017 2:50 PM
To: gencer at gencgiyen.com
Cc: gluster-user <gluster-users at gluster.org>
Subject: Re: [Gluster-users] Very slow performance on Sharded GlusterFS
 
Could you please provide the volume-info output?
-Krutika
 
On Fri, Jun 30, 2017 at 4:23 PM, <gencer at gencgiyen.com> wrote:
Hi,
 
I have 2 nodes with 20 bricks in total (10+10).
 
First test: 
 
2 Nodes with Distributed – Striped – Replicated (2 x 2)
10GbE Speed between nodes
 
“dd” performance: 400mb/s and higher
Downloading a large file from internet and directly to the gluster: 250-300mb/s
 
Now the same test without stripe but with sharding. The results are the same whether I set the shard size to 4MB or 32MB. (Again 2x replica here.)
 
dd performance: 70mb/s
Download directly to the gluster performance: 60mb/s
 
Now, if we run this test twice at the same time (two dd runs or two downloads), each one drops to 25mb/s or lower.
 
I thought sharding would be at least equal or maybe a little slower, but these results are terribly slow.
 
I tried tuning (cache, window-size etc..). Nothing helps.
 
GlusterFS 3.11 and Debian 9 are used. The kernel is also tuned. The disks are “xfs” and 4TB each.
 
Is there any tweak/tuning out there to make it fast?
 
Or is this expected behavior? If it is, it is unacceptable. I cannot use this in production as it is terribly slow.
 
The reason I use shard instead of stripe is that I would like to avoid problems with files that are bigger than the brick size.
 
Thanks,
Gencer.
_______________________________________________
Gluster-users mailing list
Gluster-users at gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users
 
 
 
 
 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: shard_512mb.log
Type: application/octet-stream
Size: 25924 bytes
Desc: not available
URL: <http://lists.gluster.org/pipermail/gluster-users/attachments/20170706/d763ef4c/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: shard_32mb_1gb_dd.log
Type: application/octet-stream
Size: 44590 bytes
Desc: not available
URL: <http://lists.gluster.org/pipermail/gluster-users/attachments/20170706/d763ef4c/attachment-0001.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: shard_32mb_2gb_dd.log
Type: application/octet-stream
Size: 51012 bytes
Desc: not available
URL: <http://lists.gluster.org/pipermail/gluster-users/attachments/20170706/d763ef4c/attachment-0002.obj>
    
    