[Gluster-devel] Fw: Re: Corvid gluster testing
David F. Robinson
david.robinson at corvidtec.com
Thu Aug 7 02:11:59 UTC 2014
Just to clarify a little, there are two cases where I was evaluating
performance.
1) The first case, which Pranith was working on, involved 20-nodes with
4-processors on each node for a total of 80-processors. Each processor
does its own independent i/o. These files are roughly 100-200MB each
and there are several hundred of them. When I mounted the gluster
system using fuse, it took 1.5-hours to do the i/o. When I mounted the
same system using NFS, it took 30-minutes. Note that in order to get
the gluster-mounted file-system down to 1.5-hours, I had to get rid of
the replicated volume (this was done during troubleshooting with Pranith
to rule out other possible issues). The timing was significantly worse
(3+ hours) when I was using a replicated pair.
2) The second case was the output of a larger single file (roughly
2.5TB). For this case, it takes the gluster-mounted filesystem
60-seconds (although I got that down to 52-seconds with some gluster
parameter tuning). The NFS mount takes 38-seconds. I sent the results
of this to the developer list first as this case is much easier to test
(50-seconds versus what could be 3+ hours).
I am headed out of town for a few days and will not be able to do
additional testing until Monday. For the second case, I will turn off
cluster.eager-lock (see the command sketch below) and send the results
to the email list. If there is any other testing that you would like to
see for the first case, let me know and I will be happy to perform the
tests and send in the results...
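As a sketch of what I plan to run for that eager-lock test (homegfs is the
volume shown in the volume info further down; the last command simply
re-enables the option afterwards):

    gluster volume set homegfs cluster.eager-lock off
    # (re-run the single-file restart write and time it)
    gluster volume set homegfs cluster.eager-lock on
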
Sorry for the confusion...
David
------ Original Message ------
From: "Pranith Kumar Karampuri" <pkarampu at redhat.com>
To: "Anand Avati" <avati at gluster.org>
Cc: "David F. Robinson" <david.robinson at corvidtec.com>; "Gluster Devel"
<gluster-devel at gluster.org>
Sent: 8/6/2014 9:51:11 PM
Subject: Re: [Gluster-devel] Fw: Re: Corvid gluster testing
>
>On 08/07/2014 07:18 AM, Anand Avati wrote:
>>It would be worth checking the perf numbers without -o acl (in case it
>>was enabled, as seen in the other gid thread). Client side -o acl
>>mount option can have a negative impact on performance because of the
>>increased number of up-calls from FUSE for access().
>Actually it is all write-intensive.
>Here are the numbers they gave me from earlier runs:
> %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls          Fop
> ---------   -----------   -----------   -----------   ------------         ----
>      0.00       0.00 us       0.00 us       0.00 us             99       FORGET
>      0.00       0.00 us       0.00 us       0.00 us           1093      RELEASE
>      0.00       0.00 us       0.00 us       0.00 us            468   RELEASEDIR
>      0.00      60.00 us      26.00 us     107.00 us              4      SETATTR
>      0.00      91.56 us      42.00 us     157.00 us             27       UNLINK
>      0.00      20.75 us      12.00 us      55.00 us            132     GETXATTR
>      0.00      19.03 us       9.00 us      95.00 us            152     READLINK
>      0.00      43.19 us      12.00 us     106.00 us             83         OPEN
>      0.00      18.37 us       8.00 us      92.00 us            257       STATFS
>      0.00      32.42 us      11.00 us     118.00 us            322      OPENDIR
>      0.00      36.09 us       5.00 us     109.00 us            359        FSTAT
>      0.00      51.14 us      37.00 us     183.00 us            663       RENAME
>      0.00      33.32 us       6.00 us     123.00 us           1451         STAT
>      0.00     821.79 us      21.00 us   22678.00 us             84         READ
>      0.00      34.88 us       3.00 us     139.00 us           2326        FLUSH
>      0.01     789.33 us      72.00 us   64054.00 us            347       CREATE
>      0.01    1144.63 us      43.00 us  280735.00 us            337    FTRUNCATE
>      0.01      47.82 us      16.00 us   19817.00 us          16513       LOOKUP
>      0.02     604.85 us      11.00 us    1233.00 us           1423     READDIRP
>     99.95      17.51 us       6.00 us  212701.00 us      300715967        WRITE
>
>     Duration: 5390 seconds
>    Data Read: 1495257497 bytes
> Data Written: 166546887668 bytes
>
>Pranith
>>
>>Thanks
>>
>>
>>On Wed, Aug 6, 2014 at 6:26 PM, Pranith Kumar Karampuri
>><pkarampu at redhat.com> wrote:
>>>
>>>On 08/07/2014 06:48 AM, Anand Avati wrote:
>>>>
>>>>
>>>>
>>>>On Wed, Aug 6, 2014 at 6:05 PM, Pranith Kumar Karampuri
>>>><pkarampu at redhat.com> wrote:
>>>>>We checked this performance with plain distribute as well and on
>>>>>nfs it gave 25 minutes where as on nfs it gave around 90 minutes
>>>>>after disabling throttling in both situations.
>>>>
>>>>This sentence is very confusing. Can you please state it more
>>>>clearly?
>>>sorry :-D.
>>>We checked this performance on a plain distribute volume after
>>>disabling throttling.
>>>On nfs the run took 25 minutes.
>>>On fuse the run took 90 minutes.
>>>
>>>Pranith
>>>
>>>>
>>>>Thanks
>>>>
>>>>
>>>>>I was wondering if any of you guys know what could contribute to
>>>>>this difference.
>>>>>
>>>>>Pranith
>>>>>
>>>>>On 08/07/2014 01:33 AM, Anand Avati wrote:
>>>>>>Seems like heavy FINODELK contention. As a diagnostic step, can
>>>>>>you try disabling eager-locking and check the write performance
>>>>>>again (gluster volume set $name cluster.eager-lock off)?
>>>>>>
>>>>>>
>>>>>>On Tue, Aug 5, 2014 at 11:44 AM, David F. Robinson
>>>>>><david.robinson at corvidtec.com> wrote:
>>>>>>>Forgot to attach profile info in previous email. Attached...
>>>>>>>
>>>>>>>David
>>>>>>>
>>>>>>>
>>>>>>>------ Original Message ------
>>>>>>>From: "David F. Robinson" <david.robinson at corvidtec.com>
>>>>>>>To: gluster-devel at gluster.org
>>>>>>>Sent: 8/5/2014 2:41:34 PM
>>>>>>>Subject: Fw: Re: Corvid gluster testing
>>>>>>>
>>>>>>>>I have been testing some of the fixes that Pranith incorporated
>>>>>>>>into the 3.5.2-beta to see how they performed for moderate
>>>>>>>>levels of i/o. All of the stability issues that I had seen in
>>>>>>>>previous versions seem to have been fixed in 3.5.2; however,
>>>>>>>>there still seem to be some significant performance issues.
>>>>>>>>Pranith suggested that I send this to the gluster-devel email
>>>>>>>>list, so here goes:
>>>>>>>>
>>>>>>>>I am running an MPI job that saves a restart file to the gluster
>>>>>>>>file system. When I use the following in my fstab to mount the
>>>>>>>>gluster volume, the i/o time for the 2.5GB file is roughly
>>>>>>>>45-seconds.
>>>>>>>>
>>>>>>>> gfsib01a.corvidtec.com:/homegfs /homegfs glusterfs
>>>>>>>>transport=tcp,_netdev 0 0
>>>>>>>>When I switch this to use the NFS protocol (see below), the i/o
>>>>>>>>time is 2.5-seconds.
>>>>>>>>
>>>>>>>> gfsib01a.corvidtec.com:/homegfs /homegfs nfs
>>>>>>>>vers=3,intr,bg,rsize=32768,wsize=32768 0 0
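>>>>>>>>
>>>>>>>>(For reference, the equivalent one-off mount commands, using the
>>>>>>>>same servers and options as the fstab entries above, should be
>>>>>>>>roughly:)
>>>>>>>>
>>>>>>>>   # native gluster (fuse) mount
>>>>>>>>   mount -t glusterfs gfsib01a.corvidtec.com:/homegfs /homegfs
>>>>>>>>   # NFSv3 mount through the gluster NFS server
>>>>>>>>   mount -t nfs -o vers=3,intr,bg,rsize=32768,wsize=32768 \
>>>>>>>>       gfsib01a.corvidtec.com:/homegfs /homegfs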
>>>>>>>>
>>>>>>>>The read-times for gluster are 10-20% faster than NFS, but the
>>>>>>>>write times are almost 20x slower.
>>>>>>>>
>>>>>>>>I am running SL 6.4 and glusterfs-3.5.2-0.1.beta1.el6.x86_64...
>>>>>>>>
>>>>>>>>[root at gfs01a glusterfs]# gluster volume info homegfs
>>>>>>>>Volume Name: homegfs
>>>>>>>>Type: Distributed-Replicate
>>>>>>>>Volume ID: 1e32672a-f1b7-4b58-ba94-58c085e59071
>>>>>>>>Status: Started
>>>>>>>>Number of Bricks: 2 x 2 = 4
>>>>>>>>Transport-type: tcp
>>>>>>>>Bricks:
>>>>>>>>Brick1: gfsib01a.corvidtec.com:/data/brick01a/homegfs
>>>>>>>>Brick2: gfsib01b.corvidtec.com:/data/brick01b/homegfs
>>>>>>>>Brick3: gfsib01a.corvidtec.com:/data/brick02a/homegfs
>>>>>>>>Brick4: gfsib01b.corvidtec.com:/data/brick02b/homegfs
>>>>>>>>
>>>>>>>>David
>>>>>>>>
>>>>>>>>------ Forwarded Message ------
>>>>>>>>From: "Pranith Kumar Karampuri" <pkarampu at redhat.com>
>>>>>>>>To: "David Robinson" <david.robinson at corvidtec.com>
>>>>>>>>Cc: "Young Thomas" <tom.young at corvidtec.com>
>>>>>>>>Sent: 8/5/2014 2:25:38 AM
>>>>>>>>Subject: Re: Corvid gluster testing
>>>>>>>>
>>>>>>>>gluster-devel at gluster.org is the email-id for the mailing list.
>>>>>>>>We should probably start with the initial run numbers and the
>>>>>>>>comparison of the glusterfs and nfs mounts. Maybe something
>>>>>>>>like:
>>>>>>>>
>>>>>>>>glusterfs mount: 90 minutes
>>>>>>>>nfs mount: 25 minutes
>>>>>>>>
>>>>>>>>And profile outputs, volume config, number of mounts, and
>>>>>>>>hardware configuration should be a good start.
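>>>>>>>>
>>>>>>>>(Profile outputs of that sort are typically captured with the
>>>>>>>>volume profile commands; a sketch, assuming the volume name
>>>>>>>>homegfs and a placeholder output file name:)
>>>>>>>>
>>>>>>>>   gluster volume profile homegfs start
>>>>>>>>   # run the test, then dump the per-fop latency counters
>>>>>>>>   gluster volume profile homegfs info > profile_fuse.txt
>>>>>>>>   # volume configuration and brick status
>>>>>>>>   gluster volume info homegfs
>>>>>>>>   gluster volume status homegfs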
>>>>>>>>
>>>>>>>>Pranith
>>>>>>>>
>>>>>>>>On 08/05/2014 09:28 AM, David Robinson wrote:
>>>>>>>>>Thanks pranith
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>===============================
>>>>>>>>>David F. Robinson, Ph.D.
>>>>>>>>>President - Corvid Technologies
>>>>>>>>>704.799.6944 x101 [office]
>>>>>>>>>704.252.1310 [cell]
>>>>>>>>>704.799.7974 [fax]
>>>>>>>>>David.Robinson at corvidtec.com
>>>>>>>>>http://www.corvidtechnologies.com/
>>>>>>>>>
>>>>>>>>>>On Aug 4, 2014, at 11:22 PM, Pranith Kumar Karampuri
>>>>>>>>>><pkarampu at redhat.com> wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>On 08/05/2014 08:33 AM, Pranith Kumar Karampuri wrote:
>>>>>>>>>>>
>>>>>>>>>>>On 08/05/2014 08:29 AM, David F. Robinson wrote:
>>>>>>>>>>>>>>On 08/05/2014 12:51 AM, David F. Robinson wrote:
>>>>>>>>>>>>>>No. I don't want to use nfs. It eliminates most of the
>>>>>>>>>>>>>>benefits of using gluster in the first place. Failover
>>>>>>>>>>>>>>redundancy of the pair, load balancing, etc.
>>>>>>>>>>>>>What is the meaning of 'Failover redundancy of the pair,
>>>>>>>>>>>>>load balancing'? Could you elaborate more?
>>>>>>>>>>>>>smb/nfs/glusterfs are just access protocols that gluster
>>>>>>>>>>>>>supports; the functionality is almost the same.
>>>>>>>>>>>>Here is my understanding. Please correct me where I am
>>>>>>>>>>>>wrong.
>>>>>>>>>>>>
>>>>>>>>>>>>With gluster, if I am doing a write and one of the
>>>>>>>>>>>>replicated pairs goes down, there is no interruption to the
>>>>>>>>>>>>I/o. The failover is handled by gluster and the fuse client.
>>>>>>>>>>>>This isn't done if I use an nfs mount unless the component
>>>>>>>>>>>>of the pair that goes down isn't the one I used for the
>>>>>>>>>>>>mount.
>>>>>>>>>>>>
>>>>>>>>>>>>With nfs, I will have to mount one of the bricks. So, if I
>>>>>>>>>>>>have gfs01a, gfs01b, gfs02a, gfs02b, gfs03a, gfs03b, etc and
>>>>>>>>>>>>my fstab mounts gfs01a, it is my understanding that all of
>>>>>>>>>>>>my I/o will go through gfs01a which then gets distributed to
>>>>>>>>>>>>all of the other bricks. Gfs01a throughput becomes a
>>>>>>>>>>>>bottleneck. Whereas if I do a gluster mount using fuse, the
>>>>>>>>>>>>load balancing is handled at the client side, not the
>>>>>>>>>>>>server side. If I have 1000 nodes accessing 20 gluster
>>>>>>>>>>>>bricks, I need the load balancing aspect. I cannot have all
>>>>>>>>>>>>traffic going through the network interface on a single
>>>>>>>>>>>>brick.
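>>>>>>>>>>>>
>>>>>>>>>>>>(My understanding is that the fuse client only contacts the
>>>>>>>>>>>>server named in fstab to fetch the volume file and then talks
>>>>>>>>>>>>to all bricks directly, so any server can be named for the
>>>>>>>>>>>>mount; a sketch of such an fstab entry, assuming the
>>>>>>>>>>>>backupvolfile-server mount option is available in this
>>>>>>>>>>>>release:)
>>>>>>>>>>>>
>>>>>>>>>>>>   gfsib01a.corvidtec.com:/homegfs /homegfs glusterfs
>>>>>>>>>>>>       transport=tcp,backupvolfile-server=gfsib01b.corvidtec.com,_netdev 0 0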
>>>>>>>>>>>>
>>>>>>>>>>>>If I am wrong with the above assumptions, I guess my
>>>>>>>>>>>>question is why would one ever use the gluster mount instead
>>>>>>>>>>>>of nfs and/or samba?
>>>>>>>>>>>>
>>>>>>>>>>>>Tom: feel free to chime in if I have missed anything.
>>>>>>>>>>>I see your point now. Yes, the gluster server where you did
>>>>>>>>>>>the mount is kind of a bottleneck.
>>>>>>>>>>Now that we have established that the problem is in the
>>>>>>>>>>clients/protocols, you should send out a detailed mail on
>>>>>>>>>>gluster-devel and see if anyone can help you with the
>>>>>>>>>>performance xlators that could improve it a bit more. My area
>>>>>>>>>>of expertise is more on replication; I am a sub-maintainer for
>>>>>>>>>>the replication and locks components. I also know about
>>>>>>>>>>connection management/io-threads related issues which lead to
>>>>>>>>>>hangs, as I have worked on them before. Performance xlators are
>>>>>>>>>>a black box to me.
>>>>>>>>>>
>>>>>>>>>>Performance xlators are enabled only on the fuse gluster
>>>>>>>>>>stack. On nfs server mounts we disable all the performance
>>>>>>>>>>xlators except write-behind, as the nfs client already does a
>>>>>>>>>>lot of work to improve performance. I suggest you guys follow
>>>>>>>>>>up more on gluster-devel.
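>>>>>>>>>>
>>>>>>>>>>(If you want to experiment in the meantime, the individual
>>>>>>>>>>performance xlators can be toggled per volume; a sketch,
>>>>>>>>>>assuming the usual option names in this release:)
>>>>>>>>>>
>>>>>>>>>>   gluster volume set homegfs performance.write-behind on
>>>>>>>>>>   gluster volume set homegfs performance.io-cache off
>>>>>>>>>>   gluster volume set homegfs performance.quick-read off
>>>>>>>>>>   gluster volume set homegfs performance.stat-prefetch off
>>>>>>>>>>   # put an option back to its default
>>>>>>>>>>   gluster volume reset homegfs performance.io-cache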
>>>>>>>>>>
>>>>>>>>>>Appreciate all the help you have given in improving the product :-).
>>>>>>>>>>Thanks a ton!
>>>>>>>>>>Pranith
>>>>>>>>>>>Pranith
>>>>>>>>>>>>David (Sent from mobile)
>>>>>>>>>>>>
>>>>>>>>>>>>===============================
>>>>>>>>>>>>David F. Robinson, Ph.D.
>>>>>>>>>>>>President - Corvid Technologies
>>>>>>>>>>>>704.799.6944 x101 [office]
>>>>>>>>>>>>704.252.1310 [cell]
>>>>>>>>>>>>704.799.7974 [fax]
>>>>>>>>>>>>David.Robinson at corvidtec.com
>>>>>>>>>>>>http://www.corvidtechnologies.com/
>>>>>>>>
>>>>>>>
>>>>>>>_______________________________________________
>>>>>>>Gluster-devel mailing list
>>>>>>>Gluster-devel at gluster.org
>>>>>>>http://supercolony.gluster.org/mailman/listinfo/gluster-devel
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>_______________________________________________
>>>>>>Gluster-devel mailing list
>>>>>>Gluster-devel at gluster.org
>>>>>>http://supercolony.gluster.org/mailman/listinfo/gluster-devel
>>>>>
>>>>
>>>
>>
>