[Gluster-devel] Fw: Re: Corvid gluster testing

Tue Aug 5 18:44:29 UTC 2014

Forgot to attach profile info in previous email.  Attached...

David

------ Original Message ------
From: "David F. Robinson" <david.robinson at corvidtec.com>
To: gluster-devel at gluster.org
Sent: 8/5/2014 2:41:34 PM
Subject: Fw: Re: Corvid gluster testing

>I have been testing some of the fixes that Pranith incorporated into 
>the 3.5.2-beta to see how they performed for moderate levels of i/o. 
>All of the stability issues that I had seen in previous versions seem 
>to have been fixed in 3.5.2; however, there still seem to be some 
>significant performance issues.  Pranith suggested that I send this to 
>the gluster-devel email list, so here goes:
>
>I am running an MPI job that saves a restart file to the gluster file 
>system.  When I use the following in my fstab to mount the gluster 
>volume, the i/o time for the 2.5GB file is roughly 45 seconds.
>
>     gfsib01a.corvidtec.com:/homegfs /homegfs glusterfs transport=tcp,_netdev 0 0
>
>When I switch this to use the NFS protocol (see below), the i/o time is 
>2.5 seconds.
>
>   gfsib01a.corvidtec.com:/homegfs /homegfs nfs vers=3,intr,bg,rsize=32768,wsize=32768 0 0
>
>Reads through the gluster mount are 10-20% faster than through NFS, but 
>writes are almost 20x slower.
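The timings above can be turned into effective write throughput with a quick back-of-the-envelope calculation. This is a sketch under stated assumptions: the restart file is taken as exactly 2.5 GB (read here as 2.5 x 1024 MiB), and each quoted time is assumed to cover only the write of that single file.

```python
# Back-of-the-envelope check of the write timings quoted above.
# Assumption: the restart file is 2.5 GB, read here as 2.5 * 1024 MiB,
# and each time covers only the write of that one file.
FILE_MIB = 2.5 * 1024


def throughput_mib_s(size_mib: float, seconds: float) -> float:
    """Average throughput in MiB/s for a single write of size_mib."""
    return size_mib / seconds


fuse_s = 45.0  # native glusterfs (FUSE) mount
nfs_s = 2.5    # NFSv3 mount of the same volume

fuse_rate = throughput_mib_s(FILE_MIB, fuse_s)
nfs_rate = throughput_mib_s(FILE_MIB, nfs_s)
slowdown = fuse_s / nfs_s

print(f"FUSE: {fuse_rate:.1f} MiB/s")
print(f"NFS:  {nfs_rate:.1f} MiB/s")
print(f"FUSE is {slowdown:.0f}x slower for this write")
```

Under these assumptions the FUSE mount writes at about 57 MiB/s against 1024 MiB/s over NFS, an 18x ratio, which is consistent with the "almost 20x" figure.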
>
>I am running Scientific Linux (SL) 6.4 and glusterfs-3.5.2-0.1.beta1.el6.x86_64:
>
>[root@gfs01a glusterfs]# gluster volume info homegfs
>Volume Name: homegfs
>Type: Distributed-Replicate
>Volume ID: 1e32672a-f1b7-4b58-ba94-58c085e59071
>Status: Started
>Number of Bricks: 2 x 2 = 4
>Transport-type: tcp
>Bricks:
>Brick1: gfsib01a.corvidtec.com:/data/brick01a/homegfs
>Brick2: gfsib01b.corvidtec.com:/data/brick01b/homegfs
>Brick3: gfsib01a.corvidtec.com:/data/brick02a/homegfs
>Brick4: gfsib01b.corvidtec.com:/data/brick02b/homegfs
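In a Distributed-Replicate volume, gluster forms replica sets from consecutive bricks in the order they are listed, then distributes files across those sets. The sketch below assumes a replica count of 2 (implied by `2 x 2 = 4`) and shows which bricks of `homegfs` pair up. The function `replica_sets` is a hypothetical helper for illustration, not a gluster API.

```python
# Sketch: group the bricks from `gluster volume info homegfs` into replica sets.
# Assumption: replica count 2, with gluster pairing consecutive bricks in list order.
bricks = [
    "gfsib01a.corvidtec.com:/data/brick01a/homegfs",
    "gfsib01b.corvidtec.com:/data/brick01b/homegfs",
    "gfsib01a.corvidtec.com:/data/brick02a/homegfs",
    "gfsib01b.corvidtec.com:/data/brick02b/homegfs",
]


def replica_sets(brick_list: list[str], replica: int) -> list[list[str]]:
    """Split the ordered brick list into consecutive groups of `replica` bricks."""
    if len(brick_list) % replica != 0:
        raise ValueError("brick count must be a multiple of the replica count")
    return [brick_list[i:i + replica] for i in range(0, len(brick_list), replica)]


for n, group in enumerate(replica_sets(bricks, 2), start=1):
    hosts = {b.split(":", 1)[0] for b in group}
    print(f"replica set {n}: {group} on {len(hosts)} distinct server(s)")
```

Each pair spans both gfsib01a and gfsib01b, so losing either server still leaves one copy of every file, which is the failover behavior discussed later in the thread.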
>
>David
>
>------ Forwarded Message ------
>From: "Pranith Kumar Karampuri" <pkarampu at redhat.com>
>To: "David Robinson" <david.robinson at corvidtec.com>
>Cc: "Young Thomas" <tom.young at corvidtec.com>
>Sent: 8/5/2014 2:25:38 AM
>Subject: Re: Corvid gluster testing
>
>gluster-devel at gluster.org is the address of the mailing list. We 
>should probably start with the initial run numbers and a comparison 
>of the glusterfs and NFS mounts, maybe something like:
>
>glusterfs mount: 90 minutes
>nfs mount: 25 minutes
>
>Profile outputs, the volume configuration, the number of mounts, and the 
>hardware configuration would also be a good start.
>
>Pranith
>
>On 08/05/2014 09:28 AM, David Robinson wrote:
>>Thanks, Pranith
>>
>>
>>===============================
>>David F. Robinson, Ph.D.
>>President - Corvid Technologies
>>704.799.6944 x101 [office]
>>704.252.1310 [cell]
>>704.799.7974 [fax]
>>David.Robinson at corvidtec.com
>>http://www.corvidtechnologies.com
>>
>>>On Aug 4, 2014, at 11:22 PM, Pranith Kumar Karampuri 
>>><pkarampu at redhat.com> wrote:
>>>
>>>
>>>>On 08/05/2014 08:33 AM, Pranith Kumar Karampuri wrote:
>>>>
>>>>On 08/05/2014 08:29 AM, David F. Robinson wrote:
>>>>>>>On 08/05/2014 12:51 AM, David F. Robinson wrote:
>>>>>>>No. I don't want to use NFS. It eliminates most of the reasons 
>>>>>>>I want to use gluster: failover redundancy of the pair, 
>>>>>>>load balancing, etc.
>>>>>>What do you mean by 'failover redundancy of the pair, load 
>>>>>>balancing'? Could you elaborate? SMB/NFS/glusterfs are just 
>>>>>>access protocols that gluster supports; the functionality is almost 
>>>>>>the same.
>>>>>Here is my understanding. Please correct me where I am wrong.
>>>>>
>>>>>With gluster, if I am doing a write and one member of a replicated 
>>>>>pair goes down, there is no interruption to the I/O. The failover is 
>>>>>handled by gluster and the FUSE client. This isn't the case with an 
>>>>>NFS mount, unless the member of the pair that goes down is not the 
>>>>>one I used for the mount.
>>>>>
>>>>>With NFS, I have to mount the volume from one of the servers. So, if 
>>>>>I have gfs01a, gfs01b, gfs02a, gfs02b, gfs03a, gfs03b, etc. and my 
>>>>>fstab mounts gfs01a, it is my understanding that all of my I/O will 
>>>>>go through gfs01a, which then distributes it to all of the other 
>>>>>bricks. The throughput of gfs01a becomes a bottleneck. Whereas if I 
>>>>>do a gluster mount using FUSE, the load balancing is handled on the 
>>>>>client side, not the server side. If I have 1000 nodes accessing 
>>>>>20 gluster bricks, I need the load balancing aspect. I cannot have 
>>>>>all traffic going through the network interface of a single server.
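The bottleneck argument above can be illustrated with a deliberately simplified model. Everything here is an assumption rather than a measurement: every client writes the same amount (2.5 GB, borrowed from the restart file), files hash evenly across 20 servers, every NFS client mounts the same server, and server-to-server traffic is not counted. The function `client_ingress_per_server` is hypothetical.

```python
# Simplified model of how client write traffic lands on gluster servers.
# Assumptions: every client writes the same amount, files hash evenly across
# servers, and with NFS every client mounts the same server. Server-to-server
# traffic (e.g. the NFS server forwarding writes to bricks) is not counted.


def client_ingress_per_server(clients: int, gb_per_client: float, servers: int,
                              replica: int, protocol: str) -> list[float]:
    """GB of traffic each server receives directly from clients."""
    total = clients * gb_per_client
    if protocol == "fuse":
        # Replication runs in the FUSE client, so each write is sent to
        # `replica` bricks, and the load is spread over all servers.
        return [total * replica / servers] * servers
    if protocol == "nfs":
        # All client traffic enters through the one mounted server.
        return [total] + [0.0] * (servers - 1)
    raise ValueError(f"unknown protocol: {protocol!r}")


fuse = client_ingress_per_server(1000, 2.5, 20, 2, "fuse")
nfs = client_ingress_per_server(1000, 2.5, 20, 2, "nfs")
print(f"FUSE: busiest server receives {max(fuse):.0f} GB")
print(f"NFS:  busiest server receives {max(nfs):.0f} GB")
```

The `replica` factor reflects that a FUSE client sends each write to both bricks of a pair itself; the model only illustrates where load concentrates and does not attempt to explain the measured write gap.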
>>>>>
>>>>>If I am wrong with the above assumptions, I guess my question is 
>>>>>why would one ever use the gluster mount instead of nfs and/or 
>>>>>samba?
>>>>>
>>>>>Tom: feel free to chime in if I have missed anything.
>>>>I see your point now. Yes, the gluster server you mounted from 
>>>>is effectively a bottleneck.
>>>Now that we have established that the problem is in the 
>>>clients/protocols, you should send a detailed mail to gluster-devel 
>>>and see if anyone can help you with performance xlators that could 
>>>improve it a bit more. My area of expertise is replication. I am a 
>>>sub-maintainer for the replication and locks components. I also know 
>>>the connection management/io-threads issues that led to hangs, as I 
>>>worked on them before. Performance xlators are a black box to me.
>>>
>>>Performance xlators are enabled only on the FUSE gluster stack. On NFS 
>>>server mounts, we disable all performance xlators except 
>>>write-behind, because the NFS client already does a lot to improve 
>>>performance. I suggest you follow up on gluster-devel.
>>>
>>>I appreciate all the help you have given in improving the product :-). 
>>>Thanks a ton!
>>>Pranith
>>>>Pranith
>>>>>David (Sent from mobile)
>>>>>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: homegfs_profile.dat
Type: application/octet-stream
Size: 20218 bytes
Desc: not available
URL: <http://supercolony.gluster.org/pipermail/gluster-devel/attachments/20140805/d5410e4e/attachment-0001.obj>