[Gluster-devel] Fw: Re: Corvid gluster testing

Anand Avati avati at gluster.org
Wed Aug 6 20:03:09 UTC 2014


Seems like heavy FINODELK contention. As a diagnostic step, can you try
disabling eager-locking and check the write performance again (gluster
volume set $name cluster.eager-lock off)?
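For reference, the full round trip would be something like this (a sketch
assuming the homegfs volume from your config below; eager-lock defaults to
on, so remember to re-enable it after the test):

    gluster volume set homegfs cluster.eager-lock off
    # ... re-run the write test and capture a fresh profile ...
    gluster volume set homegfs cluster.eager-lock on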


On Tue, Aug 5, 2014 at 11:44 AM, David F. Robinson <
david.robinson at corvidtec.com> wrote:

>  Forgot to attach profile info in previous email.  Attached...
>
> David
>
>
> ------ Original Message ------
> From: "David F. Robinson" <david.robinson at corvidtec.com>
> To: gluster-devel at gluster.org
> Sent: 8/5/2014 2:41:34 PM
> Subject: Fw: Re: Corvid gluster testing
>
>
> I have been testing some of the fixes that Pranith incorporated into the
> 3.5.2-beta to see how they performed for moderate levels of I/O. All of
> the stability issues that I had seen in previous versions seem to have
> been fixed in 3.5.2; however, there still seem to be some significant
> performance issues.  Pranith suggested that I send this to the
> gluster-devel email list, so here goes:
>
> I am running an MPI job that saves a restart file to the gluster file
> system.  When I use the following in my fstab to mount the gluster volume,
> the I/O time for the 2.5GB file is roughly 45 seconds.
>
>
>     gfsib01a.corvidtec.com:/homegfs /homegfs glusterfs transport=tcp,_netdev 0 0
>
> When I switch this to use the NFS protocol (see below), the I/O time is
> 2.5 seconds.
>
>     gfsib01a.corvidtec.com:/homegfs /homegfs nfs vers=3,intr,bg,rsize=32768,wsize=32768 0 0
>
> The read times for gluster are 10-20% faster than NFS, but the write
> times are almost 20x slower.
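>
> For anyone wanting to reproduce a similar comparison, a minimal sketch
> (the test path and file size here are placeholders, not our actual
> benchmark) would be:
>
>     time dd if=/dev/zero of=/homegfs/ddtest bs=1M count=2560 conv=fsync
>
> run once against the fuse mount and once against the nfs mount.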
>
> I am running SL 6.4 and glusterfs-3.5.2-0.1.beta1.el6.x86_64...
>
> [root@gfs01a glusterfs]# gluster volume info homegfs
>
> Volume Name: homegfs
> Type: Distributed-Replicate
> Volume ID: 1e32672a-f1b7-4b58-ba94-58c085e59071
> Status: Started
> Number of Bricks: 2 x 2 = 4
> Transport-type: tcp
> Bricks:
> Brick1: gfsib01a.corvidtec.com:/data/brick01a/homegfs
> Brick2: gfsib01b.corvidtec.com:/data/brick01b/homegfs
> Brick3: gfsib01a.corvidtec.com:/data/brick02a/homegfs
> Brick4: gfsib01b.corvidtec.com:/data/brick02b/homegfs
>
> David
>
> ------ Forwarded Message ------
> From: "Pranith Kumar Karampuri" <pkarampu at redhat.com>
> To: "David Robinson" <david.robinson at corvidtec.com>
> Cc: "Young Thomas" <tom.young at corvidtec.com>
> Sent: 8/5/2014 2:25:38 AM
> Subject: Re: Corvid gluster testing
>
>  gluster-devel at gluster.org is the email address for the mailing list.
> We should probably start with the initial run numbers and the comparison
> between the glusterfs and nfs mounts, maybe something like:
>
> glusterfs mount: 90 minutes
> nfs mount: 25 minutes
>
> Profile outputs, the volume config, the number of mounts, and the
> hardware configuration would also be a good start.
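>
> For example, the profile numbers can be gathered with something like this
> (assuming the volume name from your setup):
>
>     gluster volume profile homegfs start
>     # ... run the glusterfs-mount and nfs-mount tests ...
>     gluster volume profile homegfs info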
>
> Pranith
>
> On 08/05/2014 09:28 AM, David Robinson wrote:
>
> Thanks, Pranith
>
>
> ===============================
> David F. Robinson, Ph.D.
> President - Corvid Technologies
> 704.799.6944 x101 [office]
> 704.252.1310 [cell]
> 704.799.7974 [fax]
> David.Robinson at corvidtec.com
> http://www.corvidtechnologies.com
>
>
> On Aug 4, 2014, at 11:22 PM, Pranith Kumar Karampuri <pkarampu at redhat.com>
> wrote:
>
>
>
> On 08/05/2014 08:33 AM, Pranith Kumar Karampuri wrote:
>
> On 08/05/2014 08:29 AM, David F. Robinson wrote:
>
>  On 08/05/2014 12:51 AM, David F. Robinson wrote:
> No. I don't want to use nfs. It eliminates most of the reasons why I want
> to use gluster: failover redundancy of the pair, load balancing, etc.
>
> What do you mean by 'failover redundancy of the pair, load balancing'?
> Could you elaborate more? smb/nfs/glusterfs are just access protocols
> that gluster supports; the functionality is almost the same.
>
> Here is my understanding. Please correct me where I am wrong.
>
> With gluster, if I am doing a write and one of the replicated pairs goes
> down, there is no interruption to the I/O; the failover is handled by
> gluster and the fuse client. With an nfs mount, I lose that protection
> whenever the member of the pair that goes down is the one I used for the
> mount.
>
> With nfs, I have to mount the volume through one of the servers. So, if I
> have gfs01a, gfs01b, gfs02a, gfs02b, gfs03a, gfs03b, etc. and my fstab
> mounts gfs01a, it is my understanding that all of my I/O will go through
> gfs01a, which then gets distributed to all of the other bricks; gfs01a's
> throughput becomes a bottleneck. Whereas if I do a gluster mount using
> fuse, the load balancing is handled on the client side, not the server
> side. If I have 1000 nodes accessing 20 gluster bricks, I need the load
> balancing aspect. I cannot have all traffic going through the network
> interface on a single brick.
>
> If I am wrong about the above assumptions, I guess my question is why
> would one ever use the gluster mount instead of nfs and/or samba?
>
> Tom: feel free to chime in if I have missed anything.
>
> I see your point now. Yes, with nfs the gluster server where you did the
> mount is kind of a bottleneck.
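>
> For what it's worth, with the fuse mount the server named in fstab is
> only used to fetch the volume file at mount time; the I/O itself goes
> directly to the bricks. If the concern is that server being down at mount
> time, the client can also be given a backup volfile server. A sketch
> using your server names:
>
>     gfsib01a.corvidtec.com:/homegfs /homegfs glusterfs transport=tcp,_netdev,backupvolfile-server=gfsib01b.corvidtec.com 0 0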
>
> Now that we have established the problem is in the clients/protocols, you
> should send out a detailed mail on gluster-devel and see if anyone can
> help you with the performance xlators that might improve it a bit more.
> My area of expertise is more on replication; I am sub-maintainer for the
> replication and locks components. I also know the connection
> management/io-threads related issues which lead to hangs, as I worked on
> them before. Performance xlators are a black box to me.
>
> Performance xlators are enabled only on the fuse gluster stack. On nfs
> server mounts we disable all the performance xlators except write-behind,
> as the nfs client does a lot on its own to improve performance. I suggest
> you follow up more on gluster-devel.
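>
> The usual knobs, if you want to experiment, are volume-set options like
> these (a sketch against your homegfs volume; these are standard
> performance xlator toggles):
>
>     gluster volume set homegfs performance.write-behind on
>     gluster volume set homegfs performance.quick-read off
>     gluster volume set homegfs performance.io-cache off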
>
> I appreciate all the help you have given in improving the product :-).
> Thanks a ton!
> Pranith
>
> Pranith
>
> David (Sent from mobile)
>
> ===============================
> David F. Robinson, Ph.D.
> President - Corvid Technologies
> 704.799.6944 x101 [office]
> 704.252.1310 [cell]
> 704.799.7974 [fax]
> David.Robinson at corvidtec.com
> http://www.corvidtechnologies.com
>
>
>
>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-devel
>
>