[Gluster-users] high CPU load on all bricks

Michael Colonno mcolonno at stanford.edu
Sat Feb 16 18:02:21 UTC 2013


            I ran a series of iozone tests on three configurations: 1)
reading / writing directly to the local disk on a brick system, 2) reading /
writing over the glusterfs mount on a brick system, and 3) reading / writing
over an independent NFS mount between two of the brick systems. Here's the
data for test file sizes of 10 MB, 100 MB, 1,000 MB and 2,000 MB
respectively:

 

Writing (GB/s):

 

File size        Local Drives                 GlusterFS                    NFS client
10 MB            0.677799225                  0.028595924                  0.793493271
100 MB           1.122471809                  0.028735161                  1.176805496
1,000 MB         1.209827423                  0.041208267                  1.559321404
2,000 MB         1.152087212                  0.043795586                  1.448302269

 

Reading (GB/s):

 

File size        Local Drives                 GlusterFS                    NFS client
10 MB            3.12509346                   0.184482574                  2.989168167
100 MB           4.549301147                  0.191487312                  3.524992943
1,000 MB         5.532466888                  0.194747925                  5.034347534
2,000 MB         5.111581802                  0.199500084                  4.540629387
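
(For reference, each data point came from an iozone run roughly of this
shape; the record size, flags, and test-file path below are illustrative
assumptions, not the exact invocations used:)

    # sequential write (-i 0) and read (-i 1) of a single test file;
    # -s sets the file size, -r the record size, -f the file location
    iozone -i 0 -i 1 -s 1000m -r 128k -f /mnt/glustervol/iozone.tmp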

 

            The NFS and local-drive results (some of the NFS numbers are
faster than local I/O, most likely due to caching in memory) clearly rule out
the drives and the network (QDR Infiniband) as bottlenecks. It's also clear
that something is going very wrong on the glusterfs side. I'm not sure what
more I can do in terms of testing and analysis; is there a fix, or will I
need to abandon this Gluster deployment and chalk it up to an unknown error?

 

            Thanks,

            ~Mike C. 

 

From: gluster-users-bounces at gluster.org
[mailto:gluster-users-bounces at gluster.org] On Behalf Of Michael Colonno
Sent: Wednesday, February 13, 2013 10:35 PM
To: gluster-users at gluster.org
Subject: Re: [Gluster-users] high CPU load on all bricks

 

            More data: I got the Infiniband network (QDR) working well and
switched my gluster volume to the Infiniband fabric (IPoIB, not RDMA, since
RDMA doesn't seem to be supported yet for 3.x). The filesystem was slightly
faster but still far short of what I would expect. In an informal test
(timing the copy of a large file) I'm getting several MB/s - slower than even
a standard GbE network copy. With the faster network the CPU load on the
brick systems increased dramatically: I'm now seeing 200%-250% usage by
glusterfsd and glusterfs. 
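
(For the record, the "informal test" was just a timed copy along these
lines; the file name and mount point are placeholders, not the actual
paths:)

    # time the copy of a large file onto the glusterfs mount (illustrative)
    time cp /data/large-file.bin /mnt/glustervol/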

            This leads me to believe that gluster is really not enjoying my
eight-brick, 2x-replicated volume with each brick system also acting as a
client. I tried a rebalance, but it had no measurable effect. Any suggestions
for improving the performance? Having each brick be a client of itself seemed
the most logical way to remove interdependencies, but now I'm doubting the
setup.

 

            Thanks,

            ~Mike C. 

 

From: gluster-users-bounces at gluster.org
[mailto:gluster-users-bounces at gluster.org] On Behalf Of Joe Julian
Sent: Sunday, February 03, 2013 11:47 AM
To: gluster-users at gluster.org
Subject: Re: [Gluster-users] high CPU load on all bricks

 

On 02/03/2013 11:22 AM, Michael Colonno wrote:

            Having taken a lot more data, it does seem the glusterfsd and
glusterd processes (along with several ksoftirqd threads) spike to near 100%
on both client and brick servers during any file transfer across the mount.
Thankfully this is short-lived for the most part, but I'm wondering whether
this is expected behavior and what others have experienced. I'm a little
surprised such a large CPU load would be needed to move small files and/or
run an application from within a Gluster mount point. 
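
(The per-process figures above are from top; a snapshot along these lines
is enough to watch the gluster and ksoftirqd processes during a transfer -
illustrative only, not the exact command used:)

    # one batch-mode top snapshot filtered to the processes of interest
    top -b -n 1 | egrep 'glusterfsd|glusterfs|ksoftirqd'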


If you're getting ksoftirqd spikes, that sounds like a hardware issue to me.
I never see huge spikes like that on my servers or clients.



 

            I wanted to test this against an NFS mount of the same Gluster
volume. I managed to get rstatd installed and running but my attempts to
mount the volume via NFS are met with: 

 

            mount.nfs: requested NFS version or transport protocol is not
supported

 

            Relevant line in /etc/fstab:

 

            node1:/volume    /volume    nfs    defaults,_netdev,vers=3,mountproto=tcp    0 0

 

It looks like CentOS 6.x defaults to NFS version 4 across the board. So a few
questions:

 

-       Has anyone else noted significant performance differences between a
glusterfs mount and an NFS mount for volumes of 8+ bricks? 

-       Is there a straightforward way to make the newer versions of CentOS
play nice with NFS version 3 + Gluster? 

-       Are there any general performance tuning guidelines I can follow to
reduce the CPU load? I found a few references to the cache settings (see the
illustrative options after this list) but nothing solid. 
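
(For illustration only - these are the kinds of volume options the cache
references point at; the volume name and values are assumptions, not
recommendations:)

    # client-side read cache size and io-thread count for a volume
    gluster volume set myvolume performance.cache-size 256MB
    gluster volume set myvolume performance.io-thread-count 16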

 

If the consensus is that NFS will not gain anything then I won't waste the
time setting it all up. 


NFS gains you the use of FS-Cache to cache directories and file stats, making
directory listings faster, but it adds overhead that decreases overall
throughput (from all the reports I've seen).

I would suspect that you have the kernel nfs server running on your servers.
Make sure it's disabled.
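
(On CentOS 6 that check looks roughly like the following - a sketch,
assuming the stock kernel NFS service names:)

    # see whether the kernel NFS server is registered and running
    rpcinfo -p | grep -w nfs
    service nfs status

    # stop it and keep it from starting at boot so Gluster's own NFS
    # server can bind to the NFS ports
    service nfs stop
    chkconfig nfs off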



 

Thanks,

~Mike C. 

 

 

From: gluster-users-bounces at gluster.org
[mailto:gluster-users-bounces at gluster.org] On Behalf Of Michael Colonno
Sent: Friday, February 01, 2013 4:46 PM
To: gluster-users at gluster.org
Subject: Re: [Gluster-users] high CPU load on all bricks

 

            Update: after a few hours the CPU usage seems to have dropped
down to a small value. I did not change anything with respect to the
configuration or unmount / stop anything as I wanted to see if this would
persist for a long period of time. Both the client and the self-mounted
bricks are now showing CPU < 1% (as reported by top). Prior to the larger
CPU loads I had installed a bunch of software into the volume (~5 GB total). Is
this kind of transient behavior - by which I mean elevated CPU load after a
burst of filesystem activity in a short time - typical? This is not a problem in
my deployment; I just want to know what to expect in the future and to
complete this thread for future users. If this is expected behavior we can
wrap up this thread. If not then I'll do more digging into the logs on the
client and brick sides. 

 

            Thanks,

            ~Mike C. 

 

From: Joe Julian [mailto:joe at julianfamily.org] 
Sent: Friday, February 01, 2013 2:08 PM
To: Michael Colonno; gluster-users at gluster.org
Subject: Re: [Gluster-users] high CPU load on all bricks

 

Check the client log(s). 
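
(For anyone following along: the FUSE client log lives under
/var/log/glusterfs and is typically named after the mount point, e.g.
something like the following for a volume mounted at /volume:)

    less /var/log/glusterfs/volume.log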

Michael Colonno <mcolonno at stanford.edu> wrote:

            Forgot to mention: on a client system (not a brick) the
glusterfs process is consuming ~68% CPU continuously. This is a much less
powerful desktop system, so the CPU load can't be compared 1:1 with the
systems comprising the bricks, but it is still very high. So the issue seems
to exist with both the glusterfsd and glusterfs processes. 

 

            Thanks,

            ~Mike C. 

 

From: gluster-users-bounces at gluster.org
[mailto:gluster-users-bounces at gluster.org] On Behalf Of Michael Colonno
Sent: Friday, February 01, 2013 12:46 PM
To: gluster-users at gluster.org
Subject: [Gluster-users] high CPU load on all bricks

 

            Gluster gurus ~

 

            I've deployed an 8-brick (2x replicated) Gluster 3.3.1 volume on
CentOS 6.3 with tcp transport. I was able to build, start, mount, and use
the volume. On each system contributing a brick, however, CPU usage by
glusterfsd is hovering around 20% (with virtually zero memory usage,
thankfully). These are brand new, fairly beefy servers, so a constant 20% CPU
load is quite a bit. The deployment is pretty plain, with each brick system
mounting the volume to itself via a glusterfs mount. I assume this level of
CPU usage is atypically high; is there anything I can do to investigate
what's soaking up CPU and minimize it? Total usable volume size is only about
22 TB (about 45 TB raw with 2x replication). 
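
(For context, a volume of that shape is created along these lines; the
hostnames and brick paths below are placeholders, not the actual ones:)

    # 8 bricks, 2-way replication, TCP transport; consecutive pairs of
    # bricks form the replica sets
    gluster volume create myvolume replica 2 transport tcp \
        node1:/export/brick node2:/export/brick \
        node3:/export/brick node4:/export/brick \
        node5:/export/brick node6:/export/brick \
        node7:/export/brick node8:/export/brick
    gluster volume start myvolume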

 

            Thanks,

            ~Mike C. 

 


_______________________________________________
Gluster-users mailing list
Gluster-users at gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

 
