[Gluster-users] high CPU load on all bricks

Thu Feb 14 18:13:42 UTC 2013

Transport is TCP, and file sizes are nominal (typically up to a few GB). I'm using a glusterfs mount (no NFS). There's nothing unusual about the deployment except the eight-brick server-client setup mentioned below. Is there anything I can do to identify the bottleneck(s) and/or tune performance? I'm going to try building the RPMs myself, though I doubt that will change anything vs. the pre-built ones.

Thanks,
Mike C.
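
One way to start narrowing down where the CPU time goes is to sample the Gluster processes directly instead of watching top. The following is only a minimal sketch, assuming Linux with /proc available and Python 3 on the brick hosts; the function names (parse_stat, cpu_percent, snapshot, sample) are hypothetical helpers and not part of Gluster. It reads utime + stime from /proc/<pid>/stat twice and reports usage as a percentage of one core, so a multi-threaded glusterfsd can show more than 100%, as in the figures quoted below.

```python
import os
import time


def parse_stat(stat_line):
    """Return (command name, utime + stime in clock ticks) from one /proc/<pid>/stat line."""
    # The command name sits in parentheses and may itself contain spaces or ")",
    # so locate it by the first "(" and the last ")".
    name = stat_line[stat_line.index("(") + 1:stat_line.rindex(")")]
    rest = stat_line[stat_line.rindex(")") + 1:].split()
    # rest[0] is the state (field 3); utime and stime are fields 14 and 15.
    return name, int(rest[11]) + int(rest[12])


def cpu_percent(ticks_before, ticks_after, interval_s, ticks_per_s):
    """CPU use over the interval as a percentage of one core (may exceed 100)."""
    return 100.0 * (ticks_after - ticks_before) / (ticks_per_s * interval_s)


def snapshot(names):
    """Map pid -> (name, ticks) for every running process whose name is in `names`."""
    found = {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open("/proc/%s/stat" % entry) as f:
                name, ticks = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or the line was unreadable
        if name in names:
            found[int(entry)] = (name, ticks)
    return found


def sample(names=("glusterfsd", "glusterfs", "glusterd"), interval_s=1.0):
    """Print per-process CPU use over one sampling interval."""
    ticks_per_s = os.sysconf("SC_CLK_TCK")
    before = snapshot(names)
    time.sleep(interval_s)
    after = snapshot(names)
    for pid, (name, ticks) in sorted(after.items()):
        if pid in before:
            pct = cpu_percent(before[pid][1], ticks, interval_s, ticks_per_s)
            print("%6d %-12s %6.1f%%" % (pid, name, pct))
```

Calling sample() in a loop while a file copy runs across the mount would show whether the brick daemon (glusterfsd) or the FUSE client (glusterfs) dominates on a host that is both server and client. It does not cover kernel threads such as ksoftirqd, whose names carry a per-CPU suffix.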

On Feb 14, 2013, at 9:52 AM, Bryan Whitehead <driver at megahappy.net> wrote:

> Is the transport tcp or tcp,rdma? I'm using transport=tcp for IPoIB and get pretty fantastic speeds. When I used tcp,rdma as my transport I had problems.
> 
> Are you mounting via FUSE or NFS? I don't have any experience with NFS, but FUSE works really well.
> 
> Additionally, how are you using the volume? Many small files or a few large files? I'm hosting qcow2 files that are between 4 and 250 GB.
> 
> 
> On Wed, Feb 13, 2013 at 10:35 PM, Michael Colonno <mcolonno at stanford.edu> wrote:
>>             More data: I got the InfiniBand network (QDR) working well and switched my Gluster volume to the InfiniBand fabric (IPoIB, not RDMA, since it doesn’t seem to be supported yet for 3.x). The filesystem was slightly faster but still far short of what I would expect. In an informal test (timing the movement of a large file) I’m getting several MB/s – well short of even a standard Gb network copy. With the faster network the CPU load on the brick systems increased dramatically: now I’m seeing 200%-250% usage by glusterfsd and glusterfs.
>> 
>>             This leads me to believe that Gluster is really not enjoying my eight-brick, 2x replication volume with each brick system also being a client. I tried a rebalance but it had no measurable effect. Any suggestions for improving the performance? Having each brick be a client of itself seemed the most logical choice to remove interdependencies, but now I’m doubting the setup…
>> 
>>  
>> 
>>             Thanks,
>> 
>>             ~Mike C.
>> 
>>  
>> 
>> From: gluster-users-bounces at gluster.org [mailto:gluster-users-bounces at gluster.org] On Behalf Of Joe Julian
>> Sent: Sunday, February 03, 2013 11:47 AM
>> To: gluster-users at gluster.org
>> Subject: Re: [Gluster-users] high CPU load on all bricks
>>  
>> 
>> On 02/03/2013 11:22 AM, Michael Colonno wrote:
>> 
>> 
>>             Having taken a lot more data, it does seem the glusterfsd and glusterd processes (along with several ksoftirqd) spike up to near 100% on both client and brick servers during any file transfer across the mount. Thankfully this is short-lived for the most part, but I’m wondering whether this is expected behavior and what others have experienced. I’m a little surprised such a large CPU load would be required to move small files and/or use an application within a Gluster mount point.
>> 
>> 
>> If you're getting ksoftirqd spikes, that sounds like a hardware issue to me. I never see huge spikes like that on my servers nor clients.
>> 
>> 
>> 
>>  
>> 
>>             I wanted to test this against an NFS mount of the same Gluster volume. I managed to get rstatd installed and running but my attempts to mount the volume via NFS are met with:
>> 
>>  
>> 
>>             mount.nfs: requested NFS version or transport protocol is not supported
>> 
>>  
>> 
>>             Relevant line in /etc/fstab:
>> 
>>  
>> 
>>             node1:/volume    /volume    nfs     defaults,_netdev,vers=3,mountproto=tcp        0 0     
>> 
>>  
>> 
>> It looks like CentOS 6.x has NFS version 4 built into everything. So a few questions:
>> 
>>  
>> 
>> -       Has anyone else noted significant performance differences between a glusterfs mount and NFS mount for volumes of 8+ bricks?
>> 
>> -       Is there a straightforward way to make the newer versions of CentOS play nice with NFS version 3 + Gluster? 
>> 
>> -       Are there any general performance tuning guidelines I can follow to improve CPU performance? I found a few references to the cache settings but nothing solid.
>> 
>>  
>> 
>> If the consensus is that NFS will not gain anything then I won’t waste the time setting it all up.
>> 
>> 
>> NFS gains you the use of FSCache to cache directories and file stats, making directory listings faster, but it adds overhead, decreasing the overall throughput (from all the reports I've seen).
>> 
>> I would suspect that you have the kernel nfs server running on your servers. Make sure it's disabled.
>> 
>> 
>> 
>>  
>> 
>> Thanks,
>> 
>> ~Mike C.
>> 
>>  
>> 
>>  
>> 
>> From: gluster-users-bounces at gluster.org [mailto:gluster-users-bounces at gluster.org] On Behalf Of Michael Colonno
>> Sent: Friday, February 01, 2013 4:46 PM
>> To: gluster-users at gluster.org
>> Subject: Re: [Gluster-users] high CPU load on all bricks
>> 
>>  
>> 
>>             Update: after a few hours the CPU usage seems to have dropped down to a small value. I did not change anything in the configuration or unmount / stop anything, as I wanted to see if this would persist for a long period of time. Both the client and the self-mounted bricks are now showing CPU < 1% (as reported by top). Prior to the larger CPU loads I installed a bunch of software into the volume (~ 5 GB total). Is this kind of transient behavior – by which I mean larger CPU loads after a lot of filesystem activity in a short time – typical? This is not a problem in my deployment; I just want to know what to expect in the future and to complete this thread for future users. If this is expected behavior we can wrap up this thread. If not, then I’ll do more digging into the logs on the client and brick sides.
>> 
>>  
>> 
>>             Thanks,
>> 
>>             ~Mike C.
>> 
>>  
>> 
>> From: Joe Julian [mailto:joe at julianfamily.org] 
>> Sent: Friday, February 01, 2013 2:08 PM
>> To: Michael Colonno; gluster-users at gluster.org
>> Subject: Re: [Gluster-users] high CPU load on all bricks
>> 
>>  
>> 
>> Check the client log(s).
>> 
>> Michael Colonno <mcolonno at stanford.edu> wrote:
>> 
>>             Forgot to mention: on a client system (not a brick) the glusterfs process is consuming ~ 68% CPU continuously. This is a much less powerful desktop system, so the CPU load can’t be compared 1:1 with the systems comprising the bricks, but it is still very high. So the issue seems to exist with both glusterfsd and glusterfs processes.
>> 
>>  
>> 
>>             Thanks,
>> 
>>             ~Mike C.
>> 
>>  
>> 
>> From: gluster-users-bounces at gluster.org [mailto:gluster-users-bounces at gluster.org] On Behalf Of Michael Colonno
>> Sent: Friday, February 01, 2013 12:46 PM
>> To: gluster-users at gluster.org
>> Subject: [Gluster-users] high CPU load on all bricks
>> 
>>  
>> 
>>             Gluster gurus ~
>> 
>>  
>> 
>>             I’ve deployed an 8-brick (2x replicate) Gluster 3.3.1 volume on CentOS 6.3 with tcp transport. I was able to build, start, mount, and use the volume. On each system contributing a brick, however, my CPU usage (glusterfsd) is hovering around 20% (virtually zero memory usage, thankfully). These are brand new, fairly beefy servers, so 20% CPU load is quite a bit. The deployment is pretty plain, with each brick mounting the volume to itself via a glusterfs mount. I assume this type of CPU usage is atypically high; is there anything I can do to investigate what’s soaking up CPU and minimize it? Total usable volume size is only about 22 TB (about 45 TB total with 2x replicate).
>> 
>>  
>> 
>>             Thanks,
>> 
>>             ~Mike C.
>> 
>>  
>> 
>>  
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://supercolony.gluster.org/mailman/listinfo/gluster-users
> 
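
The informal test quoted above (timing the movement of a large file) can be made repeatable, so that a run against the Gluster mount can be compared with a run against a local disk or with a change of transport. This is a rough sketch only, assuming Python 3; timed_copy is a hypothetical helper name, and any paths passed to it are the reader's own. It copies a file in 1 MiB chunks, forces the data out with fsync so the page cache does not flatter the number, and returns MB/s.

```python
import os
import shutil
import time


def timed_copy(src, dst, chunk_bytes=1024 * 1024):
    """Copy src to dst and return the throughput in MB/s (1 MB = 10**6 bytes)."""
    size = os.path.getsize(src)
    start = time.perf_counter()
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout, chunk_bytes)
        fout.flush()
        # Include the time needed to push the data out of the page cache.
        os.fsync(fout.fileno())
    # Guard against a zero interval on very small files.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return size / 1e6 / elapsed
```

For example, timed_copy("/tmp/big.bin", "/volume/big.bin") would time a write into the /volume mount point from the fstab line above, with /tmp/big.bin standing in for any multi-GB test file. A single run is noisy, so several repetitions are worth taking before drawing conclusions.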