[Gluster-users] glusterfs and glusterfsd process utilization extremely high

Sat Nov 22 18:20:56 UTC 2014

Hi Pranith,

Thank you very much for the quick reply and the information.  I am in the
process now of recreating the cluster using XFS.  This all brings up a few
questions:

- I assume the change from EXT4 to XFS will correct the problem with
readdir (in other words, the issue is not present in XFS)?
- Do you have any idea when the patch for this might be out?  My reason for
asking is that I have another cluster that has been updated to 3.6 and is
running on EXT4 but does not yet have an issue.  This concerns me, so I am
hoping the patch will be out soon.
- What exactly does cluster.entry-self-heal do?  I can't seem to find a
description of it.
- I assume from your posts that the cluster is fine until traffic hits it
because the self-heal does not happen until traffic causes the files to be
read.  Is that how it works?

Thank you again for the fast response and the great product!

----
Kyle

On Sat, Nov 22, 2014 at 11:36 AM, Pranith Kumar Karampuri <
pkarampu at redhat.com> wrote:

>
> On 11/22/2014 11:04 PM, Pranith Kumar Karampuri wrote:
>
>
> On 11/22/2014 10:40 PM, Pranith Kumar Karampuri wrote:
>
>
> On 11/22/2014 10:29 PM, Kyle Harris wrote:
>
>  Hello,
>
>  I have an issue with a 3 node replicated cluster.  My issue started
> after a reboot a while back.  The top command would show the glusterfs and
> glusterfsd processes eating up almost all the resources on all three
> nodes of the cluster, so much so that it could not run the web sites that
> are hosted on it.  The httpd processes would begin to hang.  I finally
> decided to tear down the cluster and rebuild it from the ground up.  I did
> so and then copied all the data back which took all night due to the amount
> of data.  All was well during that entire copy process back to the cluster
> with no resource spikes.
>
>  Assuming you go back to 3.5.2, execute the following command:
> # gluster volume set <volname> cluster.entry-self-heal off
>
> This should prevent httpd hangs.
>
> If you still find that the CPU usage is very high, execute the following
> command:
> # gluster volume set <volname> cluster.self-heal-daemon off
>
> This disables self-healing, so you should periodically re-enable
> self-heal-daemon to let the pending data heal, using the following
> command:
> # gluster volume set <volname> cluster.self-heal-daemon on
>
> Once "gluster volume heal <volname> info" shows zero entries, then healing
> is complete.
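
The polling step described above can be sketched as a small shell helper. This is only a sketch under assumptions: it assumes "gluster volume heal <volname> info" prints one "Number of entries: N" line per brick, and the names count_heal_entries and wait_for_heal are hypothetical helpers, not part of gluster.

```shell
#!/bin/sh
# Sketch only. Assumes heal-info output contains per-brick lines of the
# form "Number of entries: N". Function names here are hypothetical.

count_heal_entries() {
    # Read heal-info text on stdin and print the summed entry count.
    awk -F': *' '/^Number of entries:/ { total += $2 } END { print total + 0 }'
}

wait_for_heal() {
    # $1 = volume name, $2 = seconds between polls (default 60)
    volname=$1
    interval=${2:-60}
    while :; do
        # Stop with an error if the gluster command itself fails, so a
        # failed query is not mistaken for "zero entries".
        info=$(gluster volume heal "$volname" info) || return 1
        pending=$(printf '%s\n' "$info" | count_heal_entries)
        [ "$pending" -eq 0 ] && break
        echo "$pending entries still pending heal on $volname"
        sleep "$interval"
    done
    echo "heal complete on $volname"
}

# Example (not run automatically): wait_for_heal myvol 30
```

Once wait_for_heal returns, self-heal-daemon could be turned off again if the CPU load requires it. The volume name myvol in the example comment is a placeholder.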
>
> We took some steps to improve this in 3.6. But readdir in EXT4 is not
> working correctly so that is probably giving problems here. Let's wait for
> Vijay to merge the patch I mentioned, then things should be fine.
>
> Sorry for the inconvenience caused. We found the issue after the release
> was made :-(.
>
> Pranith
>
>
> Pranith
>
>
>  I should note that this cluster is home to many Apache/PHP based web
> sites.  The problem starts again, however, the minute I point traffic back
> to the sites on the cluster.  Before pointing traffic to it, all is fine
> but as soon as the traffic begins to hit it, the utilization again begins
> to spike.  Note that all the sites run just fine when hosted from a
> standard EXT4 partition.  I noticed another thread labeled "glusterfsd
> process thrashing CPU" where Pranith asks if the user has directories with
> lots of files and I do.
>
>  Here are some other details of my cluster:
> - OS:  CentOS 6.6 with all updates on all 3 nodes as of 11-22-2014
> - All 3 nodes have 8 cores with 16 GB of RAM
> - Nodes are all formatted with EXT4
> - All three nodes also have the file systems mounted on them for use with
> Apache.  I have experimented with both NFS and Fuse mounts and it doesn't
> seem to make a difference which I use for this particular problem.  I am
> currently using Fuse.
> - Approximately 135 GB of data.  Some deep directories with many small
> files.
> - No optimization or changes have been made to the cluster . . . it is
> running with default options
> - Gluster version 3.6.1-1 installed from RPMs
> - Note the issue originally occurred on version 3.5.2 but I updated before
> rebuilding it in hopes that would fix it (it didn't)
>
>  Can anyone give me guidance on how to tackle this problem?  I am hoping
> perhaps Pranith can give some details as to why the question about many
> files and how to proceed given my situation.  I know others have commented
> about having many small files with regard to performance but when the
> processors are not spiked, performance has been acceptable.  Any help would
> be greatly appreciated.
>
>   Kyle,
>       3.6.1 and EXT4 have a problem because of 64-bit offsets. The Afr-v2
> implementation introduced this problem. We thought the following patch was
> merged but it wasn't :-( http://review.gluster.com/8201. Please don't use
> 3.6.1 with EXT4.
>
> Vijay,
>       Please merge http://review.gluster.com/8201
>
> Pranith
>
> --
>  Kyle
>
>
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-users

-- 
Kyle A. Harris
Kyle at TheHarrisHome.com
615-364-6752