[Gluster-users] 1/4 glusterfsd's runs amok; performance suffers;

Sat Aug 11 16:46:55 UTC 2012

On Sat, Aug 11, 2012 at 9:41 AM, Brian Candler <B.Candler at pobox.com> wrote:

> On Sat, Aug 11, 2012 at 08:31:51AM -0700, Harry Mangalam wrote:
> >    Re the size difference, I'll explicitly rebalance the brick after the
> >    fix-layout finishes, but I'm even more worried about this fantastic
> >    increase in CPU usage and its effect on user performance.
>
> This presumably means you were originally running the cluster with fewer
> nodes, and then added some later?
>

No, but the current imbalance suggests that the volume drifted out of
balance at some point.

>
> >    In the fix-layout routines (still running), I've seen CPU usage of
> >    glusterfsd rise to ~400% and loadavg go up to >15 on all the servers
> >    (except pbs3, the one that originally had that problem).  That high
> >    load does not last long, though (maybe a few minutes; we've just
> >    installed nagios on these nodes and I'm getting a ton of emails about
> >    load increasing and then decreasing on all the nodes except pbs3).  When
> >    the load goes very high on a server node, the user-end performance
> >    drops appreciably.
>
> Maybe worth trying an strace (strace -f -p <pid> 2>strace.out) on the
> glusterfsd process, or whatever it is which is causing the high load,
> during such a burst, just for a few seconds. The output might give some
> clues.
>

Good idea.  I'll watch, and when it goes wacko I'll run the strace and
post the filtered results.
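
As a rough sketch of what "filtered" could look like: the helper below
tallies syscalls in an strace log, most frequent first. The function name
summarize_strace is hypothetical, and it assumes the log came from the
strace -f invocation suggested above, where each line may carry a
"[pid N]" prefix. Lines that do not begin with a syscall name (signals,
"<... resumed>" continuations, attach notices) are simply skipped.

```shell
#!/bin/sh
# Hypothetical helper (not part of strace or gluster): count syscalls
# per name in an strace log and print them, most frequent first.
summarize_strace() {
    # 1st sed expression: drop an optional "[pid N] " prefix added by -f.
    # 2nd sed expression: keep only the syscall name from "name(args...".
    sed -n 's/^\[pid *[0-9]*\] *//; s/^\([a-z_][a-z0-9_]*\)(.*/\1/p' "$1" |
        sort | uniq -c | sort -rn
}

# Possible usage during a load burst (assumes GNU coreutils "timeout";
# replace <pid> with the busy glusterfsd process ID):
#   timeout 5 strace -f -p <pid> 2>strace.out
#   summarize_strace strace.out | head
```

This only counts calls; if time per call matters more than frequency,
strace's own -c option prints a summary table instead.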

Thanks
Harry

-- 
Harry Mangalam - Research Computing, OIT, Rm 225 MSTB, UC Irvine
[m/c 2225] / 92697 Google Voice Multiplexer: (949) 478-4487
415 South Circle View Dr, Irvine, CA, 92697 [shipping]
MSTB Lat/Long: (33.642025,-117.844414) (paste into Google Maps)