[Gluster-users] 1/4 glusterfsd's runs amok; performance suffers;

Harry Mangalam hjmangalam at gmail.com
Sat Aug 11 15:31:51 UTC 2012


Thanks for your comments.

I use mdadm on many servers and I've seen md numbering like this a fair
bit. Usually it occurs after another RAID has been created and the
numbering shifts. Neil Brown (mdadm's author) seems to think it's fine,
so I don't think that's the problem. And you're right - this is a
Frankengluster made from a variety of chassis and controllers, and normally
it's fine. As Brian noted, it's all the same to gluster, mod some small
local differences in IO performance.
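
For anyone who wants to double-check an md array like this, something
along these lines should do it (assuming the device really is /dev/md127
on that box):

  # confirm the array is clean and all members are active ([UU...] in the output)
  cat /proc/mdstat
  # per-array detail, including failed/spare device counts
  mdadm --detail /dev/md127

If mdadm reports the array as clean with no failed devices, the 127
numbering by itself is harmless.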

Re the size difference, I'll explicitly rebalance the brick after the
fix-layout finishes, but I'm even more worried about this fantastic
increase in CPU usage and its effect on user performance.
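
For reference, the sequence I'm running is roughly the following (the
volume name is a placeholder - substitute your own):

  # watch the fix-layout pass that's currently running
  gluster volume rebalance <VOLNAME> status
  # once it finishes, migrate the data itself to even out the bricks
  gluster volume rebalance <VOLNAME> start
  # and check progress the same way
  gluster volume rebalance <VOLNAME> status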

In the fix-layout pass (still running), I've seen CPU usage of
glusterfsd rise to ~400% and loadavg go up to >15 on all the servers
(except pbs3, the one that originally had the problem). That high
load does not last long, though - maybe a few minutes. We've just installed
Nagios on these nodes and I'm getting a ton of emails about load increasing
and then decreasing on all the nodes (except pbs3). When the load goes
very high on a server node, the user-end performance drops appreciably.
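
In case it's useful to anyone watching the same thing, a minimal way to
sample the glusterfsd load on each server (pidstat needs the sysstat
package installed; the 5 is just a sample interval in seconds):

  # current CPU% of all glusterfsd processes, one-shot
  top -b -n 1 | grep glusterfsd
  # current load averages
  uptime
  # continuous per-process sampling, every 5 seconds
  pidstat -p $(pgrep -d, glusterfsd) 5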

hjm



On Sat, Aug 11, 2012 at 4:20 AM, Brian Candler <B.Candler at pobox.com> wrote:

> On Sat, Aug 11, 2012 at 12:11:39PM +0100, Nux! wrote:
> > On 10.08.2012 22:16, Harry Mangalam wrote:
> > >pbs3:/dev/md127  8.2T  5.9T  2.3T  73% /bducgl  <---
> >
> > Harry,
> >
> > The name of that md device (127) indicates there may be something
> > dodgy going on there. A device shouldn't be named 127 unless some
> > problems occurred. Are you sure your drives are OK?
>
> I have systems with /dev/md127 all the time, and there's no problem. It
> seems to number downwards from /dev/md127 - if I create md array on the
> same
> system it is /dev/md126.
>
> However, this does suggest that the nodes are not configured identically:
> two are /dev/sda or /dev/sdb, which suggests either plain disk or hardware
> RAID, while two are /dev/md0 or /dev/md127, which is software RAID.
>
> Although this could explain performance differences between the nodes, this
> is transparent to gluster and doesn't explain why the files are unevenly
> balanced - unless there is one huge file which happens to have been
> allocated to this node.
>
> Regards,
>
> Brian.
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
>



-- 
Harry Mangalam - Research Computing, OIT, Rm 225 MSTB, UC Irvine
[m/c 2225] / 92697 Google Voice Multiplexer: (949) 478-4487
415 South Circle View Dr, Irvine, CA, 92697 [shipping]
MSTB Lat/Long: (33.642025,-117.844414) (paste into Google Maps)

