[Gluster-users] gluster fails under heavy array job load

harry mangalam harry.mangalam at uci.edu
Sat Dec 14 00:52:12 UTC 2013


Hi Alex,

Thanks for taking the time to think about this.

I don't have metrics at hand, but I tend to think not, for a few reasons.
- When I have looked at stats from the network, it has never been close to
saturating - the bottlenecks appear to be mostly on the gluster server side.
I get emailed if my servers go above a load of 8 (the servers have 8 cores),
and when that happens, I often get complaints from users that they've had
incomplete runs.

At these points the network load is often fairly high (~1 GB/s aggregate), but
on a QDR IB network that shouldn't be saturating. (A rough sketch of the kind
of load/throughput sampling I mean is at the end of this note.)

- The same jobs, when run on another distributed FS on the same IB fabric,
show no such behavior, which would tend to point the fault at gluster or
(granted) at my configuration of it.

- While a lot of the IO load is large streaming reads and writes, there is a
subset of jobs whose users insist on writing Zillions of Tiny (ZOT) files as
output - they use the file names as indices or as table row entries. (One user
had >20M files in a single tree.) We're trying to educate them, but it takes
time and energy. Gluster seems to have a lot of trouble traversing these huge
file trees, more so than DFSs that use dedicated metadata servers; a quick way
to get a feel for that traversal cost is sketched just below.

That said, it has been stable otherwise and there are a lot of things to 
recommend it.
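
For reference, here is an illustrative sketch of the sort of load/throughput
sampling I referred to above. It is not our actual alerting script - the
thresholds are examples, and /proc/net/dev only sees IPoIB/ethernet counters,
not native RDMA traffic:

    #!/usr/bin/env python
    # Illustrative sketch: sample the 1-minute load average and aggregate
    # NIC byte counters, and flag anything over a threshold.
    import time

    LOAD_LIMIT = 8.0       # example threshold; the servers have 8 cores
    SAMPLE_SECS = 10       # interval for the throughput estimate

    def loadavg_1min():
        with open("/proc/loadavg") as f:
            return float(f.read().split()[0])

    def total_bytes():
        """Sum rx+tx byte counters across all non-loopback interfaces."""
        total = 0
        with open("/proc/net/dev") as f:
            for line in f.readlines()[2:]:     # first two lines are headers
                name, counters = line.split(":", 1)
                if name.strip() == "lo":
                    continue
                fields = counters.split()
                total += int(fields[0]) + int(fields[8])  # rx + tx bytes
        return total

    if __name__ == "__main__":
        b0 = total_bytes()
        time.sleep(SAMPLE_SECS)
        rate = (total_bytes() - b0) / float(SAMPLE_SECS)
        load = loadavg_1min()
        print("load(1m)=%.1f  aggregate throughput=%.1f MB/s"
              % (load, rate / 1e6))
        if load > LOAD_LIMIT:
            print("WARNING: load over %.0f" % LOAD_LIMIT)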

hjm





On Friday, December 13, 2013 02:00:19 PM Alex Chekholko wrote:
> Hi Harry,
> 
> My best guess is that you overloaded your interconnect.  Do you have
> metrics for if/when your network was saturated?  That would cause
> Gluster clients to time out.
> 
> My best guess is that you went into the "E" state of your "USE
> (Utilization, Saturation, Errors)" spectrum.
> 
> IME, that is a common pattern for our Lustre/GPFS clients: you get all
> kinds of weird error states if you manage to saturate your I/O for an
> extended period of time and fill all of the buffers everywhere.
> 
> Regards,
> Alex
> 
> On 12/12/2013 05:03 PM, harry mangalam wrote:
> > Short version: Our gluster fs (~340TB) provides scratch space for a
> > ~5000core academic compute cluster.
> > 
> > Much of our load is streaming IO, doing a lot of genomics work, and that
> > is the load under which we saw this latest failure.

---
Harry Mangalam - Research Computing, OIT, Rm 225 MSTB, UC Irvine
[m/c 2225] / 92697 Google Voice Multiplexer: (949) 478-4487
415 South Circle View Dr, Irvine, CA, 92697 [shipping]
MSTB Lat/Long: (33.642025,-117.844414) (paste into Google Maps)
---