[Gluster-users] Gluster-users Digest, Vol 49, Issue 25 -- Disk utilization

Mon May 21 13:22:03 UTC 2012

Peter,

see comments marked with ben> below, hope this helps.

Message: 1
Date: Tue, 15 May 2012 22:12:10 +0200
From: Peter Frey <pfrey09 at googlemail.com>
Subject: [Gluster-users] Disk utilisation
To: gluster-users at gluster.org
Message-ID:
	<CAFWmEw==E990t-DYa_DRB37w3dDrkNLJJ=qFGJt3-bptmtGamQ at mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

Hi,

we are using Gluster to make http file downloads available. We currently
have 2 gluster servers serving a replicated volume. Each gluster server has
22 disks in a hardware raid; the underlying file system is XFS. The average
file size is around 3-4MB. Around 16TB of data is stored on the volume.

ben> Linux distro version and Gluster version would be helpful. What RAID stripe element size are you using? With a 64-KB stripe element size, reading a single 4-MB file makes EVERY disk busy, so striping will not help you much at that file size. ~130 mbit/s is only ~16 MB/s, while most disks can read at > 50 MB/s, so your total system throughput is far less than the throughput of a single disk drive. So why use striping? Wouldn't it be better to be able to serve many files in parallel from your disks? If the application tends to read the entire file sequentially, you may want to increase readahead: try raising it well above the Linux default of 128 KB, which is not good for Gluster. Lastly, try the deadline I/O scheduler on your data disks; CFQ can't help with a Gluster server.
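The stripe and bandwidth arithmetic in the comment above can be checked with a rough sketch. It assumes all 22 disks hold data (parity layout is ignored) and that a file is laid out contiguously and aligned to a stripe element; the function names are made up for illustration.

```python
import math

KB = 1024
MB = 1024 * KB

def disks_touched(file_bytes, stripe_element_bytes, num_data_disks):
    """Distinct disks holding part of one contiguous, aligned file.

    Assumption: plain round-robin striping over num_data_disks, no parity.
    """
    elements = math.ceil(file_bytes / stripe_element_bytes)
    return min(elements, num_data_disks)

def mbit_to_mbyte_per_s(mbit_per_s):
    """Convert megabits per second to megabytes per second."""
    return mbit_per_s / 8

# A 4-MB file with 64-KB stripe elements is 64 elements, so all 22 disks are hit.
print(disks_touched(4 * MB, 64 * KB, 22))
# With a (hypothetical) 1-MB stripe element the same file touches only 4 disks.
print(disks_touched(4 * MB, 1 * MB, 22))
# Outgoing bandwidth reported in the original mail, in MB/s.
print(mbit_to_mbyte_per_s(130))
```

Under these assumptions a larger stripe element leaves most disks free to serve other files at the same time, which is the point about serving many files in parallel.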

Once we start sending live http traffic towards the infrastructure we see
horrible performance. For instance, if the outgoing bandwidth on each of the
gluster servers is at ~130mbit/s, our hardware raid has a busy rate of ~30%.
Once we increase the traffic towards 250mbit/s the busy rate doubles to
60%. With this the iowait values also increase.

We started to play with the read buffers on the http servers. There is no
difference between loading the whole file into memory at once and loading
the file in 64k chunks. This makes me believe that the gluster server loads
the file with its own buffers and the client's buffer has no influence. We
have also enabled profiling on the gluster volume: there are roughly 18
read() calls for each open() call, which should be an indication of buffers
that are too small.

ben> Gluster avoids read caching on the client side. You can give the Gluster servers more memory so that XFS can cache more files, if this leads to more cache hits. If you really need aggressive client-side caching, you can NFS-mount the Gluster server. If your app is HTTP-based and RESTful, there are web caching servers that can intercept requests before they reach your application. 18 read calls per open is not a terrible ratio. In my experience, if network tuning is correct and the files being read are cached (or prefetched) on the server, Gluster reads at network speed (which is why disk read-ahead is important). How much traffic can your network transmit? Have you tested the network by itself (i.e. without using Gluster)?
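To put the 18 reads per open in perspective, here is a back-of-the-envelope sketch. It assumes each open() is followed by one complete read of the file, which the profile output in the original mail does not confirm; the function name is made up for illustration.

```python
def avg_read_size_kb(file_size_mb, reads_per_open):
    """Average size of one read() in KB, assuming the whole file is read once per open()."""
    return file_size_mb * 1024 / reads_per_open

# Average file sizes from the original mail (3-4 MB) and the profiled ratio (18).
for size_mb in (3, 4):
    print(size_mb, "MB file:", round(avg_read_size_kb(size_mb, 18)), "KB per read")
```

Under that assumption the average read comes out at roughly 170 to 230 KB, already larger than the 64k chunks the http servers were asking for.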

We have also made the mistake of storing all files in a single directory, but
XFS advertises that it can handle millions of files in a single directory,
so it shouldn't be a problem, or should it?

ben> Never put millions of files in a single directory if you can help it. Many file systems do not do well with this many files per directory. But even if the filesystem is perfect at it, applications that attempt to display directory contents (other than "find") tend to lock up, because they read the entire directory, read all inodes in the directory, sort them, and only then display them. Classic example: the "ls" command.
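The difference between an "ls"-style listing and a streaming one can be sketched as follows. This is only an illustration on a small temporary directory; the helper names are hypothetical, and it does not measure how any particular filesystem behaves with millions of entries.

```python
import os
import tempfile
from itertools import islice

def first_entries(path, limit):
    """Return up to `limit` names, stopping early instead of reading every entry."""
    with os.scandir(path) as entries:
        # entry.name needs no stat() call, so no per-file inode lookup happens here.
        return [entry.name for entry in islice(entries, limit)]

def sorted_listing(path):
    """Roughly what an ls-style tool does: read every name, then sort them all."""
    return sorted(os.listdir(path))

with tempfile.TemporaryDirectory() as d:
    # Create a small stand-in for a very large flat directory.
    for i in range(1000):
        open(os.path.join(d, f"file{i:04d}"), "w").close()
    sample = first_entries(d, 10)
    full = sorted_listing(d)
    print(len(sample), len(full))
```

A tool written like first_entries can start producing output after a handful of entries, while the sorted listing has to finish reading the whole directory first.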

ben> Recent XFS versions (such as the version in RHEL6.2) handle metadata far better than before (e.g. RHEL6.1), so you may want to make sure you're using the right one.