[Gluster-users] Gluster-users Digest, Vol 48, Issue 18 - Horrible Gluster Performance

Ben England bengland at redhat.com
Mon Apr 16 14:58:09 UTC 2012


Philip,

What parts of your system perform well?  Can you give a specific example of your workload (what you are asking the system to do)?  If it's a mixture of different workloads, that's important too.  What version of Gluster and Linux are you using?  My suggestions would be:

a) to reset all your gluster tuning parameters to their default values unless you are sure that they actually improve performance (a sketch of the commands is below), and 

b) try to isolate your performance problem to as simple a workload as possible before you try to fix it, and try to determine which workloads DO work well in your configuration.  This will make it easier for others to help (a simple read test is sketched below).  

c) if latency spikes are the issue, this sounds like it could be related to writes being excessively buffered by the Linux kernel and then flushed all at once, which can block reads.  If so, use "iostat -kx /dev/sd? 5" or equivalent to observe.  You can throttle back "dirty pages" in the kernel so they are not buffered for long periods of time, which avoids these spikes (see the sysctl sketch after the link below).  
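
For (a), assuming a reasonably recent 3.x release, you can see which options have been changed and reset them all at once; "myvol" below is just a placeholder for your volume name:

  # any non-default options show up under "Options Reconfigured"
  gluster volume info myvol

  # put every tuned option back to its default value
  gluster volume reset myvol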
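
For (b), a quick way to split the problem is to compare a large sequential read through the native client mount against the same read directly from the brick filesystem.  The paths below are placeholders for your mount point and brick directory:

  # drop the page cache first so the reads actually hit the disks
  sync; echo 3 > /proc/sys/vm/drop_caches

  # read a large file through the Gluster mount (on a client)
  dd if=/mnt/glusterfs/testfile of=/dev/null bs=1M

  # read the same file straight off the brick (on a server)
  dd if=/data/brick1/testfile of=/dev/null bs=1M

If the brick read is fast but the read through the mount is slow, look at the Gluster/network layer; if both are slow, look at the RAID array and filesystem first.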

http://community.gluster.org/a/linux-kernel-tuning-for-glusterfs/ provides some suggestions that may be relevant to your problem; my recommendations are in a comment there.  
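
For (c), here is a sketch of the kind of dirty-page throttling I mean; the right values depend on how much RAM your servers have, so treat these numbers as a starting point rather than a recommendation:

  # flush dirty pages sooner and in smaller batches
  sysctl -w vm.dirty_background_ratio=2
  sysctl -w vm.dirty_ratio=10

  # persist the settings across reboots
  echo 'vm.dirty_background_ratio = 2' >> /etc/sysctl.conf
  echo 'vm.dirty_ratio = 10' >> /etc/sysctl.conf

While a test runs, watch the await and %util columns of "iostat -kx 5" for the data disks; one of your latency spikes will show up as await jumping to hundreds of milliseconds while %util sits at 100.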

>Message: 9
>Date: Fri, 13 Apr 2012 11:25:58 +0200
>From: Philip <flips01 at googlemail.com>
>Subject: [Gluster-users] Horrible Gluster Performance
>To: gluster-users at gluster.org
>Message-ID:
>	<CAKDbnM7AsprRBgiXH6aHoAra6N9DV=KX69w1CEpfmRGW5738DQ at mail.gmail.com>
>Content-Type: text/plain; charset="iso-8859-1"

>I have a small GlusterFS Cluster providing a replicated volume. Each server
>has 2 SAS disks for the OS and logs and 22 SATA disks for the actual data
>striped together as a RAID10 using MegaRAID SAS 9280-4i4e with this
>configuration: http://pastebin.com/2xj4401J

>Connected to this cluster are a few other servers with the native client,
>running nginx to serve the files stored on it (on the order of 3-10MB each).

>Right now a storage server has an outgoing bandwidth of 300Mbit/s and the
>busy rate of the raid array is at 30-40%. There are also strange
>side-effects: sometimes the io-latency skyrockets and no access to the
>raid is possible for >10 seconds. This happens at 300Mbit/s or
>1000Mbit/s of outgoing bandwidth. The file system used is xfs and it has
>been tuned to match the raid stripe size.

>I've tested all sorts of gluster settings but none seem to have any
>effect, so I've reset the volume configuration and it is now using the
>defaults.

>Does anyone have an idea what could be the reason for such bad
>performance? 22 disks in a RAID10 should deliver *way* more throughput.


