[Gluster-devel] Re: [Gluster-users] I/O fair share to avoid I/O bottlenecks on small clusters

Gordan Bobic gordan at bobich.net
Sun Jan 31 19:04:36 UTC 2010

Ran wrote:
> You guys are talking about network I/O; I'm talking about the gluster
> server disk I/O

I also explained how you could use ionice to throttle the entire VM's 
disk I/O.
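As a sketch (the process name and image file name here are illustrative; on some distros the binary is qemu-system-x86_64 rather than qemu-kvm), throttling a whole VM's disk I/O with ionice might look like:

```shell
# Find the qemu process backing the VM in question.
VM_PID=$(pgrep -f 'qemu-kvm.*foo.img' | head -n1)

# Put the whole VM in the idle I/O class (-c3): it only gets disk
# bandwidth when nothing else on the host wants it...
ionice -c3 -p "$VM_PID"

# ...or keep it in the best-effort class (-c2) but at the lowest
# priority within that class (-n7).
ionice -c2 -n7 -p "$VM_PID"
```

Note that ionice priorities only take effect with an I/O scheduler that honours them (CFQ, at the time of writing).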

> the idea to shape the traffic does make sense, since the virt machine
> servers do use the network to get to the disks (gluster),
> but what about if there are, say, 5 KVM servers (with VPSes) all on
> gluster -- what do you do then?

I'm not sure what setup you are describing here, but the approaches I 
mentioned (both network throttling using tc/iptables and disk throttling 
using ionice on the qemu container) should both work just the same 
regardless whether the VMs are on the same or different hosts (or 
entirely separate bare metal hardware).
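For the network side, a minimal sketch with tc (the interface name and rate are assumptions; tune them for your links) could be:

```shell
# Cap all traffic leaving eth0 at 50 Mbit/s with a token bucket filter.
tc qdisc add dev eth0 root tbf rate 50mbit burst 10kb latency 50ms
```

Shaping per VM rather than per interface needs a classful qdisc (e.g. htb) with a filter per guest IP, or iptables marks feeding tc filters.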

> it's not quite fair share, since every
> server has its own fair share and doesn't see the others.

If there are 5 VMs going flat out on disk I/O, they'll all get their 
"fair share", but your host will still grind to a halt. Even if you 
ionice -c3 the VMs in question, it'll still grind to a crawl because 
when you start hitting the disk that hard (and you are falling out of 
caches), it doesn't matter how you prioritise the I/O; the random seeks 
are going to kill your performance. It's the same reason why, even if 
you ionice 
and throttle your software RAID sync daemon, the performance of the 
server will still suck until the resync is complete. If you haven't got 
the throughput to handle the load you are seeing, no amount of traffic 
shaping and throttling is going to change that.

> Also there are other applications that use gluster, like mail etc.

What has that got to do with server throttling? If you limit the 
server's disk I/O, the clients accessing the glfs shares will go even 
slower (spend more time in I/O wait state).

> and I see that gluster I/O is very high very often, causing the whole
> storage not to work.
> It's very disturbing.

You still haven't explained what hardware you are using, what glfs 
configuration you are using (replicate? distribute?), how much memory 
you give the VMs (and how much there is on the host), etc.

A couple of other things to consider:

1) Virtualization _sucks_ for disk I/O. I was doing some tests recently 
with KVM, VMware, Xen (fully and para virtualized) and VirtualBox. 
VMware won the test, and that suffered "only" a 40% performance 
degradation compared to bare metal (tested all on the same machine), 
with identical resources (I limited the bare metal box's memory with 
mem= kernel parameter to make sure there is no unfair benefit from the 
extra RAM in the host). Fully paravirtualized Xen and KVM came close 
second, followed a fair way away with fully virtualized Xen, and finally 
VirtalBox rock bottom last at about 20x slower.

The thing to consider here is that the BEST performing virtualization system 
provided only 60% (!) of performance of bare metal on disk I/O, and that 
is after the benefits of the extra caching being done by the host! If 
your processes are disk I/O bound, prepare to be devastatingly 
disappointed with performance of ANY virtualized solution.

2) If you are using KVM, you have plenty of RAM on the host in excess of 
what the guests are using up, and your host is UPS backed, you can get 
vast performance improvements by enabling write-back caching on your 
qemu VMs.


If you are setting up your VMs using virt-install, you can pass the 
cache setting as part of the --disk option, e.g. 
--disk path=/var/lib/libvirt/images/foo.img,cache=writeback

If you are using virt-manager, you'll have to mod your xml config files 
in /etc/libvirt/qemu to add the parameter there:

<disk type='file' device='disk'>
   <driver name='qemu' cache='writeback'/>
   <source file='/var/lib/libvirt/images/foo.img'/>
   <target dev='hda' bus='ide'/>
</disk>
The line that you need to add is the one with "writeback" in it.

If you are running qemu-kvm manually, you'll need to add 
"cache=writeback" to your list of -drive option parameters.
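For example (the image path, memory size and the rest of the options are placeholders for whatever your VM already uses):

```shell
qemu-kvm -m 1024 \
    -drive file=/var/lib/libvirt/images/foo.img,if=ide,cache=writeback \
    ...
```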

All of this, of course, doesn't preclude applying ionice to the qemu 
container processes.

