[Gluster-devel] Re: [Gluster-users] I/O fair share to avoid I/O bottlenecks on small clusters

Gordan Bobic gordan at bobich.net
Tue Feb 2 16:32:23 UTC 2010

Ran wrote:
>> The line that you need to add is the one with "writeback" in it.
>> If you are running qemu-kvm manually, you'll need to add the "cache=writeback" to your list of -drive option parameters.
>> All of this, of course, doesn't preclude applying ionice to the qemu container processes.
> ionice has no effect on network mounts, only on local disks,
> so basically it's useless to ionice the kvm process, which takes its I/O
> from gluster rather than from local disk.

So you are saying that the client gets bogged down by the glfs server(s) 
being slow?

My original understanding was that you were saying that the glfs server 
was consuming all of the disk I/O, which is clearly an entirely 
different issue.

> I agree with Gordan, the solution in this particular case needs to be
> I/O improvements at the OS (gluster servers) level and at the gluster
> application level.

Are you using caching performance translators?

> The setup is as follows:
> 2 gluster servers which DRBD to one another, so basically there is only 1
> active storage server with 4 disks --> 2 for the OS, which is md raid 1, and
> 2 for storage, which is also md raid 1 but over drbd replication as well

Which sync protocol are you using for DRBD? What fs are you using for 
backing glusterfs? Are you mounting with noatime?

> The applications that use this storage are:
> 1) mail storage - nfs over gluster (about 3000 mail accounts)

Ouch. Just - ouch. What version of gluster are you using, what kernel, 
and are you using knfsd with the patched fuse module from 
fuse-2.7.4glfs11 or unfsd?

In general, my experience of the performance of unfsd over gluster has been
terrible, but knfsd isn't that much better, either (even if we ignore
the bugginess of that setup - I could never get it to work sufficiently
well to use it).

> 2) statistics logs of web servers - samba over gluster (some NT servers use this)
> 3) kvm images - for now I'm testing with only 1 KVM host and 3 virtual
> win2k servers

Hmm... If your host and glfs servers are UPS backed, make sure you 
enable write-caching on qemu. Note: libvirt doesn't yet support this, 
even though qemu-kvm that ships with RHEL/CentOS 5.4 does. Weirdly, 
virt-install supports the caching parameters for creating the qemu xml 
configs, but when you start up the VM, you'll find that cache=writeback isn't
there on the running qemu command line. This means you'll have to start
up your VMs manually with a suitable qemu command line. But the
performance difference is worth it. Also make sure you are using the
deadline I/O scheduler on the KVM host.
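
To make the two suggestions above concrete, here is a rough Python sketch of how
one might assemble the -drive option for a manually started guest and check
which I/O scheduler is active. The image path, memory size, "if=ide" interface
and "qemu-kvm" binary name are assumptions for illustration only; the
scheduler line is what /sys/block/<dev>/queue/scheduler typically contains,
with the active scheduler in square brackets.

```python
def drive_option(image_path, cache="writeback", interface="ide"):
    """Build the value of one qemu-kvm -drive option.

    cache must be one of the modes qemu accepts here; commas in the
    image path are doubled because qemu uses ',' as the separator.
    """
    if cache not in ("none", "writethrough", "writeback"):
        raise ValueError("unsupported cache mode: %r" % cache)
    escaped = image_path.replace(",", ",,")
    return "file={},if={},cache={}".format(escaped, interface, cache)


def qemu_command(image_path, memory_mb=1024, binary="qemu-kvm"):
    """Return an argument list for starting a guest by hand.

    binary and memory_mb are placeholders, not values from this thread.
    """
    return [binary, "-m", str(memory_mb), "-drive", drive_option(image_path)]


def active_scheduler(sysfs_text):
    """Return the scheduler marked [active] in the sysfs scheduler file text."""
    for word in sysfs_text.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    return None


def needs_deadline(sysfs_text):
    """True if the active scheduler is something other than deadline."""
    return active_scheduler(sysfs_text) != "deadline"
```

If needs_deadline() returns True for a disk, writing the word "deadline" to
that disk's queue/scheduler file as root switches it at runtime.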

> What happens is that if, say, 1 KVM virtual win2k (not the host) runs,
> for example, a high I/O test
> with a stress I/O tool (inside the win2k virt), the entire gluster
> storage crawls to the point of not functioning,
> including mail etc... The gluster server load is at 3 to 4 and
> the storage barely functions. This is with only 1 virtual machine (win2k kvm),
> so I'm just wondering what will happen with, say, 10 virtual machines:
> nothing will work.

Right, I follow now what you're saying.

> I agree that the md raid is affecting the whole thing, but I didn't think
> it would be crucial.

I don't think MD RAID1 has anything to do with your problems here. The 
problem you are seeing is ultimately no different to any SAN/NAS. If one 
client's I/O overwhelms the capabilities of your SAN/NAS, then yes, all 
the other clients will grind to a halt with it. But if you are aware of 
any other SAN or NAS solutions that have resource throttling features 
along the lines of what would be required to address this, I'd rather 
like to hear about it - I've not seen anything with such functionality 
in the wild.

In the meantime, you may find that aggressive caching on all levels 
(write caching in the guest, writeback on the qemu hypervisor, cache and 
write-behind on glfs) will help quite a lot. The downside is that fsync 
guarantees will go out the window, so you'll have to make sure that the 
machines in question are all UPS backed if in a live environment.
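
For the glfs end of that caching stack, the sketch below generates client-side
volfile stanzas that layer write-behind and io-cache on top of an existing
volume. It is only an illustration: the volume names ("client", "writebehind",
"iocache") and the cache sizes are made up, and the "cache-size" option name
should be checked against the translator documentation for the glusterfs
version actually installed.

```python
def volfile_stanza(name, translator_type, subvolume, options=None):
    """Render one 'volume ... end-volume' stanza as text."""
    lines = ["volume " + name, "  type " + translator_type]
    for key, value in sorted((options or {}).items()):
        lines.append("  option {} {}".format(key, value))
    lines.append("  subvolumes " + subvolume)
    lines.append("end-volume")
    return "\n".join(lines)


def caching_stack(bottom_volume):
    """Stack write-behind, then io-cache, on top of bottom_volume.

    Sizes are placeholder values, not tuned recommendations.
    """
    write_behind = volfile_stanza(
        "writebehind", "performance/write-behind", bottom_volume,
        {"cache-size": "4MB"})
    io_cache = volfile_stanza(
        "iocache", "performance/io-cache", "writebehind",
        {"cache-size": "64MB"})
    return write_behind + "\n\n" + io_cache + "\n"


if __name__ == "__main__":
    # "client" stands in for whatever protocol/client volume already exists.
    print(caching_stack("client"))
```

The printed text would be appended to the client volfile after the existing
protocol/client volume, so that the mount uses the topmost (io-cache) volume.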

