[Gluster-devel] Re: [Gluster-users] I/O fair share to avoid I/O bottlenecks on small clusters
gordan at bobich.net
Mon Feb 1 15:14:05 UTC 2010
Jeff Darcy wrote:
> On 01/31/2010 09:06 AM, Ran wrote:
>> You guys are talking about network I/O; I'm talking about the gluster server disk I/O.
>> The idea to shape the traffic does make sense, since the virt machine
>> servers do use the network to get to the disks (gluster),
>> but what about if there are, say, 5 KVM servers (with VPSes) all on
>> gluster - what do you do then? It's not quite fair share, since every
>> server has its own fair share and doesn't see the others.
>> Also there are other applications that use gluster, like mail etc.,
>> and I see that gluster I/O is very high very often, causing the whole
>> storage not to work.
>> It's very disturbing.
> You bring up a good set of points. Some of these problems can be
> addressed at the hypervisor (i.e. GlusterFS client) level, some can be
> addressed by GlusterFS itself, and some can be addressed only at the
> level of the local-filesystem or block-device level on the GlusterFS
> servers.
That sentence doesn't really parse for me. Part of the problem is that
Ran didn't really specify what his storage setup is (DAS in the host or
a SAN), and whether "uses up all disk I/O" is referring to it using up
all the available disk I/O on just the local virtualization host (DAS)
or whether the access pattern from one server is eating all the disk I/O
for all the other servers connected to the SAN. Obviously, one is more
pathological than the other, but without knowing the details it is
impossible to point the finger at gluster when the problem could be more
deeply rooted (e.g. a mis-optimization of the RAID array). Optimizing
file systems is a relatively complex thing and a lot of the conventional
wisdom is just plain wrong at times.
Here's an article I wrote on the subject a while back:
I'm not sure how much of this is applicable to the specific case being
discussed but I cannot help but wonder just how many (if any at all)
"enterprise grade" storage solutions take all of what is mentioned there
into account. In my experience the difference in I/O throughput can be
quite staggering, especially for random I/O.
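Before pointing the finger at gluster it is worth at least confirming
which device is actually saturated. A minimal sketch, assuming a Linux
host with the standard /proc layout (the device-name pattern below is an
assumption - adjust it for your disks):

```shell
# Which device is eating all the I/O time?
# /proc/diskstats fields: $3=device name, $4=reads completed,
# $8=writes completed, $13=milliseconds spent doing I/O.
awk '$3 ~ /^(sd|hd|vd)[a-z]$/ {
    printf "%-6s reads=%-10s writes=%-10s busy_ms=%s\n", $3, $4, $8, $13
}' /proc/diskstats
```

Sampling this twice a few seconds apart and diffing busy_ms gives a
rough utilisation figure; iostat -x (from the sysstat package) does the
same with less effort, if it is installed.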
> Unfortunately, I/O traffic shaping is still in its infancy
> compared to what's available for networking - or perhaps even "infancy"
> is too generous. As far as the I/O stack is concerned, all of the
> traffic is coming from the glusterfsd process(es) without
> differentiation, so even if the functionality to apportion I/O amongst
> tasks existed it wouldn't be usable without more information. Maybe
> some day...
I don't think this would even be useful. It sounds like a request for
more finely-grained (sub-process level!) control over disk I/O
prioritisation, without a clearly presented case that the current
functionality (ionice) is insufficient.
If you are running a glfs server in a guest VM, and that VM is consuming
all of the disk I/O available to the host, then the guest VM container
process (qemu for qemu or KVM, vmx for vmware, etc.) can be ionice-d to
lower its priority and give the other VMs more share of the disk I/O. I
haven't heard an argument yet explaining why that is not sufficient in
this case.
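Concretely, something like the following on the host - a sketch, with
the guest name "myguest" and the qemu process pattern as assumptions
(ionice is from util-linux and takes effect with the CFQ I/O scheduler):

```shell
# Hypothetical guest name - substitute your own VM's process.
PID=$(pgrep -f 'qemu.*myguest' | head -n 1)
if [ -n "$PID" ]; then
    # Best-effort class (-c 2), lowest priority within it (-n 7):
    ionice -c 2 -n 7 -p "$PID"
    # Or the idle class (-c 3): the guest only gets disk time
    # when no other process wants it:
    # ionice -c 3 -p "$PID"
fi
```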
> What you can do now at the GlusterFS level, though, is make sure that
> traffic is distributed across many servers and possibly across many
> volumes per server to take advantage of multiple physical disks and/or
> interconnects for one server. That way, a single VM will only use a
> small subset of the servers/volumes and will not starve other clients
> that are using different servers/volumes (except for network bottlenecks
> which are a separate issue). That's what the "distribute" translator is
> for, and it can be combined with replicate or stripe to provide those
> functions as well. Perhaps it would be useful to create and publish
> some up-to-date recipes for these sorts of combinations.
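For reference, a recipe of that sort is just a client-side volfile that
layers distribute over several protocol/client subvolumes. A minimal
sketch (the host names and brick name here are placeholders):

```
volume remote1
  type protocol/client
  option transport-type tcp
  option remote-host server1
  option remote-subvolume brick
end-volume

volume remote2
  type protocol/client
  option transport-type tcp
  option remote-host server2
  option remote-subvolume brick
end-volume

# Hash-distribute files across both servers, so one busy VM
# image only lands on (and can only saturate) one of them.
volume dist
  type cluster/distribute
  subvolumes remote1 remote2
end-volume
```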
Hold on, you seem to be talking about something else here. You're
talking about clients not distributing their requests evenly across
servers. Is that really what the original problem was about? My
understanding of the original post was that a glfs server VM (KVM) was
consuming more than its fair share of disk I/O capacity, and that
there was a need to throttle it - which can be done by applying ionice
to the qemu container process.
Given that this has been pretty much ignored, I'm guessing that I'm
missing the point and that my understanding of the problem being
experienced is in some way incorrect. So can we have some clarification
on it, with an explanation of why ionice-ing the qemu process isn't
applicable? What other feature is required, and why exactly would it be
needed?