[Gluster-devel] Throttling xlator on the bricks

Thu Jan 28 13:56:35 UTC 2016

> TBF isn't complicated at all - it's widely used for traffic shaping, cgroups,
> UML to rate limit disk I/O.

It's not complicated and it's widely used, but that doesn't mean it's
the right fit for our needs.  Token buckets are good for creating a
*ceiling* on resource utilization, but what if you want to set a floor
or allocate fair shares instead?  Even if what you want is a ceiling,
there's the problem of how many tokens should be entering the system.
Ideally that number should match the actual number of operations the
resource can handle per time quantum, but for networks and disks that
number can be pretty variable.  That's why network QoS is a poorly
solved problem and disk QoS is even worse.
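
For reference, here's roughly what a plain ceiling-style bucket looks
like.  This is only a sketch, not code from any existing xlator; the
class name and the `rate`/`burst` parameters are made up for
illustration.  Note that `rate` is exactly the number that's hard to
pick for a disk or a network.

```python
class TokenBucket:
    """Ceiling-style limiter: at most `rate` ops/sec sustained, `burst` at once."""

    def __init__(self, rate, burst, now=0.0):
        self.rate = rate      # tokens added per second (hypothetical, must be guessed)
        self.burst = burst    # bucket capacity
        self.tokens = burst   # start with a full bucket
        self.last = now       # time of the last refill

    def refill(self, now):
        # Add tokens for the elapsed time, never exceeding the capacity.
        elapsed = now - self.last
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now

    def try_take(self, now, n=1):
        # Admit the operation only if enough tokens are available.
        self.refill(now)
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False
```

The caller passes in the current time, so the bucket itself never
sleeps; what to do with a rejected operation (queue it, delay it, fail
it) is a separate decision.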

To create a floor using token buckets, you have to chain buckets
together.  Each user/activity draws first from its own bucket, setting
the floor.  When that bucket is exhausted, the user starts drawing from
the next bucket, eventually from an infinite "best effort" bucket at the end
of the chain.  To allocate fair shares (which is probably closest to
what we want in this case) you need active monitoring of how much work
the resource is actually doing.  As that number fluctuates, so does the
number of tokens, which are then divided *proportionally* between
buckets.  Hybrid approaches - e.g. low and high watermarks,
bucket-filling priorities - are also possible.
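
To make the chaining and the proportional split a bit more concrete,
here's a minimal sketch.  Everything in it is hypothetical: the
`Bucket` class, the function names, and the "client"/"rebalance"
activity names in the usage note are just for illustration, and the
measured capacity is assumed to come from some monitoring we don't
have yet.

```python
class Bucket:
    def __init__(self, tokens):
        self.tokens = tokens  # float("inf") models the "best effort" bucket

    def try_take(self, n=1):
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False


def take_from_chain(chain, n=1):
    """Draw from the first bucket in the chain that still has tokens.

    Returns the index of the bucket that paid, or None if none could.
    Index 0 is the user's own bucket, i.e. the floor.
    """
    for i, bucket in enumerate(chain):
        if bucket.try_take(n):
            return i
    return None


def divide_tokens(measured_capacity, weights):
    """Split this quantum's measured capacity proportionally to weights."""
    total = sum(weights.values())
    return {name: measured_capacity * w / total for name, w in weights.items()}
```

With a chain of [own, shared, best-effort], the returned index tells
you which class the operation ended up in, which could feed a
scheduling priority.  Something like divide_tokens(300, {"client": 2,
"rebalance": 1}) would be re-run every quantum as the measured
capacity changes.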

Then we get to the problem of how to distribute a resource fairly
*across nodes* when the limits are actually being applied locally on
each.  This is very similar to the problem we faced with quota over DHT,
and the same kind of approaches (e.g. a "balancing daemon") might apply.
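
As a strawman for what such a daemon might compute, here's a sketch
that re-divides one global limit into per-node local limits based on
recently observed demand.  This is not how quota does it; the function
name, the `min_share` parameter, and the brick names in the usage note
are all assumptions for illustration.

```python
def rebalance_limits(global_limit, demand, min_share=0.0):
    """Return per-node local limits that sum to global_limit.

    Each node is guaranteed min_share; the remainder is split in
    proportion to the demand each node recently reported.
    """
    nodes = list(demand)
    reserved = min_share * len(nodes)
    if reserved > global_limit:
        raise ValueError("min_share too large for global_limit")
    spare = global_limit - reserved
    total = sum(demand.values())
    if total == 0:
        # Nobody is asking for anything: fall back to an even split.
        return {n: global_limit / len(nodes) for n in nodes}
    return {n: min_share + spare * demand[n] / total for n in nodes}
```

The daemon would call something like rebalance_limits(100, {"brick1":
30, "brick2": 10}, min_share=10) periodically and push the results out
to the bricks, which keep enforcing their limits locally in between.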