[Gluster-devel] Throttling xlator on the bricks

Tue Jan 26 03:11:50 UTC 2016

> If there is one bucket per client and one thread per bucket, it would be
> difficult to scale as the number of clients increases. How can we do this
> better?

On this note: consider that tens of thousands of clients are not unrealistic in production :).  Using a thread per bucket would also be unwise.

On the idea in general, I'm wondering whether there are specific real-world cases where this has been an issue that least-prio queuing hasn't been able to handle. Or is this more of a theoretical concern?  I ask because I've not really encountered situations where I wished I could give more FOPs to SHD vs. rebalance and such.

In any event, it might be worth having Shreyas detail his throttling feature (which can throttle any directory hierarchy, no less) to illustrate how a simpler design can achieve results similar to these more complicated (and, it follows, more bug-prone) approaches.

Richard

________________________________________
From: gluster-devel-bounces at gluster.org [gluster-devel-bounces at gluster.org] on behalf of Vijay Bellur [vbellur at redhat.com]
Sent: Monday, January 25, 2016 6:44 PM
To: Ravishankar N; Gluster Devel
Subject: Re: [Gluster-devel] Throttling xlator on the bricks

On 01/25/2016 12:36 AM, Ravishankar N wrote:
> Hi,
>
> We are planning to introduce a throttling xlator on the server (brick)
> process to regulate FOPS. The main motivation is to solve complaints about
> AFR selfheal taking too much of CPU resources. (due to too many fops for
> entry self-heal, rchecksums for data self-heal etc.)

I am wondering if we can re-use the same xlator for throttling
bandwidth, iops etc. in addition to fops. Based on admin-configured
policies we could provide different upper thresholds to different
clients/tenants, and this could prove to be a useful feature in
multitenant deployments to avoid the starvation/noisy-neighbor class of
problems. Has any thought gone in this direction?

>
> The throttling is achieved using the Token Bucket Filter algorithm (TBF). TBF
> is already used by bitrot's bitd signer (which is a client process) in
> gluster to regulate the CPU intensive check-sum calculation. By putting the
> logic on the brick side, multiple clients- selfheal, bitrot, rebalance or
> even the mounts themselves can avail the benefits of throttling.
>
> The TBF algorithm in a nutshell is as follows: There is a bucket which is
> filled at a steady (configurable) rate with tokens. Each FOP will need a
> fixed amount of tokens to be processed. If the bucket has that many tokens,
> the FOP is allowed and that many tokens are removed from the bucket. If not,
> the FOP is queued until the bucket is filled.
>
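
The algorithm quoted above can be sketched in a few lines. This is only an
illustration in Python, not the proposed xlator code: the class name, the
parameter names and the explicit fill() tick are all assumptions made for the
example, and a FOP is represented by a plain string.

```python
from collections import deque


class TokenBucket:
    """Toy token bucket: fill() adds tokens, submit() admits or queues a FOP.

    All names here are hypothetical; this is not GlusterFS code.
    """

    def __init__(self, capacity, fill_amount, cost_per_fop):
        self.capacity = capacity          # maximum tokens the bucket can hold
        self.fill_amount = fill_amount    # tokens added on each fill tick
        self.cost_per_fop = cost_per_fop  # fixed token cost of one FOP
        self.tokens = 0                   # bucket starts empty
        self.queued = deque()             # FOPs waiting for tokens (FIFO)
        self.processed = []               # FOPs that were allowed through

    def submit(self, fop):
        # Admit at once only if nothing is already waiting and tokens suffice;
        # otherwise queue the FOP so earlier arrivals are served first.
        if not self.queued and self.tokens >= self.cost_per_fop:
            self.tokens -= self.cost_per_fop
            self.processed.append(fop)
        else:
            self.queued.append(fop)

    def fill(self):
        # One tick of the filler: add tokens (capped at capacity), then
        # release as many queued FOPs as the new balance can pay for.
        self.tokens = min(self.capacity, self.tokens + self.fill_amount)
        while self.queued and self.tokens >= self.cost_per_fop:
            self.tokens -= self.cost_per_fop
            self.processed.append(self.queued.popleft())
```

In this sketch a FOP submitted to an empty bucket sits in `queued` until a
later fill() call supplies enough tokens, at which point it moves to
`processed`.
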
> The xlator will need to reside above io-threads and can have different
> buckets, one per client. There has to be a communication mechanism between
> the client and the brick (IPC?) to tell what FOPS need to be regulated from
> it, and the no. of tokens needed etc. These need to be reconfigurable via
> appropriate mechanisms. Each bucket will have a token filler thread which
> will fill the tokens in it.

If there is one bucket per client and one thread per bucket, it would be
difficult to scale as the number of clients increases. How can we do this
better?
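
One common way to avoid a filler thread per bucket is to refill lazily:
each bucket records when it was last touched and computes the tokens it has
earned since then whenever a FOP arrives. The sketch below is only an
illustration of that idea, not something proposed in this thread; the class
name, parameters and the injectable clock are assumptions for the example.

```python
import time


class LazyBucket:
    """Token bucket with no filler thread: tokens are credited on access.

    Hypothetical example, not GlusterFS code.
    """

    def __init__(self, rate, capacity, now=time.monotonic):
        self.rate = rate          # tokens earned per second
        self.capacity = capacity  # maximum tokens the bucket can hold
        self.now = now            # clock function, injectable for testing
        self.tokens = capacity    # start with a full bucket
        self.last = now()         # time of the last refill calculation

    def try_consume(self, cost):
        # Credit the tokens earned since the last call, capped at capacity.
        t = self.now()
        self.tokens = min(self.capacity,
                          self.tokens + (t - self.last) * self.rate)
        self.last = t
        # Admit the FOP only if the bucket can pay for it.
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

With this shape the per-client cost is a few fields of state rather than a
thread. FOPs that are refused would still have to be queued and retried, for
example from a single shared timer, which the sketch leaves out.
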

> The main thread will enqueue heals in a list in the bucket if there aren't
> enough tokens. Once the token filler detects some FOPS can be serviced, it
> will send a cond-broadcast to a dequeue thread which will process (stack
> wind) all the FOPS that have the required no. of tokens from all buckets.
>
> This is just a high level abstraction: requesting feedback on any aspect of
> this feature. What kind of mechanism is best between the client/bricks for
> tuning various parameters? What other requirements do you foresee?
>

I am in favor of having administrator-defined policies or templates
(collections of policies) being used to provide the tuning parameters per
client or set of clients. We could even have a default template per
use case etc. Is there a specific need to have this negotiation between
clients and servers?
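
To make the template idea concrete, here is a rough sketch of a lookup from
client to template to limits. Everything in it is hypothetical: the template
names, the client names, the "fops_per_sec" key and the numbers are
placeholders for illustration, not existing gluster options or recommended
values.

```python
# Hypothetical templates: each is a named collection of policies.
TEMPLATES = {
    "default":   {"fops_per_sec": 1000},
    "self-heal": {"fops_per_sec": 200},
    "rebalance": {"fops_per_sec": 300},
}

# Hypothetical admin-defined assignment of clients to templates.
CLIENT_TEMPLATE = {
    "shd": "self-heal",
    "rebalance-daemon": "rebalance",
}


def limits_for(client):
    """Return the limits for a client, falling back to the default template."""
    return TEMPLATES[CLIENT_TEMPLATE.get(client, "default")]
```

A client without an explicit assignment, such as an ordinary mount, would get
the "default" template in this sketch, so no per-client negotiation is needed.
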

Thanks,
Vijay

_______________________________________________
Gluster-devel mailing list
Gluster-devel at gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel