[Gluster-devel] Throttling xlator on the bricks

Joe Julian joe at julianfamily.org
Tue Jan 26 05:09:33 UTC 2016



On 01/25/16 20:36, Pranith Kumar Karampuri wrote:
>
>
> On 01/26/2016 08:41 AM, Richard Wareing wrote:
>>> If there is one bucket per client and one thread per bucket, it would be
>>> difficult to scale as the number of clients increases. How can we do
>>> this better?
>> On this note... consider that tens of thousands of clients are not
>> unrealistic in production :). Using a thread per bucket would also
>> be... unwise.
>
> There is only one thread, and this solution is for internal
> processes (shd, rebalance, quota, etc.), so it does not come in the
> way of clients doing I/O.
>
>>
>> On the idea in general, I'm just wondering if there are specific
>> (real-world) cases where this has even been an issue that least-prio
>> queuing hasn't been able to handle, or is this more of a theoretical
>> concern? I ask as I've not really encountered situations where I
>> wished I could give more FOPs to SHD vs rebalance and such.
>
> I have seen users resort to offline healing of the bricks whenever a
> brick is replaced, or a new brick is added to replication to increase
> the replica count. When entry self-heal happens, or big VM image data
> self-heals (which involve rchecksums), CPU spikes are seen and client
> I/O becomes unusable.
> Here is a recent thread (just yesterday) where a user ran into a
> similar problem (a combination of client-side healing and
> healing load):
> http://www.gluster.org/pipermail/gluster-users/2016-January/025051.html
>
> We can find more such threads if we put some time into digging through
> the mailing list.
> I have personally seen people even resort to things like "we let
> gluster heal over the weekend or at night when none of us are
> working on the volumes", etc.

I get at least weekly complaints about this on the IRC channel. A lot of 
them are in virtual environments (AWS).

>
> There are people who complain that healing is too slow, too; we get
> both kinds of complaints :-). Your multi-threaded shd patch is going
> to help here. I somehow feel you guys are in this latter set of
> people :-).

+1


>>
>> In any event, it might be worth having Shreyas detail his throttling
>> feature (which can throttle any directory hierarchy, no less) to
>> illustrate how a simpler design can achieve similar results to these
>> more complicated (and, it follows, bug-prone) approaches.
>
> The solution we came up with is about throttling internal I/O, and
> there are only 4-5 such processes (shd, rebalance, quota, bitd, etc.).
> What you are saying above about throttling any directory hierarchy
> seems a bit different from what we are trying to solve, from the looks
> of it (at least from the small description you gave above :-) ).
> Shreyas' mail detailing the feature would definitely help us
> understand what each of us is trying to solve. We want to GA both
> multi-threaded shd and this feature for 3.8.
>
> Pranith
>>
>> Richard
>>
>> ________________________________________
>> From: gluster-devel-bounces at gluster.org 
>> [gluster-devel-bounces at gluster.org] on behalf of Vijay Bellur 
>> [vbellur at redhat.com]
>> Sent: Monday, January 25, 2016 6:44 PM
>> To: Ravishankar N; Gluster Devel
>> Subject: Re: [Gluster-devel] Throttling xlator on the bricks
>>
>> On 01/25/2016 12:36 AM, Ravishankar N wrote:
>>> Hi,
>>>
>>> We are planning to introduce a throttling xlator on the server (brick)
>>> process to regulate FOPs. The main motivation is to solve complaints
>>> about AFR self-heal taking up too much CPU (due to too many FOPs for
>>> entry self-heal, rchecksums for data self-heal, etc.).
>>
>> I am wondering if we can re-use the same xlator for throttling
>> bandwidth, iops, etc. in addition to fops. Based on admin-configured
>> policies we could provide different upper thresholds to different
>> clients/tenants, and this could prove to be a useful feature in
>> multi-tenant deployments to avoid starvation/noisy-neighbor classes of
>> problems. Has any thought gone in this direction?
>>
>>> The throttling is achieved using the Token Bucket Filter (TBF)
>>> algorithm. TBF is already used by bitrot's bitd signer (which is a
>>> client process) in gluster to regulate the CPU-intensive checksum
>>> calculation. By putting the logic on the brick side, multiple
>>> clients (self-heal, bitrot, rebalance, or even the mounts themselves)
>>> can avail the benefits of throttling.
>>>
>>> The TBF algorithm in a nutshell is as follows: there is a bucket
>>> which is filled at a steady (configurable) rate with tokens. Each FOP
>>> needs a fixed number of tokens to be processed. If the bucket has
>>> that many tokens, the FOP is allowed and that many tokens are removed
>>> from the bucket. If not, the FOP is queued until the bucket is
>>> filled.
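
[As a rough illustration of the token bucket check described above, here is
a minimal, self-contained C sketch. This is not the actual xlator code: the
struct and function names, the costs, and the fill rate are invented for the
example; a real implementation would tie the fill step to a timer and the
cost to the FOP type.]

/* Minimal sketch of a token bucket: tokens accumulate at a fixed rate,
 * and a FOP is allowed only if the bucket holds enough tokens for it. */
#include <stdio.h>
#include <stdbool.h>
#include <stdint.h>

struct tbf_bucket {
        uint64_t tokens;      /* tokens currently available     */
        uint64_t max_tokens;  /* bucket capacity (burst limit)  */
        uint64_t fill_rate;   /* tokens added per fill interval */
};

/* Called periodically (e.g. once per fill interval) by a filler. */
static void
tbf_fill (struct tbf_bucket *b)
{
        b->tokens += b->fill_rate;
        if (b->tokens > b->max_tokens)
                b->tokens = b->max_tokens;
}

/* Returns true and debits the bucket if the FOP may proceed now;
 * returns false if the FOP must be queued until more tokens arrive. */
static bool
tbf_consume (struct tbf_bucket *b, uint64_t fop_cost)
{
        if (b->tokens < fop_cost)
                return false;
        b->tokens -= fop_cost;
        return true;
}

int
main (void)
{
        struct tbf_bucket shd = { .tokens = 0, .max_tokens = 100,
                                  .fill_rate = 10 };

        tbf_fill (&shd);                        /* one fill interval      */
        printf ("first fop allowed:  %d\n",     /* cost 8 <= 10 tokens    */
                tbf_consume (&shd, 8));
        printf ("second fop allowed: %d\n",     /* only 2 tokens left now */
                tbf_consume (&shd, 8));
        return 0;
}
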
>>>
>>> The xlator will need to reside above io-threads and can have
>>> different buckets, one per client. There has to be a communication
>>> mechanism between the client and the brick (IPC?) to tell which FOPs
>>> need to be regulated from it, the number of tokens needed, etc. These
>>> need to be reconfigurable via appropriate mechanisms.
>>> Each bucket will have a token filler thread which will fill the
>>> tokens in it.
>> If there is one bucket per client and one thread per bucket, it would
>> be difficult to scale as the number of clients increases. How can we
>> do this better?
>>
>>> The main thread will enqueue heals in a list in the bucket if there
>>> aren't enough tokens. Once the token filler detects that some FOPs can
>>> be serviced, it will send a cond-broadcast to a dequeue thread, which
>>> will process (stack-wind) all the FOPs that have the required number
>>> of tokens, from all buckets.
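
[To make the enqueue/broadcast/dequeue hand-off above concrete, here is a
small, self-contained pthread sketch (compile with -pthread). The names
(fop_req, bucket, filler_thread, dequeue_thread), the costs, and the
one-second fill interval are invented for illustration; the real xlator
would stack-wind the queued FOPs and manage many buckets, not print to
stdout.]

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <unistd.h>

struct fop_req {
        int             id;       /* stands in for a queued FOP frame */
        uint64_t        cost;     /* tokens this FOP needs            */
        struct fop_req *next;
};

struct bucket {
        uint64_t        tokens;
        uint64_t        fill_rate; /* tokens added per second          */
        struct fop_req *queue;     /* FOPs waiting for tokens          */
        pthread_mutex_t lock;
        pthread_cond_t  cond;      /* filler broadcasts, dequeuer waits */
};

/* Token filler: adds tokens at a steady rate and wakes the dequeuer. */
static void *
filler_thread (void *arg)
{
        struct bucket *b = arg;

        for (;;) {
                sleep (1);
                pthread_mutex_lock (&b->lock);
                b->tokens += b->fill_rate;
                pthread_cond_broadcast (&b->cond);
                pthread_mutex_unlock (&b->lock);
        }
        return NULL;
}

/* Dequeuer: processes (here, just prints; in the xlator, stack-winds)
 * every queued FOP whose token requirement has been met. */
static void *
dequeue_thread (void *arg)
{
        struct bucket *b = arg;

        pthread_mutex_lock (&b->lock);
        for (;;) {
                while (!b->queue || b->queue->cost > b->tokens)
                        pthread_cond_wait (&b->cond, &b->lock);

                struct fop_req *req = b->queue;
                b->queue   = req->next;
                b->tokens -= req->cost;
                printf ("winding fop %d (cost %llu)\n",
                        req->id, (unsigned long long) req->cost);
                free (req);
        }
        return NULL;
}

int
main (void)
{
        struct bucket b = { .tokens = 0, .fill_rate = 5, .queue = NULL,
                            .lock = PTHREAD_MUTEX_INITIALIZER,
                            .cond = PTHREAD_COND_INITIALIZER };
        pthread_t filler, dequeuer;

        /* Queue three FOPs that each cost more than the bucket holds now. */
        for (int i = 0; i < 3; i++) {
                struct fop_req *req = calloc (1, sizeof (*req));
                req->id   = i;
                req->cost = 4;
                req->next = b.queue;
                b.queue   = req;
        }

        pthread_create (&filler, NULL, filler_thread, &b);
        pthread_create (&dequeuer, NULL, dequeue_thread, &b);

        sleep (5);      /* let a few fill intervals pass, then exit */
        return 0;
}
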
>>>
>>> This is just a high-level abstraction; requesting feedback on any
>>> aspect of this feature. What kind of mechanism is best between the
>>> clients/bricks for tuning the various parameters? What other
>>> requirements do you foresee?
>>>
>> I am in favor of having administrator-defined policies or templates
>> (collections of policies) used to provide the tuning parameters per
>> client or per set of clients. We could even have a default template per
>> use case, etc. Is there a specific need for this negotiation between
>> clients and servers?
>>
>> Thanks,
>> Vijay
>>
>> _______________________________________________
>> Gluster-devel mailing list
>> Gluster-devel at gluster.org
>> http://www.gluster.org/mailman/listinfo/gluster-devel
>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel


