[Gluster-devel] Throttling xlator on the bricks
Pranith Kumar Karampuri
pkarampu at redhat.com
Sat Feb 13 03:36:44 UTC 2016
On 02/13/2016 12:13 AM, Richard Wareing wrote:
> Hey Ravi,
>
> I'll ping Shreyas about this today. There's also a patch we'll need for multi-threaded SHD to fix the least-pri queuing: the PID of the process wasn't tagged correctly via the call frame in my original patch. The patch below fixes this (for 3.6.3). I didn't see multi-threaded self-heal on github/master yet, so let me know which branch you need this patch on and I can come up with a clean patch.
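>
> To see why the frame matters: io-threads keys the least-pri decision off
> the client PID carried on the frame, so a fop submitted under a NULL
> frame never gets demoted. A toy model of the classification (my own
> sketch, not the io-threads source; the PID value is just an example):
>
> #include <stdio.h>
>
> enum pri { PRI_NORMAL, PRI_LEAST };
>
> static enum pri
> classify (int frame_pid, int least_priority_enabled)
> {
>         /* Internal clients (SHD, rebalance, ...) use negative PIDs. */
>         if (least_priority_enabled && frame_pid < 0)
>                 return PRI_LEAST;
>         return PRI_NORMAL;
> }
>
> int
> main (void)
> {
>         printf ("regular client frame -> %s\n",
>                 classify (1234, 1) == PRI_LEAST ? "least-pri" : "normal");
>         printf ("SHD-tagged frame     -> %s\n",
>                 classify (-6, 1) == PRI_LEAST ? "least-pri" : "normal");
>         return 0;
> }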
Hi Richard,
I reviewed the patch and found that the same change is needed
even for ec. So I am thinking of splitting it into two patches: one
patch in syncop-utils which builds the parallelization functionality
(rough interface sketched below), and another patch which uses it in
afr and ec. Do you mind if I give it a go? I can complete it by the
end of Wednesday.
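
Something like this is the shape I have in mind for the syncop-utils
piece (just a sketch; names and the exact signature are not final):

/* Sketch only: walk 'loc' on 'subvol' and invoke 'fn' on each directory
 * entry from up to 'max_jobs' parallel synctasks. afr and ec would then
 * only need to supply their own 'fn' (heal one entry). */
int
syncop_dir_scan_parallel (xlator_t *subvol, loc_t *loc, int max_jobs,
                          void *data,
                          int (*fn) (xlator_t *subvol, gf_dirent_t *entry,
                                     loc_t *parent, void *data));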
Pranith
>
> Richard
>
>
> =====
>
>
> diff --git a/xlators/cluster/afr/src/afr-self-heald.c b/xlators/cluster/afr/src/afr-self-heald.c
> index 028010d..b0f6248 100644
> --- a/xlators/cluster/afr/src/afr-self-heald.c
> +++ b/xlators/cluster/afr/src/afr-self-heald.c
> @@ -532,6 +532,9 @@ afr_mt_process_entries_done (int ret, call_frame_t *sync_frame,
> pthread_cond_signal (&mt_data->task_done);
> }
> pthread_mutex_unlock (&mt_data->lock);
> +
> + if (task_ctx->frame)
> + AFR_STACK_DESTROY (task_ctx->frame);
> GF_FREE (task_ctx);
> return 0;
> }
> @@ -787,6 +790,7 @@ _afr_mt_create_process_entries_task (xlator_t *this,
> int ret = -1;
> afr_mt_process_entries_task_ctx_t *task_ctx;
> afr_mt_data_t *mt_data;
> + call_frame_t *frame = NULL;
>
> mt_data = &healer->mt_data;
>
> @@ -799,6 +803,8 @@ _afr_mt_create_process_entries_task (xlator_t *this,
> if (!task_ctx)
> goto err;
>
> + task_ctx->frame = afr_frame_create (this);
> +
> INIT_LIST_HEAD (&task_ctx->list);
> task_ctx->readdir_xl = this;
> task_ctx->healer = healer;
> @@ -812,7 +818,7 @@ _afr_mt_create_process_entries_task (xlator_t *this,
> // This returns immediately, and afr_mt_process_entries_done will
> // be called when the task is completed e.g. our queue is empty
> ret = synctask_new (this->ctx->env, afr_mt_process_entries_task,
> - afr_mt_process_entries_done, NULL,
> + afr_mt_process_entries_done, task_ctx->frame,
> (void *)task_ctx);
>
> if (!ret) {
> diff --git a/xlators/cluster/afr/src/afr-self-heald.h b/xlators/cluster/afr/src/afr-self-heald.h
> index 817e712..1588fc8 100644
> --- a/xlators/cluster/afr/src/afr-self-heald.h
> +++ b/xlators/cluster/afr/src/afr-self-heald.h
> @@ -74,6 +74,7 @@ typedef struct afr_mt_process_entries_task_ctx_ {
> subvol_healer_t *healer;
> xlator_t *readdir_xl;
> inode_t *idx_inode; /* inode ref for xattrop dir */
> + call_frame_t *frame;
> unsigned int entries_healed;
> unsigned int entries_processed;
> unsigned int already_healed;
>
>
> Richard
> ________________________________________
> From: Ravishankar N [ravishankar at redhat.com]
> Sent: Sunday, February 07, 2016 11:15 PM
> To: Shreyas Siravara
> Cc: Richard Wareing; Vijay Bellur; Gluster Devel
> Subject: Re: [Gluster-devel] Throttling xlator on the bricks
>
> Hello,
>
> On 01/29/2016 06:51 AM, Shreyas Siravara wrote:
>> So the way our throttling works is (intentionally) very simplistic.
>>
>> (1) When someone mounts an NFS share, we tag the frame with a 32-bit hash of the export name they were authorized to mount.
>> (2) io-stats keeps track of the "current rate" of fops we're seeing for that particular mount, using a sampling of fops and a moving average over a short period of time.
>> (3) Based on whether the share has exceeded its allowed rate (which is defined in a config file), we tag the FOP as "least-pri". Of course this makes the assumption that all NFS endpoints are receiving roughly the same # of FOPs. The rate defined in the config file is a *per* NFS endpoint number. So if your cluster has 10 NFS endpoints, and you've pre-computed that it can do roughly 1000 FOPs per second, the rate in the config file would be 100.
>> (4) io-threads then shoves the FOP into the least-pri queue rather than its default queue, and the tag is honored all the way down to the bricks.
>>
>> The code is actually complete, and I'll put it up for review after we iron out a few minor issues.
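>>
>> To make (2) and (3) concrete, the gist is a per-export moving average
>> compared against the per-endpoint limit; a toy sketch (not the actual
>> patch, and the smoothing factor and numbers are made up):
>>
>> #include <stdio.h>
>>
>> struct rate_state {
>>         double avg;     /* moving average of fops/sec for one export hash */
>>         double alpha;   /* smoothing factor */
>> };
>>
>> /* Returns 1 if the fop should be tagged least-pri. */
>> static int
>> over_limit (struct rate_state *rs, double sampled_fops_per_sec,
>>             double per_endpoint_limit)
>> {
>>         rs->avg = rs->alpha * sampled_fops_per_sec +
>>                   (1.0 - rs->alpha) * rs->avg;
>>         return rs->avg > per_endpoint_limit;
>> }
>>
>> int
>> main (void)
>> {
>>         /* 10 NFS endpoints, cluster good for ~1000 fops/sec -> 100 each. */
>>         struct rate_state rs = { .avg = 0, .alpha = 0.5 };
>>         double samples[] = { 80, 120, 150, 200 };
>>
>>         for (int i = 0; i < 4; i++)
>>                 printf ("sampled %.0f fops/sec -> %s\n", samples[i],
>>                         over_limit (&rs, samples[i], 100) ?
>>                         "least-pri" : "normal");
>>         return 0;
>> }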
> Did you get a chance to send the patch? Just wanted to run some tests
> and see if this is all we need at the moment to regulate shd traffic,
> especially with Richard's multi-threaded heal patch
> http://review.gluster.org/#/c/13329/ being revived and made ready for 3.8.
>
> -Ravi
>
>>> On Jan 27, 2016, at 9:48 PM, Ravishankar N <ravishankar at redhat.com> wrote:
>>>
>>> On 01/26/2016 08:41 AM, Richard Wareing wrote:
>>>> In any event, it might be worth having Shreyas detail his throttling feature (which can throttle any directory hierarchy, no less) to illustrate how a simpler design can achieve results similar to those of these more complicated (and, it follows, bug-prone) approaches.
>>>>
>>>> Richard
>>> Hi Shreyas,
>>>
>>> Wondering if you can share the details of the throttling feature you're working on. Even if there's no code, a description of what it is trying to achieve and how will be great.
>>>
>>> Thanks,
>>> Ravi
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel