[Gluster-devel] Throttling xlator on the bricks
Pranith Kumar Karampuri
pkarampu at redhat.com
Sat Feb 13 03:36:44 UTC 2016
On 02/13/2016 12:13 AM, Richard Wareing wrote:
> Hey Ravi,
>
> I'll ping Shreyas about this today. There's also a patch we'll need for multi-threaded SHD to fix the least-pri queuing: the PID of the process wasn't tagged correctly via the call frame in my original patch. The patch below fixes this (for 3.6.3). I didn't see multi-threaded self-heal on github/master yet, so let me know which branch you need this patch on and I can come up with a clean patch.
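>
> To see why the frame matters: io-threads keys the least-pri decision off
> the client PID carried on the frame, so a fop submitted under a NULL
> frame never gets demoted. A toy model of the classification (my own
> sketch, not the io-threads source; the PID value is just an example):
>
> #include <stdio.h>
>
> enum pri { PRI_NORMAL, PRI_LEAST };
>
> static enum pri
> classify (int frame_pid, int least_priority_enabled)
> {
>         /* Internal clients (SHD, rebalance, ...) use negative PIDs. */
>         if (least_priority_enabled && frame_pid < 0)
>                 return PRI_LEAST;
>         return PRI_NORMAL;
> }
>
> int
> main (void)
> {
>         printf ("regular client frame -> %s\n",
>                 classify (1234, 1) == PRI_LEAST ? "least-pri" : "normal");
>         printf ("SHD-tagged frame     -> %s\n",
>                 classify (-6, 1) == PRI_LEAST ? "least-pri" : "normal");
>         return 0;
> }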
Hi Richard,
I reviewed the patch and found that the same change is needed
even for ec. So I am thinking of splitting it into two patches: one
patch in syncop-utils which builds the parallelization functionality
(rough interface sketched below), and another patch which uses it in
afr and ec. Do you mind if I give it a go? I can complete it by the
end of Wednesday.
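
Something like this is the shape I have in mind for the syncop-utils
piece (just a sketch; names and the exact signature are not final):

/* Sketch only: walk 'loc' on 'subvol' and invoke 'fn' on each directory
 * entry from up to 'max_jobs' parallel synctasks. afr and ec would then
 * only need to supply their own 'fn' (heal one entry). */
int
syncop_dir_scan_parallel (xlator_t *subvol, loc_t *loc, int max_jobs,
                          void *data,
                          int (*fn) (xlator_t *subvol, gf_dirent_t *entry,
                                     loc_t *parent, void *data));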
Pranith
>
> Richard
>
>
> =====
>
>
> diff --git a/xlators/cluster/afr/src/afr-self-heald.c b/xlators/cluster/afr/src/afr-self-heald.c
> index 028010d..b0f6248 100644
> --- a/xlators/cluster/afr/src/afr-self-heald.c
> +++ b/xlators/cluster/afr/src/afr-self-heald.c
> @@ -532,6 +532,9 @@ afr_mt_process_entries_done (int ret, call_frame_t *sync_frame,
> pthread_cond_signal (&mt_data->task_done);
> }
> pthread_mutex_unlock (&mt_data->lock);
> +
> + if (task_ctx->frame)
> + AFR_STACK_DESTROY (task_ctx->frame);
> GF_FREE (task_ctx);
> return 0;
> }
> @@ -787,6 +790,7 @@ _afr_mt_create_process_entries_task (xlator_t *this,
> int ret = -1;
> afr_mt_process_entries_task_ctx_t *task_ctx;
> afr_mt_data_t *mt_data;
> + call_frame_t *frame = NULL;
>
> mt_data = &healer->mt_data;
>
> @@ -799,6 +803,8 @@ _afr_mt_create_process_entries_task (xlator_t *this,
> if (!task_ctx)
> goto err;
>
> + task_ctx->frame = afr_frame_create (this);
> +
> INIT_LIST_HEAD (&task_ctx->list);
> task_ctx->readdir_xl = this;
> task_ctx->healer = healer;
> @@ -812,7 +818,7 @@ _afr_mt_create_process_entries_task (xlator_t *this,
> // This returns immediately, and afr_mt_process_entries_done will
> // be called when the task is completed e.g. our queue is empty
> ret = synctask_new (this->ctx->env, afr_mt_process_entries_task,
> - afr_mt_process_entries_done, NULL,
> + afr_mt_process_entries_done, task_ctx->frame,
> (void *)task_ctx);
>
> if (!ret) {
> diff --git a/xlators/cluster/afr/src/afr-self-heald.h b/xlators/cluster/afr/src/afr-self-heald.h
> index 817e712..1588fc8 100644
> --- a/xlators/cluster/afr/src/afr-self-heald.h
> +++ b/xlators/cluster/afr/src/afr-self-heald.h
> @@ -74,6 +74,7 @@ typedef struct afr_mt_process_entries_task_ctx_ {
> subvol_healer_t *healer;
> xlator_t *readdir_xl;
> inode_t *idx_inode; /* inode ref for xattrop dir */
> + call_frame_t *frame;
> unsigned int entries_healed;
> unsigned int entries_processed;
> unsigned int already_healed;
>
>
> Richard
> ________________________________________
> From: Ravishankar N [ravishankar at redhat.com]
> Sent: Sunday, February 07, 2016 11:15 PM
> To: Shreyas Siravara
> Cc: Richard Wareing; Vijay Bellur; Gluster Devel
> Subject: Re: [Gluster-devel] Throttling xlator on the bricks
>
> Hello,
>
> On 01/29/2016 06:51 AM, Shreyas Siravara wrote:
>> So the way our throttling works is (intentionally) very simplistic.
>>
>> (1) When someone mounts an NFS share, we tag the frame with a 32-bit hash of the export name they were authorized to mount.
>> (2) io-stats keeps track of the "current rate" of fops we're seeing for that particular mount, using a sampling of fops and a moving average over a short period of time.
>> (3) Based on whether the share has exceeded its allowed rate (which is defined in a config file), we tag the FOP as "least-pri". Of course this makes the assumption that all NFS endpoints are receiving roughly the same # of FOPs. The rate defined in the config file is a *per* NFS endpoint number. So if your cluster has 10 NFS endpoints, and you've pre-computed that it can do roughly 1000 FOPs per second, the rate in the config file would be 100.
>> (4) io-threads then shoves the FOP into the least-pri queue rather than its default queue, and the tag is honored all the way down to the bricks.
>>
>> The code is actually complete, and I'll put it up for review after we iron out a few minor issues.
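>>
>> To make (2) and (3) concrete, the gist is a per-export moving average
>> compared against the per-endpoint limit; a toy sketch (not the actual
>> patch, and the smoothing factor and numbers are made up):
>>
>> #include <stdio.h>
>>
>> struct rate_state {
>>         double avg;     /* moving average of fops/sec for one export hash */
>>         double alpha;   /* smoothing factor */
>> };
>>
>> /* Returns 1 if the fop should be tagged least-pri. */
>> static int
>> over_limit (struct rate_state *rs, double sampled_fops_per_sec,
>>             double per_endpoint_limit)
>> {
>>         rs->avg = rs->alpha * sampled_fops_per_sec +
>>                   (1.0 - rs->alpha) * rs->avg;
>>         return rs->avg > per_endpoint_limit;
>> }
>>
>> int
>> main (void)
>> {
>>         /* 10 NFS endpoints, cluster good for ~1000 fops/sec -> 100 each. */
>>         struct rate_state rs = { .avg = 0, .alpha = 0.5 };
>>         double samples[] = { 80, 120, 150, 200 };
>>
>>         for (int i = 0; i < 4; i++)
>>                 printf ("sampled %.0f fops/sec -> %s\n", samples[i],
>>                         over_limit (&rs, samples[i], 100) ?
>>                         "least-pri" : "normal");
>>         return 0;
>> }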
> Did you get a chance to send the patch? Just wanted to run some tests
> and see if this is all we need at the moment to regulate shd traffic,
> especially with Richard's multi-threaded heal patch
> http://review.gluster.org/#/c/13329/ being revived and made ready for 3.8.
>
> -Ravi
>
>>> On Jan 27, 2016, at 9:48 PM, Ravishankar N <ravishankar at redhat.com> wrote:
>>>
>>> On 01/26/2016 08:41 AM, Richard Wareing wrote:
>>>> In any event, it might be worth having Shreyas detail his throttling feature (which can throttle any directory hierarchy, no less) to illustrate how a simpler design can achieve results similar to those of these more complicated (and, it follows, bug-prone) approaches.
>>>>
>>>> Richard
>>> Hi Shreyas,
>>>
>>> Wondering if you can share the details of the throttling feature you're working on. Even if there's no code, a description of what it is trying to achieve and how will be great.
>>>
>>> Thanks,
>>> Ravi
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel