[Gluster-devel] Throttling xlator on the bricks

Fri Feb 12 18:43:01 UTC 2016

Hey Ravi,

I'll ping Shreyas about this today.  There's also a patch we'll need for multi-threaded SHD to fix the least-pri queuing: the PID of the process wasn't tagged correctly via the call frame in my original patch.  The patch below fixes this (for 3.6.3).  I didn't see multi-threaded self-heal on github/master yet, so let me know which branch you need this patch on and I can come up with a clean patch.

Richard


=====

diff --git a/xlators/cluster/afr/src/afr-self-heald.c b/xlators/cluster/afr/src/afr-self-heald.c
index 028010d..b0f6248 100644
--- a/xlators/cluster/afr/src/afr-self-heald.c
+++ b/xlators/cluster/afr/src/afr-self-heald.c
@@ -532,6 +532,9 @@ afr_mt_process_entries_done (int ret, call_frame_t *sync_frame,
                 pthread_cond_signal (&mt_data->task_done);
         }
         pthread_mutex_unlock (&mt_data->lock);
+
+        if (task_ctx->frame)
+                AFR_STACK_DESTROY (task_ctx->frame);
         GF_FREE (task_ctx);
         return 0;
 }
@@ -787,6 +790,7 @@ _afr_mt_create_process_entries_task (xlator_t *this,
         int                                   ret = -1;
         afr_mt_process_entries_task_ctx_t     *task_ctx;
         afr_mt_data_t                         *mt_data;
+        call_frame_t                          *frame = NULL;

         mt_data = &healer->mt_data;

@@ -799,6 +803,8 @@ _afr_mt_create_process_entries_task (xlator_t *this,
         if (!task_ctx)
                 goto err;

+        task_ctx->frame = afr_frame_create (this);
+
         INIT_LIST_HEAD (&task_ctx->list);
         task_ctx->readdir_xl = this;
         task_ctx->healer = healer;
@@ -812,7 +818,7 @@ _afr_mt_create_process_entries_task (xlator_t *this,
         // This returns immediately, and afr_mt_process_entries_done will
         // be called when the task is completed e.g. our queue is empty
         ret = synctask_new (this->ctx->env, afr_mt_process_entries_task,
-                afr_mt_process_entries_done, NULL,
+                afr_mt_process_entries_done, task_ctx->frame,
                 (void *)task_ctx);

         if (!ret) {
diff --git a/xlators/cluster/afr/src/afr-self-heald.h b/xlators/cluster/afr/src/afr-self-heald.h
index 817e712..1588fc8 100644
--- a/xlators/cluster/afr/src/afr-self-heald.h
+++ b/xlators/cluster/afr/src/afr-self-heald.h
@@ -74,6 +74,7 @@ typedef struct afr_mt_process_entries_task_ctx_ {
         subvol_healer_t         *healer;
         xlator_t                *readdir_xl;
         inode_t                 *idx_inode;  /* inode ref for xattrop dir */
+        call_frame_t            *frame;
         unsigned int            entries_healed;
         unsigned int            entries_processed;
         unsigned int            already_healed;


Richard
________________________________________
From: Ravishankar N [ravishankar at redhat.com]
Sent: Sunday, February 07, 2016 11:15 PM
To: Shreyas Siravara
Cc: Richard Wareing; Vijay Bellur; Gluster Devel
Subject: Re: [Gluster-devel] Throttling xlator on the bricks

Hello,

On 01/29/2016 06:51 AM, Shreyas Siravara wrote:
> So the way our throttling works is (intentionally) very simplistic.
>
> (1) When someone mounts an NFS share, we tag the frame with a 32 bit hash of the export name they were authorized to mount.
> (2) io-stats keeps track of the "current rate" of fops we're seeing for that particular mount, using a sampling of fops and a moving average over a short period of time.
> (3) Based on whether the share violated its allowed rate (which is defined in a config file), we tag the FOP as "least-pri". Of course this makes the assumption that all NFS endpoints are receiving roughly the same # of FOPs. The rate defined in the config file is a *per* NFS endpoint number. So if your cluster has 10 NFS endpoints, and you've pre-computed that it can do roughly 1000 FOPs per second, the rate in the config file would be 100.
> (4) IO-Threads then shoves the FOP into the least-pri queue, rather than its default. The value is honored all the way down to the bricks.
>
> The code is actually complete, and I'll put it up for review after we iron out a few minor issues.
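
A minimal sketch of steps (1) to (3) above, to make the arithmetic concrete. This is not the patch under discussion: every name here is hypothetical, FNV-1a merely stands in for whatever 32-bit hash is actually used, and an exponential moving average is assumed in place of the unspecified "moving average over a short period of time".

```c
#include <stdint.h>

/* Hypothetical names throughout; none of these are GlusterFS APIs. */

typedef enum { PRI_NORMAL = 0, PRI_LEAST = 1 } fop_pri_t;

typedef struct {
        uint32_t export_hash;   /* 32-bit hash of the export name (step 1) */
        double   avg_rate;      /* moving average, FOPs per second (step 2) */
        double   allowed_rate;  /* per-endpoint limit from the config file */
} export_throttle_t;

/* FNV-1a, assumed here only as a stand-in 32-bit hash. */
static uint32_t
hash_export_name (const char *name)
{
        uint32_t h = 2166136261u;

        for (; *name; name++) {
                h ^= (uint8_t)*name;
                h *= 16777619u;
        }
        return h;
}

/* The configured rate is per NFS endpoint: cluster capacity / endpoints. */
static double
per_endpoint_rate (double cluster_rate, unsigned int endpoints)
{
        return endpoints ? cluster_rate / endpoints : 0.0;
}

/* Fold one sampled rate into the average; alpha in (0, 1] weights the
 * newest sample (assumed smoothing scheme, not taken from the patch). */
static void
throttle_sample (export_throttle_t *t, double sampled_rate, double alpha)
{
        t->avg_rate = alpha * sampled_rate + (1.0 - alpha) * t->avg_rate;
}

/* Step 3: a share over its allowed rate has its FOPs tagged least-pri. */
static fop_pri_t
throttle_classify (const export_throttle_t *t)
{
        return (t->avg_rate > t->allowed_rate) ? PRI_LEAST : PRI_NORMAL;
}
```

With the numbers from the example (1000 FOPs per second across 10 endpoints), per_endpoint_rate() gives 100, and a share whose average climbs past 100 is classified PRI_LEAST until the average decays back under the limit. Step (4), the io-threads queue selection, is deliberately left out of the sketch.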

Did you get a chance to send the patch? Just wanted to run some tests
and see if this is all we need at the moment to regulate shd traffic,
especially with Richard's multi-threaded heal patch
http://review.gluster.org/#/c/13329/ being revived and made ready for 3.8.

-Ravi

>
>> On Jan 27, 2016, at 9:48 PM, Ravishankar N <ravishankar at redhat.com> wrote:
>>
>> On 01/26/2016 08:41 AM, Richard Wareing wrote:
>>> In any event, it might be worth having Shreyas detail his throttling feature (that can throttle any directory hierarchy no less) to illustrate how a simpler design can achieve similar results to these more complicated (and it follows....bug prone) approaches.
>>>
>>> Richard
>> Hi Shreyas,
>>
>> Wondering if you can share the details of the throttling feature you're working on. Even if there's no code, a description of what it is trying to achieve and how will be great.
>>
>> Thanks,
>> Ravi