[Gluster-devel] Changing the relative order of read-ahead and open-behind
Raghavendra Gowdappa
rgowdapp at redhat.com
Tue Jul 25 10:48:36 UTC 2017
----- Original Message -----
> From: "Amar Tumballi" <atumball at redhat.com>
> To: "Raghavendra Gowdappa" <rgowdapp at redhat.com>
> Cc: "Pranith Kumar Karampuri" <pkarampu at redhat.com>, "Gluster Devel" <gluster-devel at gluster.org>
> Sent: Tuesday, July 25, 2017 4:06:27 PM
> Subject: Re: [Gluster-devel] Changing the relative order of read-ahead and open-behind
>
> On Tue, Jul 25, 2017 at 2:38 PM, Raghavendra G <raghavendra at gluster.com>
> wrote:
>
> >
> >
> > On Tue, Jul 25, 2017 at 10:39 AM, Amar Tumballi <atumball at redhat.com>
> > wrote:
> >
> >>
> >>
> >> On Tue, Jul 25, 2017 at 9:33 AM, Raghavendra Gowdappa <
> >> rgowdapp at redhat.com> wrote:
> >>
> >>>
> >>>
> >>> ----- Original Message -----
> >>> > From: "Pranith Kumar Karampuri" <pkarampu at redhat.com>
> >>> > To: "Raghavendra G" <raghavendra at gluster.com>
> >>> > Cc: "Gluster Devel" <gluster-devel at gluster.org>
> >>> > Sent: Tuesday, July 25, 2017 7:51:07 AM
> >>> > Subject: Re: [Gluster-devel] Changing the relative order of read-ahead
> >>> and open-behind
> >>> >
> >>> >
> >>> >
> >>> > On Mon, Jul 24, 2017 at 5:11 PM, Raghavendra G <
> >>> raghavendra at gluster.com >
> >>> > wrote:
> >>> >
> >>> > On Fri, Jul 21, 2017 at 6:39 PM, Vijay Bellur < vbellur at redhat.com >
> >>> wrote:
> >>> >
> >>> > On Fri, Jul 21, 2017 at 3:26 AM, Raghavendra Gowdappa <
> >>> rgowdapp at redhat.com >
> >>> > wrote:
> >>> >
> >>> >
> >>> > Hi all,
> >>> >
> >>> > We have a bug [1] due to which read-ahead is completely disabled when
> >>> > the workload is read-only. One easy fix is to make read-ahead an
> >>> > ancestor of open-behind in the xlator graph (currently it is a
> >>> > descendant). A patch has been sent out by Rafi to do exactly that. As
> >>> > noted in one of the comments, one flip side of this solution is that
> >>> > small files (which are eligible to be cached by quick-read) are cached
> >>> > twice - once each in read-ahead and quick-read - wasting precious
> >>> > memory. However, there are no other simpler solutions for this issue.
> >>> > If you have concerns about the approach followed by [2] or have other
> >>> > suggestions, please voice them. Otherwise, I am planning to merge [2]
> >>> > for lack of better alternatives.
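> >>> >
> >>> > (Purely for illustration, a minimal volfile-style sketch of what
> >>> > "read-ahead as an ancestor of open-behind" means; the volume names and
> >>> > the rest of the graph are made up here, not the actual generated
> >>> > volfile. A "subvolumes" entry makes the named xlator a descendant of
> >>> > the one declaring it.)
> >>> >
> >>> >     volume test-open-behind
> >>> >         type performance/open-behind
> >>> >         subvolumes test-client-0
> >>> >     end-volume
> >>> >
> >>> >     volume test-read-ahead
> >>> >         type performance/read-ahead
> >>> >         subvolumes test-open-behind
> >>> >     end-volume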
> >>> >
> >>> >
> >>> > Since the maximum size of files cached by quick-read is 64KB, can we
> >>> > have read-ahead kick in for offsets greater than 64KB?
> >>> >
> >>> > I got your point. We can enable read-ahead only for files whose size
> >>> > is greater than the size eligible for caching by quick-read. IOW,
> >>> > read-ahead gets disabled if the file size is less than 64KB. Thanks
> >>> > for the suggestion.
> >>> >
> >>> > I added a comment on the patch suggesting that we move the xlators in
> >>> > the reverse order to what the patch is currently doing. I think Milind
> >>> > implemented it. Will that lead to any problem?
> >>>
> >>> From gerrit:
> >>>
> >>> <comment>
> >>>
> >>> It fixes the issue too, and it is a better solution than the current
> >>> one as it doesn't run into the duplicate-cache problem. The reason
> >>> open-behind was loaded as an ancestor of quick-read was that it seemed
> >>> unnecessary for quick-read to even witness an open. However,
> >>>
> >>> * looking into the code, qr_open does set a priority on the inode,
> >>> which is used while purging the cache when the cache limit is exceeded.
> >>> So it does help quick-read to witness an open.
> >>> * the real benefit of open-behind is avoiding fops over the network.
> >>> So, as long as open-behind is loaded somewhere in the client stack, we
> >>> reap its benefits.
> >>> * Also note that if the option "read-after-open" is set in open-behind,
> >>> an open is done over the network anyway, irrespective of whether
> >>> quick-read has cached the file, which to me looks unnecessary. By
> >>> moving open-behind to be a descendant of quick-read, open-behind won't
> >>> even witness a read when the file is cached by quick-read. But if the
> >>> read-after-open option was implemented in open-behind with the goal of
> >>> fixing non-POSIX compliance for the case where a file with an open fd
> >>> is unlinked, we might regress. Then again, even this approach doesn't
> >>> fix the compliance problem completely; one has to turn open-behind off
> >>> to be completely POSIX compliant in this scenario.
> >>>
> >>> Given the reasons above, it helps to just move open-behind to be a
> >>> descendant of read-ahead.
> >>>
> >>> </comment>
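> >>>
> >>> (For reference, a rough sketch of how the knobs mentioned above are
> >>> toggled from the gluster CLI; the volume name is a placeholder and the
> >>> values are only illustrative, not recommendations:
> >>>
> >>>     gluster volume set <VOLNAME> performance.read-after-open yes
> >>>     gluster volume set <VOLNAME> performance.quick-read on
> >>>     gluster volume set <VOLNAME> performance.open-behind off
> >>>
> >>> The last one - turning open-behind off - is the "completely POSIX
> >>> compliant" choice referred to above.)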
> >>>
> >>>
> >> The analysis looks good. But I would like us (all developers) to back up
> >> theories like this with some data.
> >>
> >
> >> How about you plan a test case that can demonstrate the difference?
> >>
> >
> > What is the scenario you want to measure here?
> >
> >
>
> A scenario where, by changing the order, the number of fops on the wire
> would be different. Also, if you have any particular internal metrics for
> these translators, you can implement a 'dump_metrics()' method on the
> experimental branch and measure them in graphite/grafana.
Ok. Basically you want to validate that the fix solves the issue. I think measuring read latency and throughput at the application layer is sufficient for that. The bug has a test case and we can use it to validate the fix. Of course, we can watch other metrics too and see whether anything abnormal happens (for example, we shouldn't suddenly start seeing setattr calls in a read workload).
Also we shouldn't have regressed in other related areas.
So, the metrics I think are useful are the following (a rough sketch of how to collect them follows the list):
* measure read latency and throughput at the application layer
* Since this patch touches open-behind, check whether there is any change in the number of opens sent over the network
* Since we were concerned with quick-read too, check whether reads on files smaller than 64K go over the network (they shouldn't, as they are expected to be served by quick-read)
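
A rough sketch of how these could be collected (the volume name and paths are
placeholders; "gluster volume profile" reports per-fop counts and latencies as
seen by the bricks, i.e. what actually went over the network):

    # application-level read throughput on the bug's read-only workload
    gluster volume profile <VOLNAME> start
    echo 3 > /proc/sys/vm/drop_caches    # on the client, to avoid kernel caching
    dd if=/mnt/<VOLNAME>/largefile of=/dev/null bs=1M

    # compare OPEN and READ call counts before and after the reordering
    gluster volume profile <VOLNAME> info
    gluster volume profile <VOLNAME> stop
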
regards,
Raghavendra
>
> -Amar
>
>
> >> I will help you set up metrics measurement with graphs [1] on the
> >> experimental branch [2] to actually measure and graphically represent
> >> the hypothesis.
> >>
> >> We can set this up as an example for anyone in the future to try
> >> permutations and combinations of different xlator orders. Who knows, we
> >> may realize that different orders suit different workloads.
> >>
> >> Regards,
> >> Amar
> >>
> >> [1] - https://github.com/amarts/glustermetrics
> >> [2] - https://github.com/gluster/glusterfs/tree/experimental
> >>
> >>> > Thanks,
> >>> > Vijay
> >>> >
> >>> > --
> >>> > Raghavendra G
> >>> >
> >>> > --
> >>> > Pranith
> >>> >
> >>
> >>
> >>
> >> --
> >> Amar Tumballi (amarts)
> >>
> >>
> >
> >
> >
> > --
> > Raghavendra G
> >
>
>
>
> --
> Amar Tumballi (amarts)
>