[Gluster-devel] Changing the relative order of read-ahead and open-behind
Amar Tumballi
atumball at redhat.com
Tue Jul 25 10:36:27 UTC 2017
On Tue, Jul 25, 2017 at 2:38 PM, Raghavendra G <raghavendra at gluster.com>
wrote:
>
>
> On Tue, Jul 25, 2017 at 10:39 AM, Amar Tumballi <atumball at redhat.com>
> wrote:
>
>>
>>
>> On Tue, Jul 25, 2017 at 9:33 AM, Raghavendra Gowdappa <
>> rgowdapp at redhat.com> wrote:
>>
>>>
>>>
>>> ----- Original Message -----
>>> > From: "Pranith Kumar Karampuri" <pkarampu at redhat.com>
>>> > To: "Raghavendra G" <raghavendra at gluster.com>
>>> > Cc: "Gluster Devel" <gluster-devel at gluster.org>
>>> > Sent: Tuesday, July 25, 2017 7:51:07 AM
>>> > Subject: Re: [Gluster-devel] Changing the relative order of read-ahead
>>> and open-behind
>>> >
>>> >
>>> >
>>> > On Mon, Jul 24, 2017 at 5:11 PM, Raghavendra G <
>>> raghavendra at gluster.com >
>>> > wrote:
>>> >
>>> >
>>> >
>>> >
>>> >
>>> > On Fri, Jul 21, 2017 at 6:39 PM, Vijay Bellur < vbellur at redhat.com >
>>> wrote:
>>> >
>>> >
>>> >
>>> >
>>> > On Fri, Jul 21, 2017 at 3:26 AM, Raghavendra Gowdappa <
>>> rgowdapp at redhat.com >
>>> > wrote:
>>> >
>>> >
>>> > Hi all,
>>> >
>>> > We have a bug [1] because of which read-ahead is completely disabled
>>> > when the workload is read-only. One easy fix is to make read-ahead an
>>> > ancestor of open-behind in the xlator graph (currently it is a
>>> > descendant). Rafi has sent out a patch to do this. As noted in one of
>>> > the comments, the downside of this solution is that small files (which
>>> > are eligible to be cached by quick-read) are cached twice, once each in
>>> > read-ahead and quick-read, wasting precious memory. However, there are
>>> > no other simpler solutions for this issue. If you have concerns about
>>> > the approach followed by [2] or have other suggestions, please voice
>>> > them. Otherwise, I am planning to merge [2] for lack of better
>>> > alternatives.
>>> >
>>> >
>>> > Since the maximum size of files cached by quick-read is 64KB, can we
>>> > have read-ahead kick in for offsets greater than 64KB?
>>> >
>>> > I got your point. We can enable read-ahead only for files whose size is
>>> > greater than the size eligible for caching by quick-read. IOW,
>>> > read-ahead gets disabled if the file size is less than 64KB. Thanks for
>>> > the suggestion.
>>> >
>>> > I added a comment on the patch suggesting that the xlators be moved in
>>> > the reverse of the way the patch currently does it. I think Milind
>>> > implemented it. Will that lead to any problems?
>>>
>>> From gerrit:
>>>
>>> <comment>
>>>
>>> It fixes the issue too and it is a better solution than the current one
>>> as it doesn't run into duplicate cache problem. The reason open-behind was
>>> loaded as an ancestor of quick-read was that it seemed unnecessary that
>>> quick-read should even witness an open. However,
>>>
>>> * Looking into the code, qr_open does set a priority for the inode,
>>> which is used when purging the cache after the cache limit is exceeded.
>>> So, it helps quick-read to witness an open.
>>> * The real benefit of open-behind is avoiding fops over the network. So,
>>> as long as open-behind is loaded in the client stack, we reap its benefits.
>>> * Also note that if the option "read-after-open" is set in open-behind,
>>> an open is done over the network anyway, irrespective of whether quick-read
>>> has cached the file, which to me looks unnecessary. By moving open-behind
>>> to be a descendant of quick-read, open-behind won't even witness a read
>>> when the file is cached by quick-read. But if the read-after-open option
>>> was implemented in open-behind with the goal of fixing POSIX non-compliance
>>> for the case where a file with an open fd is unlinked, we might regress.
>>> But again, even this approach doesn't fix the compliance problem
>>> completely. One has to turn open-behind off to be completely POSIX
>>> compliant in this scenario.
>>>
>>> Given the reasons above, it helps just moving open-behind as a
>>> descendant of read-ahead.
>>>
>>> </comment>
>>>
>>>
>> Analysis looks good. But I would like us (all developers) to back up
>> theories like this with some data.
>>
>
>> How about you plan a test case which can demonstrate the difference?
>>
>
> What is the scenario you want to measure here?
>
>
A scenario where changing the order changes the number of fops on the wire.
Also, if these translators have any particular internal metrics, you can
implement the 'dump_metrics()' method on the experimental branch, and those
metrics can then be measured in graphite/grafana.
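
To make the kind of comparison concrete, below is a toy Python model, not
GlusterFS code. It assumes a hypothetical two-xlator client stack in which
quick-read already holds every file of up to 64KB in its cache, and
open-behind winds its delayed open as soon as any later fop arrives (roughly
the read-after-open behaviour described in the gerrit comment). The class
names, handle() and wire_fops() are made up for illustration; real numbers
have to come from an actual run.

```python
from collections import Counter

QUICK_READ_MAX_FILE_SIZE = 64 * 1024  # 64KB limit quoted earlier in this thread


class Wire:
    """Bottom of the toy stack: counts every fop that would cross the network."""

    def __init__(self):
        self.fops = Counter()

    def handle(self, fop, file_size):
        self.fops[fop] += 1


class QuickRead:
    """Toy quick-read. ASSUMPTION: the cache is already warm for small files."""

    def __init__(self, child):
        self.child = child

    def handle(self, fop, file_size):
        if fop == "read" and file_size <= QUICK_READ_MAX_FILE_SIZE:
            return  # served from cache, nothing goes further down
        self.child.handle(fop, file_size)


class OpenBehind:
    """Toy open-behind. ASSUMPTION: any later fop triggers the delayed open."""

    def __init__(self, child):
        self.child = child
        self.pending_open = False

    def handle(self, fop, file_size):
        if fop == "open":
            self.pending_open = True  # hold the open back
            return
        if self.pending_open:
            self.pending_open = False
            self.child.handle("open", file_size)  # wind the delayed open first
        self.child.handle(fop, file_size)


def wire_fops(order, workload, file_size):
    """Build the stack (order is listed top-down) and replay the workload."""
    wire = Wire()
    top = wire
    for xlator in reversed(order):
        top = xlator(top)
    for fop in workload:
        top.handle(fop, file_size)
    return dict(wire.fops)


if __name__ == "__main__":
    workload = ["open", "read", "read"]
    for order in ([OpenBehind, QuickRead], [QuickRead, OpenBehind]):
        names = " -> ".join(x.__name__ for x in order)
        print(names, wire_fops(order, workload, 4096))
```

In this model, for an open followed by two reads on a 4KB file, loading
open-behind above quick-read sends one open over the wire, while loading
quick-read above open-behind sends nothing. For a 1MB file both orders send
the same fops.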
-Amar
>> I will help you set up metrics measurement with graphs [1] on the
>> experimental branch [2] to actually measure and graphically represent the
>> hypothesis.
>>
>> We can set this as an example for anyone in the future who wants to try
>> different permutations and combinations of xlator order. Who knows, we may
>> realize that different orders suit different workloads.
>>
>> Regards,
>> Amar
>>
>> [1] - https://github.com/amarts/glustermetrics
>> [2] - https://github.com/gluster/glusterfs/tree/experimental
>>
>>> >
>>> > Thanks,
>>> > Vijay
>>> >
>>> > _______________________________________________
>>> > Gluster-devel mailing list
>>> > Gluster-devel at gluster.org
>>> > http://lists.gluster.org/mailman/listinfo/gluster-devel
>>> >
>>> >
>>> >
>>> > --
>>> > Raghavendra G
>>> >
>>> >
>>> >
>>> >
>>> > --
>>> > Pranith
>>> >
>>>
>>
>>
>>
>> --
>> Amar Tumballi (amarts)
>>
>>
>
>
>
> --
> Raghavendra G
>
--
Amar Tumballi (amarts)