[Gluster-devel] Latency analysis of GlusterFS' network layer for pgbench

Sankarshan Mukhopadhyay sankarshan.mukhopadhyay at gmail.com
Mon Dec 31 10:14:02 UTC 2018


On Fri 28 Dec, 2018, 12:44 Raghavendra Gowdappa <rgowdapp at redhat.com> wrote:

>
>
> On Mon, Dec 24, 2018 at 6:05 PM Raghavendra Gowdappa <rgowdapp at redhat.com>
> wrote:
>
>>
>>
>> On Mon, Dec 24, 2018 at 3:40 PM Sankarshan Mukhopadhyay <
>> sankarshan.mukhopadhyay at gmail.com> wrote:
>>
>>> [pulling the conclusions up to enable better in-line]
>>>
>>> > Conclusions:
>>> >
>>> > We should never have a volume with caching-related xlators disabled.
>>> The price we pay for it is too high. We need to make them work consistently
>>> and aggressively to avoid as many requests as we can.
>>>
>>> Are there current issues in terms of behavior which are known/observed
>>> when these are enabled?
>>>
>>
>> We did have issues with pgbench in the past, but they have been fixed.
>> Please refer to bz [1] for details. On 5.1, it runs successfully with all
>> caching-related xlators enabled. Having said that, the only performance
>> xlators which gave improved performance were open-behind and write-behind
>> [2] (write-behind had some issues, which will be fixed by [3], and we'll
>> have to measure performance again with the fix for [3]).
>>
>
> One quick update: enabling write-behind and md-cache with the fix for [3]
> reduced the total time taken for the pgbench init phase by roughly 20%-25%
> (from 12.5 min to 9.75 min for a scale of 100), though this is still a long
> time (around 12 hrs for a db of scale 8000). I'll follow up with a detailed
> report once my experiments are complete. Currently I am trying to optimize
> the read path.
>
>
>> For some reason, read-side caching didn't improve transactions per
>> second. I am working on this problem currently. Note that these bugs
>> measure the transaction phase of pgbench, but what Xavi measured in his
>> mail is the init phase. Nevertheless, evaluation of read caching
>> (metadata/data) will still be relevant for the init phase too.
>>
>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1512691
>> [2] https://bugzilla.redhat.com/show_bug.cgi?id=1629589#c4
>> [3] https://bugzilla.redhat.com/show_bug.cgi?id=1648781
>>
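[For illustration only: one way to sanity-check whether client-side read
caching is actually taking effect is to time the same read issued twice
through libgfapi. This is a minimal sketch, not the harness used for the
numbers above; the volume name "testvol", the host "localhost" and the file
"/pgdata/testfile" are hypothetical placeholders. Build with:
gcc read_probe.c -o read_probe -lgfapi]

    /* Minimal sketch: time the same 4KB read twice through libgfapi to see
     * whether client-side read caching makes the second read noticeably
     * cheaper. Volume/host/file names are hypothetical placeholders. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/types.h>
    #include <time.h>
    #include <glusterfs/api/glfs.h>

    static long elapsed_us(struct timespec a, struct timespec b)
    {
        return (b.tv_sec - a.tv_sec) * 1000000L +
               (b.tv_nsec - a.tv_nsec) / 1000L;
    }

    int main(void)
    {
        char buf[4096];
        struct timespec t0, t1;

        glfs_t *fs = glfs_new("testvol");    /* hypothetical volume name */
        if (!fs || glfs_set_volfile_server(fs, "tcp", "localhost", 24007) ||
            glfs_init(fs)) {
            fprintf(stderr, "failed to initialise gfapi\n");
            return EXIT_FAILURE;
        }

        glfs_fd_t *fd = glfs_open(fs, "/pgdata/testfile", O_RDONLY);
        if (!fd) {
            fprintf(stderr, "open failed\n");
            glfs_fini(fs);
            return EXIT_FAILURE;
        }

        for (int i = 0; i < 2; i++) {
            glfs_lseek(fd, 0, SEEK_SET);     /* re-read the same region */
            clock_gettime(CLOCK_MONOTONIC, &t0);
            ssize_t ret = glfs_read(fd, buf, sizeof(buf), 0);
            clock_gettime(CLOCK_MONOTONIC, &t1);
            printf("read %d: %zd bytes in %ld us\n", i, ret,
                   elapsed_us(t0, t1));
        }

        glfs_close(fd);
        glfs_fini(fs);
        return 0;
    }

If client-side data caching (io-cache/quick-read, for example) is effective,
the second read should in principle show a much lower latency, since it would
not need another READV round trip to the brick.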
>
I think that what I am looking forward to is a well-defined set of next
steps and periodic updates to this list that eventually result in a formal
and recorded procedure to ensure that Gluster performs best for these
application workloads.


>>
>>
>>> > We need to analyze the client/server xlators more deeply to see if we
>>> can avoid some delays. However, optimizing something that is already at
>>> the microsecond level can be very hard.
>>>
>>> That is true - are there any significant gains which can be accrued by
>>> putting effort here, or should this be a lower priority?
>>>
>>
>> The problem identified by Xavi is also the one we (Manoj, Krutika, Milind
>> and I) had encountered in the past [4]. The solution we used was to have
>> multiple rpc connections between a single brick and client, and it indeed
>> fixed the bottleneck. So, there is definitely work involved here - either
>> to fix the single-connection model or go with the multiple-connection
>> model. It's preferable to improve the single connection and resort to
>> multiple connections only if the bottlenecks in a single connection are
>> not fixable. Personally, I think this is high priority, along with having
>> appropriate client-side caching.
>>
>> [4] https://bugzilla.redhat.com/show_bug.cgi?id=1467614#c52
>>
>>
>>> > We need to determine what causes the fluctuations on the brick side and
>>> avoid them.
>>> > This scenario is very similar to a smallfile/metadata workload, so
>>> this is probably one important cause of its bad performance.
>>>
>>> What kind of instrumentation is required to enable the determination?
>>>
>>> On Fri, Dec 21, 2018 at 1:48 PM Xavi Hernandez <xhernandez at redhat.com>
>>> wrote:
>>> >
>>> > Hi,
>>> >
>>> > I've done some tracing of the latency that the network layer introduces
>>> in gluster. I've made the analysis as part of the pgbench performance issue
>>> (in particular the initialization and scaling phase), so I decided to look
>>> at READV for this particular workload, but I think the results can be
>>> extrapolated to other operations that also have small latency (cached data
>>> from the FS, for example).
>>> >
>>> > Note that measuring latencies introduces some latency of its own. It
>>> consists of a call to clock_get_time() for each probe point, so the real
>>> latency will be a bit lower, but still proportional to these numbers.
>>> >
>>>
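[For context on the measurement overhead mentioned above, a probe of this
kind typically boils down to a pair of timestamps around the instrumented
call. The following is only a minimal sketch assuming
clock_gettime(CLOCK_MONOTONIC); the actual wrapper used in the gluster
tracing code may differ.]

    /* Minimal latency probe sketch: timestamp before and after the
     * instrumented call and report the delta in microseconds. Each probe
     * point costs roughly one clock_gettime() call, which is why the
     * measured latencies are slightly higher than the real ones. */
    #include <stdio.h>
    #include <time.h>

    static long probe_delta_us(const struct timespec *start,
                               const struct timespec *end)
    {
        return (end->tv_sec - start->tv_sec) * 1000000L +
               (end->tv_nsec - start->tv_nsec) / 1000L;
    }

    int main(void)
    {
        struct timespec start, end;

        clock_gettime(CLOCK_MONOTONIC, &start);
        /* ... instrumented code path, e.g. submitting a READV request ... */
        clock_gettime(CLOCK_MONOTONIC, &end);

        printf("probe latency: %ld us\n", probe_delta_us(&start, &end));
        return 0;
    }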
>>> [snip]
>>>
>>