[Gluster-devel] Performance experiments with io-stats translator

Krutika Dhananjay kdhananj at redhat.com
Thu Jun 8 13:07:52 UTC 2017


Indeed the latency on the client side dropped with iodepth=1. :)
I ran the test twice and the results were consistent.
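
(For reference, this run used the same job file as the one quoted further down
in this thread, with only the queue depth changed in the [workload] section:)

[workload]
bs=4k
rw=randread
; iodepth was 8 in the earlier run
iodepth=1
numjobs=1
; file_service_type and the filename entries are unchanged from the job below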

Here are the exact numbers:

Translator position                     Avg latency of READ fop
                                        as seen by this translator

1. parent of client-io-threads          437us
     ∆(1,2) = 69us
2. parent of protocol/client-0          368us
     ∆(2,3) = 171us

----------------- end of client stack ---------------------
----------------- beginning of brick stack ----------------

3. child of protocol/server             197us
     ∆(3,4) = 4us
4. parent of io-threads                 193us
     ∆(4,5) = 32us
5. child of io-threads                  161us
     ∆(5,6) = 11us
6. parent of storage/posix              150us
...
---------------- end of brick stack ------------------------
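
(To make the arithmetic easier to follow: each ∆ above is just the difference
between the average latencies reported at adjacent measurement points. A
throwaway Python sketch using the numbers from this run:)

# Average READ latency (us) reported by each io-stats instance in the
# iodepth=1 run above, ordered from the top of the stack downwards.
avg_us = [
    ("parent of client-io-threads", 437),
    ("parent of protocol/client-0", 368),
    ("child of protocol/server",    197),
    ("parent of io-threads",        193),
    ("child of io-threads",         161),
    ("parent of storage/posix",     150),
]

# Cost attributable to each layer = difference between adjacent points.
# The (2,3) delta is the client->brick hop, i.e. rpc + socket/epoll + the
# network itself.
for (upper, u), (lower, l) in zip(avg_us, avg_us[1:]):
    print("delta(%s -> %s) = %dus" % (upper, lower, u - l))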

Will continue reading the code and get back when I find something concrete.

-Krutika


On Thu, Jun 8, 2017 at 12:22 PM, Manoj Pillai <mpillai at redhat.com> wrote:

> Thanks. So I was suggesting a repeat of the test, but this time with
> iodepth=1 in the fio job. If reducing the number of concurrent requests
> drastically reduces the high latency you're seeing from the client side,
> that would strengthen the hypothesis that serialization/contention among
> concurrent requests at the n/w layers is the root cause here.
>
> -- Manoj
>
>
> On Thu, Jun 8, 2017 at 11:46 AM, Krutika Dhananjay <kdhananj at redhat.com>
> wrote:
>
>> Hi,
>>
>> This is what my job file contains:
>>
>> [global]
>> ioengine=libaio
>> #unified_rw_reporting=1
>> randrepeat=1
>> norandommap=1
>> group_reporting
>> direct=1
>> runtime=60
>> thread
>> size=16g
>>
>>
>> [workload]
>> bs=4k
>> rw=randread
>> iodepth=8
>> numjobs=1
>> file_service_type=random
>> filename=/perf5/iotest/fio_5
>> filename=/perf6/iotest/fio_6
>> filename=/perf7/iotest/fio_7
>> filename=/perf8/iotest/fio_8
>>
>> I have 3 vms reading from one mount, and each of these vms is running the
>> above job in parallel.
>>
>> -Krutika
>>
>> On Tue, Jun 6, 2017 at 9:14 PM, Manoj Pillai <mpillai at redhat.com> wrote:
>>
>>>
>>>
>>> On Tue, Jun 6, 2017 at 5:05 PM, Krutika Dhananjay <kdhananj at redhat.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> As part of identifying performance bottlenecks within the gluster stack
>>>> for the VM image store use case, I loaded io-stats at multiple points on
>>>> the client and brick stacks and ran the randrd test using fio from within
>>>> the hosted vms in parallel.
>>>>
>>>> Before I get to the results, a little bit about the configuration ...
>>>>
>>>> 3 node cluster; 1x3 plain replicate volume with group virt settings,
>>>> direct-io.
>>>> 3 FUSE clients, one per node in the cluster (which implies reads are
>>>> served from the replica that is local to the client).
>>>>
>>>> io-stats was loaded at the following places:
>>>> On the client stack: Above client-io-threads and above
>>>> protocol/client-0 (the first child of AFR).
>>>> On the brick stack: Below protocol/server, above and below io-threads
>>>> and just above storage/posix.
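>>>>
>>>> (For anyone who wants to reproduce this: each measurement point is a
>>>> debug/io-stats translator added to the volfile at that position. A rough
>>>> sketch of the one loaded above protocol/client-0 -- the volume and
>>>> subvolume names here are only illustrative:)
>>>>
>>>> volume iostats-above-client-0
>>>>     type debug/io-stats
>>>>     option latency-measurement on
>>>>     option count-fop-hits on
>>>>     subvolumes <volname>-client-0
>>>> end-volume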
>>>>
>>>> Based on a 60-second run of the randrd test and subsequent analysis of the
>>>> stats dumped by the individual io-stats instances, the following is what I
>>>> found:
>>>>
>>>> Translator position                     Avg latency of READ fop
>>>>                                         as seen by this translator
>>>>
>>>> 1. parent of client-io-threads          1666us
>>>>      ∆(1,2) = 50us
>>>> 2. parent of protocol/client-0          1616us
>>>>      ∆(2,3) = 1453us
>>>>
>>>> ----------------- end of client stack ---------------------
>>>> ----------------- beginning of brick stack ----------------
>>>>
>>>> 3. child of protocol/server             163us
>>>>      ∆(3,4) = 7us
>>>> 4. parent of io-threads                 156us
>>>>      ∆(4,5) = 20us
>>>> 5. child of io-threads                  136us
>>>>      ∆(5,6) = 11us
>>>> 6. parent of storage/posix              125us
>>>> ...
>>>> ---------------- end of brick stack ------------------------
>>>>
>>>> So it seems like the biggest bottleneck here is some combination of the
>>>> network, epoll and the rpc layer?
>>>> I must admit I am no expert on networking, but I'm assuming that if the
>>>> client is reading from the local brick, then even the latency contribution
>>>> from the actual network won't be much, in which case the bulk of the
>>>> latency is coming from epoll, the rpc layer, etc. at both the client and
>>>> brick ends? Please correct me if I'm wrong.
>>>>
>>>> I will, of course, do some more runs and confirm if the pattern is
>>>> consistent.
>>>>
>>>> -Krutika
>>>>
>>>>
>>> Really interesting numbers! How many concurrent requests are in flight
>>> in this test? Could you post the fio job? I'm wondering if/how these
>>> latency numbers change if you reduce the number of concurrent requests.
>>>
>>> -- Manoj
>>>
>>>
>>
>