[Gluster-devel] Performance experiments with io-stats translator

Krutika Dhananjay kdhananj at redhat.com
Thu Jun 8 06:16:20 UTC 2017


Hi,

This is what my job file contains:

[global]
ioengine=libaio
#unified_rw_reporting=1
randrepeat=1
norandommap=1
group_reporting
direct=1
runtime=60
thread
size=16g


[workload]
bs=4k
rw=randread
iodepth=8
numjobs=1
file_service_type=random
filename=/perf5/iotest/fio_5
filename=/perf6/iotest/fio_6
filename=/perf7/iotest/fio_7
filename=/perf8/iotest/fio_8

I have 3 VMs reading from one mount, and each of these VMs is running the
above job in parallel.
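
With iodepth=8 and numjobs=1, each VM keeps roughly 8 READ requests in
flight, so that works out to about 24 outstanding requests across the 3
VMs at any given time.

One note on the job file as pasted: as far as I know, repeated filename=
lines within a single fio job section override one another, so only the
last file listed would actually be exercised. To have the one job spread
its I/O across all four files (which is what file_service_type=random is
meant for), the paths would normally be joined with ':', roughly like this:

filename=/perf5/iotest/fio_5:/perf6/iotest/fio_6:/perf7/iotest/fio_7:/perf8/iotest/fio_8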

-Krutika

On Tue, Jun 6, 2017 at 9:14 PM, Manoj Pillai <mpillai at redhat.com> wrote:

>
>
> On Tue, Jun 6, 2017 at 5:05 PM, Krutika Dhananjay <kdhananj at redhat.com>
> wrote:
>
>> Hi,
>>
>> As part of identifying performance bottlenecks within the gluster stack for
>> the VM image store use-case, I loaded io-stats at multiple points on the
>> client and brick stacks and ran a randrd test using fio from within the
>> hosted VMs in parallel.
>>
>> Before I get to the results, a little bit about the configuration ...
>>
>> 3 node cluster; 1x3 plain replicate volume with group virt settings,
>> direct-io.
>> 3 FUSE clients, one per node in the cluster (which implies reads are
>> served from the replica that is local to the client).
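>>
>> For reference, a setup along those lines would look roughly like the
>> following (host names, brick paths and the mount point here are made up):
>>
>> gluster volume create testvol replica 3 \
>>     host1:/bricks/brick1 host2:/bricks/brick1 host3:/bricks/brick1
>> gluster volume set testvol group virt
>> gluster volume start testvol
>> # on each of the 3 nodes, mount the volume locally with direct I/O
>> mount -t glusterfs -o direct-io-mode=enable localhost:/testvol /mnt/testvol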
>>
>> io-stats was loaded at the following places:
>> On the client stack: Above client-io-threads and above protocol/client-0
>> (the first child of AFR).
>> On the brick stack: Below protocol/server, above and below io-threads and
>> just above storage/posix.
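>>
>> Each of these is just an extra debug/io-stats instance spliced into the
>> client or brick volfile. As a rough sketch (the volume names here are
>> made up), the instance just above storage/posix would look something
>> like this:
>>
>> volume testvol-posix-io-stats
>>     type debug/io-stats
>>     option latency-measurement on
>>     option count-fop-hits on
>>     subvolumes testvol-posix
>> end-volume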
>>
>> Based on a 60-second run of randrd test and subsequent analysis of the
>> stats dumped by the individual io-stats instances, the following is what I
>> found:
>>
>> Translator position                          Avg latency of READ fop
>>                                              as seen by this translator
>>
>> 1. parent of client-io-threads               1666us
>>        ∆ (1,2) = 50us
>> 2. parent of protocol/client-0               1616us
>>        ∆ (2,3) = 1453us
>>
>> ----------------- end of client stack ----------------------
>> ----------------- beginning of brick stack -----------------
>>
>> 3. child of protocol/server                  163us
>>        ∆ (3,4) = 7us
>> 4. parent of io-threads                      156us
>>        ∆ (4,5) = 20us
>> 5. child of io-threads                       136us
>>        ∆ (5,6) = 11us
>> 6. parent of storage/posix                   125us
>> ...
>> ---------------- end of brick stack ------------------------
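>>
>> (Put differently: of the ~1666us READ latency seen at the top of the
>> client stack, only ~163us is accounted for at the top of the brick
>> stack; the ∆ (2,3) gap of 1453us, roughly 87% of the total, lies
>> between protocol/client on the client and protocol/server on the brick.)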
>>
>> So it seems like the biggest bottleneck here is a combination of the
>> network, epoll and the rpc layer?
>> I must admit I am no expert on networks, but I'm assuming that if the
>> client is reading from the local brick, then even the latency
>> contribution from the actual network won't be much, in which case the
>> bulk of the latency is coming from epoll, the rpc layer, etc. at both
>> the client and brick ends? Please correct me if I'm wrong.
>>
>> I will, of course, do some more runs and confirm if the pattern is
>> consistent.
>>
>> -Krutika
>>
>>
> Really interesting numbers! How many concurrent requests are in flight in
> this test? Could you post the fio job? I'm wondering if/how these latency
> numbers change if you reduce the number of concurrent requests.
>
> -- Manoj
>
>