[Gluster-devel] Performance experiments with io-stats translator

Thu Jun 8 06:17:51 UTC 2017

Hi,

I used Sanjay's setup to get these numbers, so I'm guessing it's a 10G
network. I will check and let you know if that isn't the case.

-Krutika

On Tue, Jun 6, 2017 at 9:38 PM, Vijay Bellur <vbellur at redhat.com> wrote:

> Nice work!
>
> What is the network interconnect bandwidth? How much of the network
> bandwidth is in use while the test is being run? Wondering if there is
> saturation in the network layer.
>
> -Vijay
>
> On Tue, Jun 6, 2017 at 7:35 AM, Krutika Dhananjay <kdhananj at redhat.com>
> wrote:
>
>> Hi,
>>
>> As part of identifying performance bottlenecks within the gluster stack for
>> the VM image store use case, I loaded io-stats at multiple points on the
>> client and brick stacks and ran a randrd (random read) test using fio from
>> within the hosted VMs in parallel.
>>
>> Before I get to the results, a little bit about the configuration ...
>>
>> 3-node cluster; 1x3 plain replicate volume with group virt settings and
>> direct-io.
>> 3 FUSE clients, one per node in the cluster (which implies reads are
>> served from the replica that is local to the client).
>>
>> io-stats was loaded at the following places:
>> On the client stack: Above client-io-threads and above protocol/client-0
>> (the first child of AFR).
>> On the brick stack: Below protocol/server, above and below io-threads and
>> just above storage/posix.
>>
>> Based on a 60-second run of the randrd test and subsequent analysis of the
>> stats dumped by the individual io-stats instances, the following is what I
>> found:
>>
>> *Translator Position*              *Avg Latency of READ fop as seen by this translator*
>>
>> 1. parent of client-io-threads     1666us
>>
>> ∆ (1,2) = 50us
>>
>> 2. parent of protocol/client-0     1616us
>>
>> ∆ (2,3) = 1453us
>>
>> ----------------- end of client stack ---------------------
>> ----------------- beginning of brick stack ----------------
>>
>> 3. child of protocol/server        163us
>>
>> ∆ (3,4) = 7us
>>
>> 4. parent of io-threads            156us
>>
>> ∆ (4,5) = 20us
>>
>> 5. child of io-threads             136us
>>
>> ∆ (5,6) = 11us
>>
>> 6. parent of storage/posix         125us
>> ...
>> ----------------- end of brick stack ----------------------
>>
>> So it seems like the biggest bottleneck here is a combination of the
>> network, epoll and the rpc layer, since ∆ (2,3) accounts for 1453us of the
>> 1666us seen at the top of the client stack?
>> I must admit I am no expert with networks, but I'm assuming that if the
>> client is reading from the local brick, then the latency contribution from
>> the actual network won't be much, in which case the bulk of the latency is
>> coming from epoll, the rpc layer, etc. at both the client and brick ends?
>> Please correct me if I'm wrong.
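
To make the breakdown easier to check, here is a minimal Python sketch that recomputes the deltas and expresses each as a share of the latency seen at the top of the client stack. The latency values are copied from the table above; the names `measurements` and `latency_breakdown` are only illustrative, and the last row lumps together everything below the lowest io-stats instance (storage/posix and the disk). It is a sanity check on the arithmetic, not a profiling tool.

```python
# Average READ latency in microseconds as reported by each io-stats instance,
# ordered from the top of the client stack to the bottom of the brick stack.
# Values are copied from the table in the mail; nothing here is measured.
measurements = [
    ("parent of client-io-threads", 1666),
    ("parent of protocol/client-0", 1616),
    ("child of protocol/server", 163),
    ("parent of io-threads", 156),
    ("child of io-threads", 136),
    ("parent of storage/posix", 125),
]


def latency_breakdown(points):
    """Return (segment, delta_us, share_of_total) for each adjacent pair of
    measurement points, plus a final row for whatever lies below the last one."""
    total = points[0][1]  # latency seen at the topmost measurement point
    rows = []
    for (upper, upper_us), (lower, lower_us) in zip(points, points[1:]):
        delta = upper_us - lower_us
        rows.append((f"{upper} -> {lower}", delta, delta / total))
    last_name, last_us = points[-1]
    rows.append((f"{last_name} and below", last_us, last_us / total))
    return rows


for segment, delta, share in latency_breakdown(measurements):
    print(f"{segment:<60} {delta:>5}us  {share:6.1%}")
```

With these numbers, the segment between protocol/client-0 and protocol/server (network, epoll and rpc on both ends) comes to 1453us, roughly 87% of the 1666us total, while all the brick-side translators measured add up to 38us.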
>>
>> I will, of course, do some more runs and confirm if the pattern is
>> consistent.
>>
>> -Krutika