[Gluster-devel] Performance experiments with io-stats translator

Manoj Pillai mpillai at redhat.com
Tue Jun 6 15:44:27 UTC 2017


On Tue, Jun 6, 2017 at 5:05 PM, Krutika Dhananjay <kdhananj at redhat.com>
wrote:

> Hi,
>
> As part of identifying performance bottlenecks within the gluster stack for
> the VM image store use-case, I loaded io-stats at multiple points on the
> client and brick stacks and ran a randrd test using fio from within the
> hosted VMs in parallel.
>
> Before I get to the results, a little bit about the configuration ...
>
> 3 node cluster; 1x3 plain replicate volume with group virt settings,
> direct-io.
> 3 FUSE clients, one per node in the cluster (which implies reads are
> served from the replica that is local to the client).
>
> io-stats was loaded at the following places:
> On the client stack: Above client-io-threads and above protocol/client-0
> (the first child of AFR).
> On the brick stack: Below protocol/server, above and below io-threads and
> just above storage/posix.
>
> Based on a 60-second run of randrd test and subsequent analysis of the
> stats dumped by the individual io-stats instances, the following is what I
> found:
>
> Translator position                        Avg latency of READ fop
>                                            as seen by this translator
>
> 1. parent of client-io-threads                1666us
>
> ∆ (1,2) = 50us
>
> 2. parent of protocol/client-0                1616us
>
> ∆ (2,3) = 1453us
>
> ----------------- end of client stack ---------------------
> ----------------- beginning of brick stack -----------
>
> 3. child of protocol/server                   163us
>
> ∆ (3,4) = 7us
>
> 4. parent of io-threads                        156us
>
> ∆ (4,5) = 20us
>
> 5. child of io-threads                        136us
>
> ∆ (5,6) = 11us
>
> 6. parent of storage/posix                   125us
> ...
> ---------------- end of brick stack ------------------------
>
> So it seems like the biggest bottleneck here is the combination of the
> network, epoll and the rpc layer: the 1453us gap between (2) and (3)
> dwarfs every other hop. I must admit I am no expert on networking, but
> since each client reads from the replica on its local brick, the latency
> contribution of the actual network should be small, in which case the bulk
> of that gap is coming from epoll, the rpc layer, etc. at both the client
> and brick ends. Please correct me if I'm wrong.
>
> I will, of course, do some more runs and confirm if the pattern is
> consistent.
>
> -Krutika
>
>
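For anyone wanting to reproduce this instrumentation: io-stats instances at arbitrary points like these are typically added by hand-editing the generated volfile. A rough sketch of one such instance on the client stack (volume names are hypothetical, not from the original post; the option names follow io-stats defaults):

```
volume testvol-client-0
    type protocol/client
    # ... usual client options ...
end-volume

# io-stats loaded as the parent of protocol/client-0
volume io-stats-above-client-0
    type debug/io-stats
    option latency-measurement on
    option count-fop-hits on
    subvolumes testvol-client-0
end-volume
```

The per-translator latency numbers then come from dumping the stats of each such instance after the run.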
Really interesting numbers! How many concurrent requests are in flight in
this test? Could you post the fio job? I'm wondering if/how these latency
numbers change if you reduce the number of concurrent requests.
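As an aside, the per-hop deltas in your table are just differences between adjacent per-translator averages; a quick illustrative script (not from the original post) to recompute them from the numbers you reported:

```python
# Average READ latencies (in microseconds) reported by each io-stats
# instance, ordered top of the client stack -> bottom of the brick stack.
latencies = {
    "parent of client-io-threads": 1666,
    "parent of protocol/client-0": 1616,
    "child of protocol/server": 163,
    "parent of io-threads": 156,
    "child of io-threads": 136,
    "parent of storage/posix": 125,
}

# Each hop's contribution is the difference between adjacent instances.
points = list(latencies.items())
for (name_a, lat_a), (name_b, lat_b) in zip(points, points[1:]):
    print(f"delta({name_a} -> {name_b}) = {lat_a - lat_b}us")
# deltas: 50, 1453, 7, 20, 11 -- the (2)->(3) hop dominates
```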

-- Manoj

