[Gluster-devel] Performance improvements

Sun Jan 27 07:03:16 UTC 2019

On Fri, 25 Jan 2019, 08:53 Vijay Bellur <vbellur at redhat.com> wrote:

> Thank you for the detailed update, Xavi! This looks very interesting.
>
> On Thu, Jan 24, 2019 at 7:50 AM Xavi Hernandez <xhernandez at redhat.com>
> wrote:
>
>> Hi all,
>>
>> I've just updated a patch [1] that implements a new thread pool based on
>> a wait-free queue provided by the userspace-rcu library. The patch also
>> includes an auto-scaling mechanism that keeps only as many threads
>> running as the current workload needs.
>>
>> This new approach has some advantages:
>>
>>    - It's provided globally inside libglusterfs instead of inside an
>>    xlator
>>
>> This lets the fuse thread and the epoll threads hand off a received
>> request to another thread sooner, wasting less CPU and reacting sooner
>> to other incoming requests.
>>
>>
>>    - Adding jobs to the queue used by the thread pool only requires an
>>    atomic operation
>>
>> This makes the producer side of the queue very fast, with almost no
>> delay.
>>
>>
>>    - Contention is reduced
>>
>> The producer side has negligible contention thanks to the wait-free
>> enqueue operation based on an atomic access. The consumer side requires a
>> mutex, but it is held only very briefly, and the scaling mechanism makes
>> sure that no more threads than needed contend for the mutex.
>>
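
To illustrate the producer/consumer split described above, here is a
minimal sketch in C11. It is not the code from the patch and not the
userspace-rcu implementation; it only follows the same general idea (a
single atomic exchange to enqueue, a short mutex-protected dequeue). All
names (struct job, job_queue_push, job_queue_pop, ...) are hypothetical.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical job type; 'next' links jobs inside the queue. */
struct job {
    _Atomic(struct job *) next;
    int id;
};

struct job_queue {
    struct job head;             /* dummy node, never returned to callers */
    _Atomic(struct job *) tail;  /* last node, or &head when empty */
    pthread_mutex_t lock;        /* serializes consumers only */
};

static void job_queue_init(struct job_queue *q)
{
    atomic_init(&q->head.next, NULL);
    atomic_init(&q->tail, &q->head);
    pthread_mutex_init(&q->lock, NULL);
}

/* Producer side: one atomic exchange publishes the job, then a plain
 * atomic store links it to its predecessor. No loops, no locks. */
static void job_queue_push(struct job_queue *q, struct job *j)
{
    atomic_store(&j->next, NULL);
    struct job *prev = atomic_exchange(&q->tail, j);
    atomic_store(&prev->next, j);
}

/* A producer may have swapped the tail but not linked its node yet;
 * the consumer waits for that link to appear. */
static struct job *wait_next(struct job *n)
{
    struct job *next;
    while ((next = atomic_load(&n->next)) == NULL)
        sched_yield();
    return next;
}

/* Consumer side: returns the oldest job, or NULL if the queue is empty. */
static struct job *job_queue_pop(struct job_queue *q)
{
    struct job *node = NULL, *next;

    pthread_mutex_lock(&q->lock);
    if (atomic_load(&q->head.next) != NULL ||
        atomic_load(&q->tail) != &q->head) {
        node = wait_next(&q->head);
        next = atomic_load(&node->next);
        if (next == NULL) {
            /* 'node' looks like the last one: reset head, then try to
             * move the tail back to the dummy node. */
            atomic_store(&q->head.next, NULL);
            struct job *expected = node;
            if (!atomic_compare_exchange_strong(&q->tail, &expected,
                                                &q->head))
                next = wait_next(node);  /* a push is in flight */
        }
        if (next != NULL)
            atomic_store(&q->head.next, next);
    }
    pthread_mutex_unlock(&q->lock);
    return node;
}
```

In this sketch a producer never waits on other threads, while a consumer
may briefly spin if it catches a producer between its two stores; that is
the trade-off that keeps the enqueue path down to one atomic operation.
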
>>
>> This change disables io-threads, since it replaces part of its
>> functionality. However, there are two things that might still be needed
>> from io-threads:
>>
>>    - Prioritization of fops
>>
>> Currently, io-threads assigns a priority to each fop, so that some fops
>> are handled before others.
>>
>>
>>    - Fair distribution of execution slots between clients
>>
>> Currently, io-threads processes requests from each client in
>> round-robin order.
>>
>>
>> These features are not implemented right now. If they are needed,
>> probably the best thing to do would be to keep them inside io-threads, but
>> change its implementation so that it uses the global threads from the
>> thread pool instead of its own threads.
>>
>
>
> These features are indeed useful to have and hence modifying the
> implementation of io-threads to provide this behavior would be welcome.
>
>
>
>>
>>
>> These tests have shown that the limiting factor has been the disk in most
>> cases, so it's hard to tell if the change has really improved things. There
>> is only one clear exception: self-heal on a dispersed volume completes
>> 12.7% faster. CPU utilization has also dropped drastically:
>>
>> Old implementation: 12.30 user, 41.78 sys, 43.16 idle,  0.73 wait
>>
>> New implementation: 4.91 user,  5.52 sys, 81.60 idle,  5.91 wait
>>
>>
>> Now I'm running some more tests on NVMe to see the effects of the
>> change when the disk is not limiting performance. I'll send an update
>> once I have more data.
>>
>>
> Will look forward to these numbers.
>

I have identified an issue that limits the number of active threads when
load is high, causing some regressions. I'll fix it and rerun the tests on
Monday.

Xavi

>
> Regards,
> Vijay
>