[Gluster-devel] Performance improvements
Vijay Bellur
vbellur at redhat.com
Fri Jan 25 07:53:12 UTC 2019
Thank you for the detailed update, Xavi! This looks very interesting.
On Thu, Jan 24, 2019 at 7:50 AM Xavi Hernandez <xhernandez at redhat.com>
wrote:
> Hi all,
>
> I've just updated a patch [1] that implements a new thread pool based on a
> wait-free queue provided by the userspace-rcu library. The patch also
> includes an auto scaling mechanism that keeps only as many threads running
> as the current workload needs.
>
> This new approach has some advantages:
>
> - It's provided globally inside libglusterfs instead of inside an
> xlator
>
> This lets the fuse thread and the epoll threads hand off received requests
> to another thread sooner, wasting less CPU and reacting sooner to other
> incoming requests.
>
>
> - Adding jobs to the queue used by the thread pool only requires an
> atomic operation
>
> This makes the producer side of the queue really fast, almost with no
> delay.
>
>
> - Contention is reduced
>
> The producer side has negligible contention thanks to the wait-free
> enqueue operation based on an atomic access. The consumer side requires a
> mutex, but it is held only very briefly, and the scaling mechanism makes
> sure that no more threads than needed contend for it.
>
>
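To make the producer/consumer split described above concrete, here is a
minimal sketch in C11 of a queue whose enqueue is a single atomic exchange
and whose dequeue runs under a mutex. It is not the code from the patch and
it does not use the userspace-rcu API; the names job_t, jobq_t, jobq_init,
jobq_push and jobq_pop are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical job type: a real pool would carry a callback and its data. */
typedef struct job {
    _Atomic(struct job *) next;
    int id;
} job_t;

typedef struct {
    job_t stub;              /* dummy head node; stub.next is the first job */
    _Atomic(job_t *) tail;   /* last node; producers swap it atomically */
    pthread_mutex_t lock;    /* serializes consumers only */
} jobq_t;

void jobq_init(jobq_t *q)
{
    atomic_init(&q->stub.next, NULL);
    atomic_init(&q->tail, &q->stub);
    pthread_mutex_init(&q->lock, NULL);
}

/* Producer side: one atomic exchange plus one store, no lock, no retry. */
void jobq_push(jobq_t *q, job_t *job)
{
    assert(job != NULL);
    atomic_store_explicit(&job->next, NULL, memory_order_relaxed);
    job_t *prev = atomic_exchange_explicit(&q->tail, job,
                                           memory_order_acq_rel);
    atomic_store_explicit(&prev->next, job, memory_order_release);
}

/* A producer may have swapped the tail but not yet linked its node. */
static job_t *wait_next(job_t *node)
{
    job_t *next;
    while ((next = atomic_load_explicit(&node->next,
                                        memory_order_acquire)) == NULL)
        sched_yield();
    return next;
}

/* Consumer side: returns the oldest job, or NULL if the queue is empty. */
job_t *jobq_pop(jobq_t *q)
{
    job_t *job = NULL;

    pthread_mutex_lock(&q->lock);
    if (atomic_load(&q->stub.next) != NULL ||
        atomic_load(&q->tail) != &q->stub) {
        job = wait_next(&q->stub);
        job_t *next = atomic_load(&job->next);
        if (next == NULL) {
            /* job looks like the last node: try to reset tail to the stub. */
            job_t *expected = job;
            atomic_store(&q->stub.next, NULL);
            if (!atomic_compare_exchange_strong(&q->tail, &expected,
                                                &q->stub)) {
                /* A producer got in first; wait for it to link its node. */
                next = wait_next(job);
                atomic_store(&q->stub.next, next);
            }
        } else {
            atomic_store(&q->stub.next, next);
        }
    }
    pthread_mutex_unlock(&q->lock);
    return job;
}
```

A worker thread would call jobq_pop() in a loop and go idle when it returns
NULL. Deciding how many such workers to keep alive is the job of the auto
scaling mechanism, which this sketch does not attempt to show.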
> This change disables io-threads, since it replaces part of its
> functionality. However there are two things that could be needed from
> io-threads:
>
> - Prioritization of fops
>
> Currently, io-threads assigns priorities to each fop, so that some fops
> are handled before others.
>
>
> - Fair distribution of execution slots between clients
>
> Currently, io-threads processes requests from each client in round-robin.
>
>
> These features are not implemented right now. If they are needed, probably
> the best thing to do would be to keep them inside io-threads, but change
> its implementation so that it uses the global threads from the thread pool
> instead of its own threads.
>
These features are indeed useful to have and hence modifying the
implementation of io-threads to provide this behavior would be welcome.
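As a rough illustration of the two behaviors being discussed (priorities
first, then round-robin between clients within a priority), here is a small
sketch in C. It is not taken from io-threads: the type fair_sched_t, the
functions fair_init, fair_add and fair_next, and the sizes NUM_PRIOS and
MAX_CLIENTS are all assumptions made for the example, and pending requests
are represented by plain counters rather than real fop queues.

```c
#include <string.h>

/* Hypothetical sizes, chosen only to keep the example small. */
enum { NUM_PRIOS = 3, MAX_CLIENTS = 4 };

typedef struct {
    int pending[NUM_PRIOS][MAX_CLIENTS]; /* queued requests per prio/client */
    int cursor[NUM_PRIOS];               /* next client to serve, per prio */
} fair_sched_t;

void fair_init(fair_sched_t *s)
{
    memset(s, 0, sizeof(*s));
}

/* Record one pending request; priority 0 is the most urgent. */
void fair_add(fair_sched_t *s, int prio, int client)
{
    s->pending[prio][client]++;
}

/*
 * Pick the next request: the most urgent non-empty priority wins, and
 * within that priority clients are served in round-robin order.
 * Returns 1 and fills *prio and *client, or 0 if nothing is pending.
 */
int fair_next(fair_sched_t *s, int *prio, int *client)
{
    for (int p = 0; p < NUM_PRIOS; p++) {
        for (int i = 0; i < MAX_CLIENTS; i++) {
            int c = (s->cursor[p] + i) % MAX_CLIENTS;
            if (s->pending[p][c] > 0) {
                s->pending[p][c]--;
                s->cursor[p] = (c + 1) % MAX_CLIENTS;
                *prio = p;
                *client = c;
                return 1;
            }
        }
    }
    return 0;
}
```

In a reworked io-threads, whatever fair_next() selects would be submitted to
the global thread pool instead of being run on a thread owned by the xlator.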
>
>
> These tests have shown that the limiting factor has been the disk in most
> cases, so it's hard to tell if the change has really improved things. There
> is only one clear exception: self-heal on a dispersed volume completes
> 12.7% faster. The utilization of CPU has also dropped drastically:
>
> Old implementation: 12.30 user, 41.78 sys, 43.16 idle, 0.73 wait
>
> New implementation: 4.91 user, 5.52 sys, 81.60 idle, 5.91 wait
>
>
> Now I'm running some more tests on NVMe to try to see the effects of the
> change when disk is not limiting performance. I'll update once I've more
> data.
>
>
Will look forward to these numbers.
Regards,
Vijay