[Gluster-devel] Multiplexing - good news, bad news, and a plea for help

Tue Sep 20 12:02:59 UTC 2016

> That's weird, since the only purpose of the mem-pool was precisely to
> improve performance of allocation of objects that are frequently
> allocated/released.

Very true, and I've long been an advocate of this approach.
Unfortunately, for this to work our allocator has to be more efficient
than the system's, and it's not - especially wrt locking.  Overhead is
high and contention is even higher, heavily outweighing any advantage.
Unless/until we put in the work to make mem-pools perform better at high
thread counts, avoiding them seems like the practical choice.
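
To make the locking cost concrete, here is a minimal sketch contrasting a pool whose free list sits behind one mutex with a per-thread free list that needs no lock. It is not GlusterFS's mem-pool code: `struct obj`, `shared_get`/`shared_put`, and `local_get`/`local_put` are hypothetical names, and the per-thread variant assumes a C11 compiler with `_Thread_local`.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical pooled object; not the layout GlusterFS mem-pools use. */
struct obj {
    struct obj *next;
    char payload[56];
};

/* Variant A: one shared free list behind one mutex.
 * Every get/put from every thread serializes on shared_lock. */
static pthread_mutex_t shared_lock = PTHREAD_MUTEX_INITIALIZER;
static struct obj *shared_free;

static struct obj *shared_get(void)
{
    pthread_mutex_lock(&shared_lock);
    struct obj *o = shared_free;
    if (o)
        shared_free = o->next;
    pthread_mutex_unlock(&shared_lock);
    /* Fall back to the system allocator when the pool is empty. */
    return o ? o : malloc(sizeof(*o));
}

static void shared_put(struct obj *o)
{
    assert(o != NULL);
    pthread_mutex_lock(&shared_lock);
    o->next = shared_free;
    shared_free = o;
    pthread_mutex_unlock(&shared_lock);
}

/* Variant B: one free list per thread, so the fast path takes no lock. */
static _Thread_local struct obj *local_free;

static struct obj *local_get(void)
{
    struct obj *o = local_free;
    if (o) {
        local_free = o->next;
        return o;
    }
    return malloc(sizeof(*o));
}

static void local_put(struct obj *o)
{
    assert(o != NULL);
    /* The object joins the list of whichever thread releases it. */
    o->next = local_free;
    local_free = o;
}
```

Variant B trades contention for memory that can pile up on whichever thread does most of the releasing, so a real design would probably need some rebalancing between threads.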

> * Consider http://review.gluster.org/15036/. With all communications
> going through the same socket, the problem this patch tries to solve
> could become worse.

I'll look into this.  Thanks!

> * We should consider the possibility of implementing a global thread
> pool, which would replace io-threads, epoll threads and maybe others.
> Synctasks should also rely on this thread pool. This has the benefit
> of better controlling the total number of threads. Otherwise when we
> have more threads than processor cores, we waste resources
> unnecessarily and we won't get a real gain. Even worse, it could start
> to degrade due to contention.

Also a good idea, though perhaps too hard/complex to tackle in the short
term.  I did take a stab at making io-threads use a single global set of
queues instead of per instance, to address a similar concern.  To make a
long story short, it didn't seem to make things any better for this
test.  I still think it's a good idea, though.
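
For readers unfamiliar with the idea, a global pool boils down to one queue and a fixed set of workers shared by every consumer. The sketch below is a generic pthreads version under that assumption; it is not the io-threads experiment described above, and `struct pool`, `pool_start`, `pool_submit`, `pool_stop`, and `demo_task` are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

struct task { void (*fn)(void *); void *arg; struct task *next; };

struct pool {
    pthread_mutex_t lock;
    pthread_cond_t wake;
    struct task *head, *tail;   /* single FIFO shared by all submitters */
    int stopping, nthreads;
    pthread_t *threads;
};

static void *worker(void *p)
{
    struct pool *pool = p;
    for (;;) {
        pthread_mutex_lock(&pool->lock);
        while (!pool->head && !pool->stopping)
            pthread_cond_wait(&pool->wake, &pool->lock);
        struct task *t = pool->head;
        if (!t) {               /* stopping and the queue is drained */
            pthread_mutex_unlock(&pool->lock);
            return NULL;
        }
        pool->head = t->next;
        if (!pool->head)
            pool->tail = NULL;
        pthread_mutex_unlock(&pool->lock);
        t->fn(t->arg);          /* run the task outside the lock */
        free(t);
    }
}

static int pool_start(struct pool *pool, int nthreads)
{
    assert(nthreads > 0);
    pthread_mutex_init(&pool->lock, NULL);
    pthread_cond_init(&pool->wake, NULL);
    pool->head = pool->tail = NULL;
    pool->stopping = 0;
    pool->nthreads = 0;
    pool->threads = calloc((size_t)nthreads, sizeof(pthread_t));
    if (!pool->threads)
        return -1;
    for (int i = 0; i < nthreads; i++) {
        if (pthread_create(&pool->threads[i], NULL, worker, pool) != 0)
            break;
        pool->nthreads++;
    }
    if (pool->nthreads == 0) {
        free(pool->threads);
        return -1;
    }
    return 0;
}

static int pool_submit(struct pool *pool, void (*fn)(void *), void *arg)
{
    struct task *t = malloc(sizeof(*t));
    if (!t)
        return -1;
    t->fn = fn;
    t->arg = arg;
    t->next = NULL;
    pthread_mutex_lock(&pool->lock);
    if (pool->tail)
        pool->tail->next = t;
    else
        pool->head = t;
    pool->tail = t;
    pthread_cond_signal(&pool->wake);
    pthread_mutex_unlock(&pool->lock);
    return 0;
}

/* Lets workers finish everything already queued, then joins them. */
static void pool_stop(struct pool *pool)
{
    pthread_mutex_lock(&pool->lock);
    pool->stopping = 1;
    pthread_cond_broadcast(&pool->wake);
    pthread_mutex_unlock(&pool->lock);
    for (int i = 0; i < pool->nthreads; i++)
        pthread_join(pool->threads[i], NULL);
    free(pool->threads);
    pthread_cond_destroy(&pool->wake);
    pthread_mutex_destroy(&pool->lock);
}

/* Demo task: counts how many times it has run. */
static atomic_int demo_count;
static void demo_task(void *arg) { (void)arg; atomic_fetch_add(&demo_count, 1); }
```

Note that the single queue lock is itself a contention point, which may be part of why consolidating queues does not automatically help at high thread counts.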

> * There are *too many* mutexes in the code.

Hear, hear.

> We should drastically reduce their use, sometimes by using better
> structures that do not require blocking at all, or even by introducing
> RCU and/or rwlocks. One case I've always had doubts about is dict_t.
> Why does it need locks? One xlator should not modify a dict_t once it
> has been passed to another xlator, and if we assume that a dict can
> only be modified by a single xlator at a time, it's very unlikely that
> it needs to modify it from multiple threads.

I think in general you're right about dicts, but I also think it would
be interesting to disable dict locking and see what breaks.  I'll bet
there's something *somewhere* that tries to access dicts concurrently.
Callbacks for children of a cluster translator using the "fan out"
pattern seem particularly suspect.  What worries me is the classic
problem with race conditions: it's easy to have something that *appears*
to work when things aren't running in parallel enough to hit tiny timing
windows, but it's a lot harder to be *sure* you're safe once they
are.  I think I'd lean toward a more conservative approach of finding the
particularly egregious high-contention cases, examining those particular
code paths carefully, and changing them to use a lock-free dict variant
or alternative.
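
One possible shape for such an alternative in the fan-out case, sketched under assumptions: each child callback writes only to its own reply slot, and an atomic countdown picks the last callback to merge the results, so no shared structure is mutated concurrently. This is an illustration, not existing GlusterFS code; `struct fanout`, `child_cbk`, `fanout_run`, and `NCHILD` are hypothetical, and threads stand in for callbacks arriving in parallel.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define NCHILD 8

struct fanout {
    atomic_int pending;     /* callbacks that have not returned yet */
    int reply[NCHILD];      /* slot i is written only by child i */
    int merged_sum;         /* written only by the last callback */
};

struct child_arg {
    struct fanout *fo;
    int idx;
};

static void *child_cbk(void *p)
{
    struct child_arg *a = p;
    struct fanout *fo = a->fo;

    /* Private slot: no other callback touches this element. */
    fo->reply[a->idx] = a->idx + 1;

    /* The default (sequentially consistent) decrement publishes the
     * slot write; the callback that sees the old value 1 is the last
     * one and can safely read every slot. */
    if (atomic_fetch_sub(&fo->pending, 1) == 1) {
        int sum = 0;
        for (int i = 0; i < NCHILD; i++)
            sum += fo->reply[i];
        fo->merged_sum = sum;
    }
    return NULL;
}

/* Fans out to NCHILD callbacks and returns the merged result. */
static int fanout_run(void)
{
    struct fanout fo = { .merged_sum = 0 };
    struct child_arg args[NCHILD];
    pthread_t tid[NCHILD];

    atomic_init(&fo.pending, NCHILD);
    for (int i = 0; i < NCHILD; i++) {
        args[i].fo = &fo;
        args[i].idx = i;
        int rc = pthread_create(&tid[i], NULL, child_cbk, &args[i]);
        assert(rc == 0);
    }
    for (int i = 0; i < NCHILD; i++)
        pthread_join(tid[i], NULL);
    return fo.merged_sum;
}
```

This only covers paths where the set of writers is known up front; anything that hands the same dict to unrelated threads would still need a lock or a different structure.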