[Gluster-devel] Multi-threaded socket code

Jeff Darcy jdarcy at redhat.com
Wed Jun 1 15:16:19 UTC 2011


This is the infamous "gatling gun" code to allow greater parallelism
within the socket transport. It's based on my SSL-transport patch
(https://github.com/gluster/glusterfs/pull/2). The two are related
because a lot of work occurs within SSL_read/SSL_write and the inherent
single-threading of the socket code via a single polling thread severely
impacts performance in that case. In any case, this is purely a proof of
concept or request for comments at this point, due to several
deficiencies that I'll get to in a moment. To expand a bit on the commit
comment...


Good: this yields a >2x performance on my tests using SSL. On the
24-core/48GB/10GbE machines in the lab, "iozone -r 1m -i 0 -l 24"
improves from 185MB/s to over 400MB/s between a single client and single
server using SSL (850MB/s without is typical) and parallel threads go
from ~2.5 to ~7.5 (even with io-threads in both cases). There might even
be some performance in non-SSL cases, e.g. a single client connecting to
many servers, but that's just icing on the cake.

Bad: the code doesn't clean up on disconnect properly, doesn't work well
with non-blocking I/O (which is rather pointless with this code anyway),
and there seems to be some bad interaction with the glusterd port
mapper. Since CloudFS doesn't use that port mapper for other reasons,
it's not affected and I'm tempted not to care, but I guess I should
debug that some day.

Ugly: the management code is very racy, and those races are tickled by
the new threading milieu that socket_poller introduces. The patch
already fixes one pre-existing race in which glusterfs_mgmt_init sets up
a callback before setting up pointers needed by code within that
callback, but there's at least one other serious problem I'm aware of.
Some of the management code (e.g. sending the commit reply for a "volume
create" request) calls socket_submit_reply and then immediately frees
some of the data being sent, so if the message isn't sent synchronously
then the other side gets an invalid message type. Sending synchronously
is the normal case, and it's unlikely that the socket buffers will fill
up on this low-traffic path so that a deferred send will be necessary,
but it is possible. I haven't gone through the inordinately convoluted
code that assembles these messages to figure out exactly where the error
lies, and frankly I'm not wild about debugging that to deal with a
problem that pre-dates my changes. While it's unlikely that the socket
buffers would fill so that a deferred send would become necessary,
especially on the low-traffic management path, it has always been
possible and this code which frees data before its sure to be sent has
always been erroneous.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-Use-separate-polling-thread-for-each-connection.patch
Type: text/x-patch
Size: 16496 bytes
Desc: not available
URL: <http://supercolony.gluster.org/pipermail/gluster-devel/attachments/20110601/e940a5a8/attachment-0003.bin>


More information about the Gluster-devel mailing list