[Bugs] [Bug 1408101] New: Fix potential socket_poller thread deadlock and resource leak
bugzilla at redhat.com
Thu Dec 22 06:32:00 UTC 2016
https://bugzilla.redhat.com/show_bug.cgi?id=1408101
Bug ID: 1408101
Summary: Fix potential socket_poller thread deadlock and
resource leak
Product: GlusterFS
Version: mainline
Component: rpc
Assignee: bugs at gluster.org
Reporter: kaushal at redhat.com
CC: bugs at gluster.org
The fix for bug #1404181 [1] has a potential deadlock and resource leak
involving the socket_poller thread.
A disconnect caused by a PARENT_DOWN event during a fuse graph switch can
deadlock the socket_poller thread. The deadlock doesn't affect the
fuse client, as no new fops are sent on the old graph.
In addition to the above, the race in gfapi solved by [1] can also occur in
other codepaths, and needs to be solved there as well.
Quoting Raghavendra G's comment from the review,
"""
- The race addressed by this patch (race b/w socket_disconnect cleaning up
resources in priv and socket_poller using the same and resulting in undefined
behaviour - crash/corruption etc) can potentially happen irrespective of the
codepaths socket_disconnect is invoked from (like glusterd, client_portmap_cbk,
handling of PARENT_DOWN, changelog etc). Note the use of the word "potential"
here: I am not saying that this race happens in existing code. However, I
would like this issue to be fixed for these potential cases too.
- If there are fops in progress at the time of graph switch, sending
PARENT_DOWN event on the currently active (soon to be old) graph is deferred
till all the fops are complete (though new graph becomes active and new I/O is
redirected to that graph). So, PARENT_DOWN event can be sent after processing
last response (to fop). This means PARENT_DOWN can be sent in thread executing
socket_poller itself. Since PARENT_DOWN triggers a disconnect and disconnect
waits for socket_poller to complete, we have a deadlock. Specifically the
deadlock is: socket_poller -> notify-msg-received -> fuse processes fop
response -> fuse sends PARENT_DOWN -> rpc-clnt calls rpc_clnt_disable ->
socket_disconnect -> wait for socket_poller to complete before returning from
socket_disconnect. Luckily we have a socket_poller thread for each transport,
and the threads that deadlock are the ones belonging to transports from older
graphs on which no I/O is happening. So, at worst this will be a case of
resource leakage (threads/sockets etc) of the old graph.
"""
[1] https://review.gluster.org/16141
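
The deadlock chain quoted above comes down to a thread waiting for itself
to finish. A minimal sketch of one way to guard against that with POSIX
threads follows. It is not the GlusterFS fix: struct transport,
transport_disconnect(), poller_main() and run_demo() are hypothetical names
made up for illustration, and the real socket transport code is structured
differently. The idea is only that the disconnect path compares its own
thread ID with the poller's and, when they match, defers the join to
another thread instead of blocking.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical outcome codes, for illustration only. */
enum disconnect_result {
    DISCONNECT_NOT_RUN,
    DISCONNECT_JOINED,   /* caller waited for the poller to exit        */
    DISCONNECT_DEFERRED  /* caller was the poller; join left to others  */
};

/* Hypothetical stand-in for a transport; not the real GlusterFS struct. */
struct transport {
    pthread_mutex_t lock;            /* protects 'poller'                */
    pthread_t poller;                /* thread running poller_main()     */
    bool disconnect_from_poller;     /* demo switch: who disconnects     */
    enum disconnect_result result;
};

static enum disconnect_result transport_disconnect(struct transport *t)
{
    pthread_mutex_lock(&t->lock);
    pthread_t poller = t->poller;
    pthread_mutex_unlock(&t->lock);

    /* Running on the poller's own stack: joining here would wait on
     * ourselves, so return and let the owner reap the thread later. */
    if (pthread_equal(pthread_self(), poller))
        return DISCONNECT_DEFERRED;

    pthread_join(poller, NULL);
    return DISCONNECT_JOINED;
}

static void *poller_main(void *arg)
{
    struct transport *t = arg;

    /* Models a notification handler that ends up triggering disconnect. */
    if (t->disconnect_from_poller)
        t->result = transport_disconnect(t);
    return NULL;
}

static enum disconnect_result run_demo(bool from_poller)
{
    struct transport t = { .disconnect_from_poller = from_poller,
                           .result = DISCONNECT_NOT_RUN };
    pthread_mutex_init(&t.lock, NULL);

    /* Hold the lock across pthread_create() so the new thread cannot
     * read t.poller before it has been stored. */
    pthread_mutex_lock(&t.lock);
    if (pthread_create(&t.poller, NULL, poller_main, &t) != 0) {
        pthread_mutex_unlock(&t.lock);
        pthread_mutex_destroy(&t.lock);
        return DISCONNECT_NOT_RUN;
    }
    pthread_mutex_unlock(&t.lock);

    if (from_poller)
        pthread_join(t.poller, NULL);         /* owner reaps the poller  */
    else
        t.result = transport_disconnect(&t);  /* disconnect from outside */

    pthread_mutex_destroy(&t.lock);
    return t.result;
}
```

In this sketch, run_demo(true) models the PARENT_DOWN case, where the
disconnect is issued from the poller thread itself, and run_demo(false)
models a disconnect issued from any other thread. Deferring the join still
leaves someone responsible for reaping the thread, which is the resource
leak half of the problem described above.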