[Bugs] [Bug 1545048] New: [brick-mux] process termination race while killing glusterfsd on last brick detach
bugzilla at redhat.com
Wed Feb 14 06:58:30 UTC 2018
https://bugzilla.redhat.com/show_bug.cgi?id=1545048
Bug ID: 1545048
Summary: [brick-mux] process termination race while killing
glusterfsd on last brick detach
Product: GlusterFS
Version: mainline
Component: core
Assignee: bugs at gluster.org
Reporter: mchangir at redhat.com
CC: amukherj at redhat.com, bugs at gluster.org,
moagrawa at redhat.com, rgowdapp at redhat.com
Description of problem:
In brick-mux mode, during volume stop, glusterd sends a brick-detach message
to the brick process for the last brick. The brick process responds to
glusterd with an acknowledgment and then kills itself with SIGTERM. This
sounds fine. However, the response from the brick does not reach glusterd
first; instead, a socket disconnect notification reaches glusterd before the
response. glusterd then presumes that something has gone wrong during volume
stop, bails the call, and fails the volume stop operation, causing the test
to fail.
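The ordering dependence described above can be modeled with a minimal sketch
in C. It is only an illustration of the reported behavior, not code from
GlusterFS: the names peer_event, stop_outcome and resolve_detach are
hypothetical, and the sketch assumes the glusterd side resolves a pending
brick-detach request on the first relevant event it observes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names; none of these come from the GlusterFS sources. */
enum peer_event { EV_DETACH_ACK, EV_DISCONNECT };
enum stop_outcome { STOP_OK, STOP_FAILED, STOP_PENDING };

/*
 * Walk the events in the order the glusterd side observes them and
 * report how a pending brick-detach request would be resolved under
 * the assumption that the first relevant event decides the outcome.
 */
static enum stop_outcome
resolve_detach(const enum peer_event *events, size_t count)
{
    assert(events != NULL || count == 0);

    for (size_t i = 0; i < count; i++) {
        if (events[i] == EV_DETACH_ACK)
            return STOP_OK;     /* response seen first: volume stop proceeds */
        if (events[i] == EV_DISCONNECT)
            return STOP_FAILED; /* disconnect seen first: the call is bailed */
    }
    return STOP_PENDING;        /* neither event observed yet */
}
```

With the event order { EV_DETACH_ACK, EV_DISCONNECT } the sketch returns
STOP_OK, while the order { EV_DISCONNECT, EV_DETACH_ACK } returns STOP_FAILED,
which corresponds to the volume stop failure reported here.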
This race is reproducible by running the test
tests/basic/distribute/rebal-all-nodes-migrate.t in brick-mux mode for my
patch [1].
[1] https://review.gluster.org/19308
-----
The source code for glusterfs_handle_terminate() also has the following
comment:
/*
 * This is terribly unsafe without quiescing or shutting
 * things down properly but it gets us to the point
 * where we can test other stuff.
 *
 * TBD: finish implementing this "detach" code properly
 */
Version-Release number of selected component (if applicable):
How reproducible:
100%
Steps to Reproduce:
1.
2.
3.
Actual results:
Expected results:
Additional info: