[Bugs] [Bug 1545048] New: [brick-mux] process termination race while killing glusterfsd on last brick detach

Wed Feb 14 06:58:30 UTC 2018

https://bugzilla.redhat.com/show_bug.cgi?id=1545048

            Bug ID: 1545048
           Summary: [brick-mux] process termination race while killing
                    glusterfsd on last brick detach
           Product: GlusterFS
           Version: mainline
         Component: core
          Assignee: bugs at gluster.org
          Reporter: mchangir at redhat.com
                CC: amukherj at redhat.com, bugs at gluster.org,
                    moagrawa at redhat.com, rgowdapp at redhat.com

Description of problem:
In brick-mux mode, during volume stop, glusterd sends a brick-detach message
to the brick process for the last brick. The brick process replies to glusterd
with an acknowledgment and then kills itself with SIGTERM. This sequence is
sound in principle. However, for reasons not yet understood, the reply does
not reach glusterd; a socket disconnect notification arrives first. glusterd
then presumes that something went wrong during volume stop, bails out of the
call, and fails the volume stop operation, which causes the test to fail.
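
One way such an ordering could arise, offered only as an assumption since the
cause is not identified above, is that the acknowledgment is still queued in
user space when SIGTERM ends the process. The sketch below models that with a
socketpair and a forked child. detach_last_brick() and its flag are
hypothetical names for illustration, not GlusterFS code.

```c
#include <assert.h>
#include <signal.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define REPLY     "ACK"
#define REPLY_LEN 3

/*
 * Hypothetical model, not GlusterFS code. The child plays the brick process,
 * the parent plays glusterd. Returns the number of reply bytes the parent
 * read before it saw the disconnect (EOF), or -1 on error.
 */
static ssize_t detach_last_brick(int flush_before_term)
{
    int sv[2];
    char buf[8];
    ssize_t n, total = 0;
    pid_t pid;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    pid = fork();
    if (pid < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    if (pid == 0) {
        /* Brick side. ASSUMPTION: without the flush, the reply is still
         * sitting in a user-space queue when the process terminates. */
        close(sv[0]);
        if (flush_before_term &&
            write(sv[1], REPLY, REPLY_LEN) != REPLY_LEN)
            _exit(1);
        kill(getpid(), SIGTERM); /* default action terminates the child */
        _exit(0);                /* only reached if SIGTERM is ignored or blocked */
    }

    /* glusterd side: read until the socket reports EOF (the disconnect). */
    close(sv[1]);
    while ((n = read(sv[0], buf, sizeof(buf))) > 0)
        total += n;
    close(sv[0]);
    waitpid(pid, NULL, 0);

    /* The peer can never deliver more than it wrote. */
    assert(total <= REPLY_LEN);
    return n < 0 ? -1 : total;
}
```

With flush_before_term set, the parent reads all 3 reply bytes before EOF.
With it cleared, the parent sees only the disconnect, which matches the
ordering glusterd observes in this report.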

This race can be reproduced by running the test
tests/basic/distribute/rebal-all-nodes-migrate.t in brick-mux mode against my
patch [1].

[1] https://review.gluster.org/19308

-----

The source code for glusterfs_handle_terminate() also has the following
comment:
        /*
         * This is terribly unsafe without quiescing or shutting
         * things down properly but it gets us to the point
         * where we can test other stuff.
         *
         * TBD: finish implementing this "detach" code properly
         */
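
What "quiescing" might involve can be sketched as follows. This is a minimal
single-threaded illustration with hypothetical names (struct quiesce_state,
op_begin(), op_end(), terminate_requested()). It is not how
glusterfs_handle_terminate() is implemented, and it leaves out the locking
that a multi-threaded brick process would need.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-process state; none of these names exist in GlusterFS. */
struct quiesce_state {
    bool draining;     /* set once a terminate request has arrived */
    unsigned inflight; /* operations started but not yet finished  */
};

/* Admit a new operation unless the process is already draining. */
static bool op_begin(struct quiesce_state *q)
{
    if (q->draining)
        return false;
    q->inflight++;
    return true;
}

/* Finish an operation; returns true when it is now safe to terminate. */
static bool op_end(struct quiesce_state *q)
{
    assert(q->inflight > 0); /* ending an operation that never began is a bug */
    q->inflight--;
    return q->draining && q->inflight == 0;
}

/* Record the terminate request; returns true if nothing is in flight. */
static bool terminate_requested(struct quiesce_state *q)
{
    q->draining = true;
    return q->inflight == 0;
}
```

The idea is that the process only sends itself SIGTERM once
terminate_requested() or a later op_end() returns true, so that pending work,
including the acknowledgment to glusterd, has completed first.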

How reproducible:
100%

