[Bugs] [Bug 1545048] New: [brick-mux] process termination race while killing glusterfsd on last brick detach

bugzilla at redhat.com bugzilla at redhat.com
Wed Feb 14 06:58:30 UTC 2018


            Bug ID: 1545048
           Summary: [brick-mux] process termination race while killing
                    glusterfsd on last brick detach
           Product: GlusterFS
           Version: mainline
         Component: core
          Assignee: bugs at gluster.org
          Reporter: mchangir at redhat.com
                CC: amukherj at redhat.com, bugs at gluster.org,
                    moagrawa at redhat.com, rgowdapp at redhat.com

Description of problem:
In brick-mux mode, during volume stop, glusterd sends a brick-detach message
to the brick process for the last brick. The brick process responds to
glusterd with an acknowledgment and then kills itself with SIGTERM. In
principle this is fine. However, for reasons not yet understood, the response
from the brick does not reach glusterd; a socket disconnect notification
arrives before it. glusterd then presumes that something went wrong during
volume stop, bails the call, and fails the volume stop operation, which
causes the test to fail.
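
The root cause is not established in this report. As a minimal sketch of one
way such an ordering could arise, the toy model below assumes the
acknowledgment is only queued in a user-space buffer when the process
terminates, so it is never written to the socket. Everything in it
(ToyBrick, detach_last_brick, the DETACH-ACK payload) is hypothetical and
is not GlusterFS code or its RPC wire format.

```python
import socket

ACK = b"DETACH-ACK"  # hypothetical payload, not the GlusterFS RPC format


class ToyBrick:
    """Toy stand-in for a brick process with a user-space send queue."""

    def __init__(self, sock):
        self.sock = sock
        self.pending = []  # replies queued but not yet written to the socket

    def submit_reply(self, payload):
        # Queue the reply; writing it to the socket is a separate step.
        self.pending.append(payload)

    def flush(self):
        # Write every queued reply to the socket.
        for payload in self.pending:
            self.sock.sendall(payload)
        self.pending.clear()

    def terminate(self):
        # Stand-in for the process dying from SIGTERM: anything still
        # queued in user space is lost and the peer sees a disconnect.
        self.pending.clear()
        self.sock.close()


def detach_last_brick(flush_before_exit):
    """Return the events a glusterd-like peer observes, in order."""
    mgmt, brick_end = socket.socketpair()
    brick = ToyBrick(brick_end)
    brick.submit_reply(ACK)
    if flush_before_exit:
        brick.flush()
    brick.terminate()

    received = b""
    while True:
        data = mgmt.recv(4096)
        if not data:  # empty read means the peer closed the connection
            break
        received += data
    mgmt.close()

    events = [received.decode()] if received else []
    events.append("disconnect")
    return events


if __name__ == "__main__":
    print(detach_last_brick(flush_before_exit=False))
    print(detach_last_brick(flush_before_exit=True))
```

In this model the peer observes only the disconnect when the reply is not
flushed, and observes the acknowledgment followed by the disconnect when it
is. Whether the real brick process behaves this way would need to be
confirmed against the actual code path.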

This race can be reproduced by running the test
tests/basic/distribute/rebal-all-nodes-migrate.t in brick-mux mode against my
patch [1]:

[1] https://review.gluster.org/19308


The source code for glusterfs_handle_terminate() also carries the following
comment:
         * This is terribly unsafe without quiescing or shutting
         * things down properly but it gets us to the point
         * where we can test other stuff.
         * TBD: finish implementing this "detach" code properly

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

Actual results:

Expected results:

Additional info:
