[Gluster-devel] gluster volume stop and the regressions

Milind Changire mchangir at redhat.com
Wed Feb 14 07:00:28 UTC 2018


In brick-mux mode, volume stop reveals a race when run with my patch [1].
Although this behavior is 100% reproducible with my patch, that by no
means implies that my patch is buggy.

In brick-mux mode, during volume stop, when glusterd sends a brick-detach
message to the brick process for the last brick, the brick process responds
to glusterd with an acknowledgment and then kills itself with a SIGTERM
signal. All this sounds fine. However, the response somehow does not reach
glusterd first; a socket disconnect notification arrives before it. This
causes glusterd to presume that something has gone wrong during volume
stop, so glusterd fails the volume stop operation, which causes the test
to fail.
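
To make the ordering problem concrete, here is a toy model of it in Python.
It is only a sketch of the behavior described above, not glusterd code: the
names Event and volume_stop_result are made up for illustration, and the
rule "a disconnect seen before the acknowledgment counts as a failure" is
my simplified reading of what glusterd does.

```python
from enum import Enum


class Event(Enum):
    # Hypothetical event names, used only in this toy model.
    DETACH_ACK = "detach-ack"   # brick acknowledges the brick-detach request
    DISCONNECT = "disconnect"   # socket disconnect after the brick exits


def volume_stop_result(events):
    """Return "success" or "failure" for a sequence of observed events.

    Simplifying assumption: a disconnect that arrives before the
    acknowledgment is treated as an error, while a disconnect that
    arrives after the acknowledgment is expected and harmless.
    """
    acked = False
    for ev in events:
        if ev is Event.DETACH_ACK:
            acked = True
        elif ev is Event.DISCONNECT:
            return "success" if acked else "failure"
    return "success" if acked else "failure"


# Expected ordering: acknowledgment first, then the disconnect.
print(volume_stop_result([Event.DETACH_ACK, Event.DISCONNECT]))  # success
# Racy ordering: the disconnect overtakes the acknowledgment.
print(volume_stop_result([Event.DISCONNECT, Event.DETACH_ACK]))  # failure
```

The brick does the same thing in both runs; only the order in which the two
notifications are observed differs.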

This race is reproducible by running the test
tests/basic/distribute/rebal-all-nodes-migrate.t in brick-mux mode for my
patch [1].

[1] https://review.gluster.org/19308
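
For context, the retry-until-timeout idea from the earlier proposal quoted
below can be sketched in Python as follows. This is not the EXPECT_WITHIN
implementation from the test framework; retry_within and stop_volume are
hypothetical helpers, and the one-second polling interval is an assumption.
Whether the tests should retry at all is what the quoted discussion is
about.

```python
import subprocess
import time


def retry_within(timeout_s, action, interval_s=1.0,
                 clock=time.monotonic, sleep=time.sleep):
    """Call action() until it returns True or timeout_s seconds elapse.

    Returns True on the first successful attempt, False if the deadline
    passes without success. The action is always tried at least once.
    """
    deadline = clock() + timeout_s
    while True:
        if action():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


def stop_volume(volname):
    """Hypothetical helper: one attempt to stop a volume via the gluster CLI.

    Script mode is used so that the CLI does not prompt for confirmation.
    Returns True if the command exits with status 0.
    """
    cmd = ["gluster", "--mode=script", "volume", "stop", volname]
    return subprocess.run(cmd).returncode == 0


# Usage sketch (needs a running glusterd; "patchy" is just an example name):
#   ok = retry_within(30, lambda: stop_volume("patchy"))
```

A stop that still fails after the 30-second window would then be reported
as a test failure.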


On Thu, Feb 1, 2018 at 9:54 AM, Atin Mukherjee <amukherj at redhat.com> wrote:

> I don't think that's the right way. Ideally the test shouldn't be
> attempting to stop a volume if a rebalance session is in progress. If we
> still see such a situation even when we check the rebalance status and
> wait up to 30 secs for it to finish, and volume stop still fails with a
> rebalance session in progress error, that means either (a) the rebalance
> session took longer than the timeout passed to EXPECT_WITHIN or (b)
> there's a bug in the code.
>
> On Thu, Feb 1, 2018 at 9:46 AM, Milind Changire <mchangir at redhat.com>
> wrote:
>
>> If a *volume stop* fails at a user's production site with a reason like
>> *rebalance session is active* then the admin will wait for the session to
>> complete and then reissue a *volume stop*.
>>
>> So, in essence, the failed volume stop is not fatal. For the regression
>> tests, I would like to propose changing a single volume stop to
>> *EXPECT_WITHIN 30*, so that if a volume cannot be stopped even after 30
>> seconds, it could be termed fatal in the regression scenario.
>>
>> Any comments about the proposal?
>>
>> --
>> Milind
>>
>>
>> _______________________________________________
>> Gluster-devel mailing list
>> Gluster-devel at gluster.org
>> http://lists.gluster.org/mailman/listinfo/gluster-devel
>>
>
>


-- 
Milind