[Bugs] [Bug 1512432] New: Test bug-1483058-replace-brick-quorum-validation.t fails inconsistently

bugzilla at redhat.com bugzilla at redhat.com
Mon Nov 13 09:04:20 UTC 2017


https://bugzilla.redhat.com/show_bug.cgi?id=1512432

            Bug ID: 1512432
           Summary: Test bug-1483058-replace-brick-quorum-validation.t
                    fails inconsistently
           Product: GlusterFS
           Version: 3.12
         Component: tests
          Keywords: Triaged
          Assignee: bugs at gluster.org
          Reporter: amukherj at redhat.com
                CC: amukherj at redhat.com, bugs at gluster.org,
                    rgowdapp at redhat.com
        Depends On: 1511310



+++ This bug was initially created as a clone of Bug #1511310 +++

Description of problem:
I ran into this failure [1] during regression runs for patch [2]. On running
the test on my local machine, it fails inconsistently. Failed test was:


TEST 15 (line 49): gluster --mode=script --wignore
--glusterd-sock=/d/backends/1/glusterd/gd.sock
--log-file=/var/log/glusterfs/bug-1483058-replace-brick-quorum-validation.t_cli1.log
volume replace-brick patchy 127.1.1.2:/d/backends/2/patchy1
127.1.1.1:/d/backends/1/patchy1_new commit force
volume replace-brick: failed: Quorum not met. Volume operation not allowed.
./tests/bugs/glusterd/bug-1483058-replace-brick-quorum-validation.t .. 15/15
RESULT 15: 1
./tests/bugs/glusterd/bug-1483058-replace-brick-quorum-validation.t .. Failed
1/15 subtests 

Test Summary Report
-------------------
./tests/bugs/glusterd/bug-1483058-replace-brick-quorum-validation.t (Wstat: 0
Tests: 15 Failed: 1)
  Failed test:  15
Files=1, Tests=15, 39 wallclock secs ( 0.03 usr  0.00 sys +  1.74 cusr  0.98
csys =  2.75 CPU)

On looking at one of glusterd logs, I found:
[2017-11-09 06:06:09.387014]:++++++++++
G_LOG:./tests/bugs/glusterd/bug-1483058-replace-brick-quorum-validation.t:
TEST: 49 gluster --mode=script --wignore
--glusterd-sock=/d/backends/1/glusterd/gd.sock
--log-file=/var/log/glusterfs/bug-1483058-replace-brick-quorum-validation.t_cli1.log
volume replace-brick patchy 127.1.1.2:/d/backends/2/patchy1
127.1.1.1:/d/backends/1/patchy1_new commit force ++++++++++
The message "I [MSGID: 106487]
[glusterd-handler.c:1485:__glusterd_handle_cli_list_friends] 0-glusterd:
Received cli list req" repeated 5 times between [2017-11-09 06:06:03.713593]
and [2017-11-09 06:06:09.371221]
[2017-11-09 06:06:09.511510] I [MSGID: 106505]
[glusterd-replace-brick.c:67:__glusterd_handle_replace_brick] 0-management:
Received replace brick req
[2017-11-09 06:06:09.511673] I [MSGID: 106503]
[glusterd-replace-brick.c:148:__glusterd_handle_replace_brick] 0-management:
Received replace-brick commit force request.
[2017-11-09 06:06:10.205940] E [MSGID: 106001]
[glusterd-replace-brick.c:228:glusterd_op_stage_replace_brick] 0-management:
Server quorum not met. Rejecting operation.
[2017-11-09 06:06:10.205972] W [MSGID: 106122]
[glusterd-mgmt.c:168:gd_mgmt_v3_pre_validate_fn] 0-management: Replace-brick
prevalidation failed.
[2017-11-09 06:06:10.205987] E [MSGID: 106122]
[glusterd-mgmt.c:1036:glusterd_mgmt_v3_pre_validate] 0-management: Pre
Validation failed for operation Replace brick on local node
[2017-11-09 06:06:10.206000] E [MSGID: 106122]
[glusterd-replace-brick.c:660:glusterd_mgmt_v3_initiate_replace_brick_cmd_phases]
0-management: Pre Validation Failed

Note that I did not find any log entries for the temporary mount that is done
during replace-brick. glustershd.log also did not show that the replace-brick
had succeeded; its graph still contained the old brick.

Looking at this log, I fail to understand how [2] could have caused this
failure. I am running the tests without [2] to rule it out as the root
cause. Will report back once the tests are complete.

[1] https://build.gluster.org/job/centos6-regression/7327/console
[2] https://review.gluster.org/18681

Version-Release number of selected component (if applicable):
mainline

How reproducible:
inconsistently


--- Additional comment from Raghavendra G on 2017-11-09 01:39 EST ---



--- Additional comment from Raghavendra G on 2017-11-09 01:43:27 EST ---

The reason I think [2] is not the cause is that glusterd_validate_quorum()
does not seem to touch the FOP path at all. If no fops (specifically
stat/fstat) are issued, [2] has no effect.

--- Additional comment from Atin Mukherjee on 2017-11-09 03:00:02 EST ---

This is indeed a bad test. The issue is that the check for whether a peer is
up and the check for whether quorum is regained use different attributes.

peer_count checks peerinfo->status, which is set to connected the moment
glusterd receives an RPC_CLNT_CONNECT event from its peer, whereas the quorum
check depends on whether peerinfo->quorum_contrib is set to QUORUM_UP, which
is done in glusterd_friend_sm () and may happen after RPC_CLNT_CONNECT. If the
replace-brick commit force is issued between these two events, it fails with a
quorum rejection. I'll see how to handle this scenario in the test and will
send a patch soon.
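
The window described above can be illustrated with a small model. This is
only a sketch in Python, not glusterd code: the Peer class, the handler
names, and the strict-majority quorum rule are simplifying assumptions made
for illustration, and it does not reproduce the change posted for review.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class QuorumContrib(Enum):
    NONE = 0
    UP = 1


@dataclass
class Peer:
    # Hypothetical stand-in for glusterd's peerinfo; field names are illustrative.
    connected: bool = False
    quorum_contrib: QuorumContrib = QuorumContrib.NONE


def on_rpc_connect(peer: Peer) -> None:
    """Models the RPC_CLNT_CONNECT event: only the connection state changes."""
    peer.connected = True


def on_friend_sm(peer: Peer) -> None:
    """Models the later friend state machine run that marks the peer as
    contributing to quorum."""
    if peer.connected:
        peer.quorum_contrib = QuorumContrib.UP


def peer_count(peers: List[Peer]) -> int:
    """What the test waits on: the number of connected peers."""
    return sum(1 for p in peers if p.connected)


def quorum_met(peers: List[Peer]) -> bool:
    """What replace-brick validates. Assumed rule: the local node plus the
    peers marked UP must form a strict majority of the cluster."""
    total = len(peers) + 1
    active = 1 + sum(1 for p in peers if p.quorum_contrib is QuorumContrib.UP)
    return active * 2 > total


if __name__ == "__main__":
    peers = [Peer(), Peer()]          # three-node cluster seen from one node
    on_rpc_connect(peers[0])          # peer reconnects
    print(peer_count(peers))          # 1: the test's wait condition is satisfied
    print(quorum_met(peers))          # False: replace-brick would be rejected here
    on_friend_sm(peers[0])            # state machine catches up
    print(quorum_met(peers))          # True: the same command would now pass
```

In this model, a test that waits only on peer_count can issue the command
inside the window where quorum_met is still false, which matches the
inconsistent failure reported above.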

--- Additional comment from Worker Ant on 2017-11-09 12:13:39 EST ---

REVIEW: https://review.gluster.org/18710 (tests: fix
bug-1483058-replace-brick-quorum-validation.t spurious failure) posted (#1) for
review on master by Atin Mukherjee

--- Additional comment from Worker Ant on 2017-11-12 06:28:49 EST ---

COMMIT: https://review.gluster.org/18710 committed in master by  

------------- tests: fix bug-1483058-replace-brick-quorum-validation.t spurious
failure

Change-Id: I04c35305bfb663eabbf715eee78695adfd4a2d20
BUG: 1511310
Signed-off-by: Atin Mukherjee <amukherj at redhat.com>


Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1511310
[Bug 1511310] Test bug-1483058-replace-brick-quorum-validation.t fails
inconsistently
-- 
You are receiving this mail because:
You are on the CC list for the bug.
You are the assignee for the bug.
