[Gluster-devel] Regression failure: ./tests/bugs/glusterd/bug-1345727-bricks-stop-on-no-quorum-validation.t
Atin Mukherjee
amukherj at redhat.com
Mon Nov 6 17:03:37 UTC 2017
On Mon, 6 Nov 2017 at 18:26, Nithya Balachandran <nbalacha at redhat.com>
wrote:
> On 6 November 2017 at 18:02, Atin Mukherjee <amukherj at redhat.com> wrote:
>
>> Snippet from where the test failed (the one which failed is marked in
>> bold):
>>
>> # Start the volume
>> TEST $CLI_1 volume start $V0
>>
>> EXPECT_WITHIN $PROCESS_UP_TIMEOUT "1" brick_up_status_1 $V0 $H1 $B1/${V0}1
>> EXPECT_WITHIN $PROCESS_UP_TIMEOUT "1" brick_up_status_1 $V0 $H2 $B2/${V0}2
>> EXPECT_WITHIN $PROCESS_UP_TIMEOUT "1" brick_up_status_1 $V0 $H3 $B3/${V0}3
>>
>> # Bring down 2nd and 3rd glusterd
>> TEST kill_glusterd 2
>> TEST kill_glusterd 3
>>
>> # Server quorum is not met. Brick on 1st node must be down
>> *EXPECT_WITHIN $PROCESS_DOWN_TIMEOUT "0" brick_up_status_1 $V0 $H1 $B1/${V0}1*
>>
>> 08:04:05 not ok 13 Got "" instead of "0", LINENUM:33
>> 08:04:05 FAILED COMMAND: 0 brick_up_status_1 patchy 127.1.1.1 /d/backends/1/patchy1
>>
>>
>> Nothing abnormal in the logs. The test failed because we expect the number
>> of bricks up to be 0 due to the quorum loss, but the command "$CLI_1 volume
>> status $vol $host:$brick --xml | sed -ne
>> 's/.*<status>\([01]\)<\/status>/\1/p'" returned "". The only way this
>> command can return something other than an integer is some parsing error?
>> For now it's a mystery to me; still looking into it.
>>
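For reference, the check that came back blank boils down to the pipeline quoted
above. A rough reconstruction of brick_up_status_1 (the real helper lives in the
test framework, so treat this as a sketch rather than the exact code):

function brick_up_status_1 {
        local vol=$1
        local host=$2
        local brick=$3
        # sed prints a line only when it matches <status>0</status> or
        # <status>1</status> in the CLI's XML output; if no such element is
        # found, nothing is printed and the caller sees "" instead of 0/1.
        $CLI_1 volume status $vol $host:$brick --xml | \
                sed -ne 's/.*<status>\([01]\)<\/status>/\1/p'
}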
>
> The funny thing is that it takes a very long time to send the SIGTERM to
> the brick (I'm assuming it is the local brick). It also looks like the test
> does not check that glusterd is down before it checks the brick status.
>
Yes, we should wait for peer_count before checking the brick status. But I'm
not 100% sure that's the only issue, because in that case I should have seen
an integer greater than 0 instead of a blank value.
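Something along these lines is what I have in mind; a minimal sketch, assuming
a peer_count helper and $PROBE_TIMEOUT from the test framework (the exact names
in tests/cluster.rc may differ):

# Bring down 2nd and 3rd glusterd
TEST kill_glusterd 2
TEST kill_glusterd 3

# Wait until node 1 has actually seen both peers disconnect before
# asserting anything about the brick state.
# (peer_count and $PROBE_TIMEOUT are assumed helper names, not verified.)
EXPECT_WITHIN $PROBE_TIMEOUT "0" peer_count

# Server quorum is not met. Brick on 1st node must be down
EXPECT_WITHIN $PROCESS_DOWN_TIMEOUT "0" brick_up_status_1 $V0 $H1 $B1/${V0}1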
> [2017-11-06 08:03:21.403670]:++++++++++
> G_LOG:./tests/bugs/glusterd/bug-1345727-bricks-stop-on-no-quorum-validation.t:
> TEST: 30 kill_glusterd 3 ++++++++++
> *[2017-11-06 08:03:21.415249]*:++++++++++
> G_LOG:./tests/bugs/glusterd/bug-1345727-bricks-stop-on-no-quorum-validation.t:
> TEST: 33 0 brick_up_status_1 patchy 127.1.1.1 /d/backends/1/patchy1
> ++++++++++
>
> ...
>
> *[2017-11-06 08:03:44.972007]* I [MSGID: 106542]
> [glusterd-utils.c:8063:glusterd_brick_signal] 0-glusterd: sending signal 15
> to brick with pid 30706
>
> This is nearly 25 seconds later and PROCESS_DOWN_TIMEOUT is set to 5.
>
>
> Regards,
> Nithya
>
>
>> On Mon, Nov 6, 2017 at 3:06 PM, Nithya Balachandran <nbalacha at redhat.com>
>> wrote:
>>
>>> Hi,
>>>
>>> Can someone take a look at
>>> https://build.gluster.org/job/centos6-regression/7231/ ?
>>>
>>>
>>> From the logs:
>>>
>>> [2017-11-06 08:03:21.200177]:++++++++++
>>> G_LOG:./tests/bugs/glusterd/bug-1345727-bricks-stop-on-no-quorum-validation.t:
>>> TEST: 26 1 brick_up_status_1 patchy 127.1.1.3 /d/backends/3/patchy3
>>> ++++++++++
>>> [2017-11-06 08:03:21.392027]:++++++++++
>>> G_LOG:./tests/bugs/glusterd/bug-1345727-bricks-stop-on-no-quorum-validation.t:
>>> TEST: 29 kill_glusterd 2 ++++++++++
>>> [2017-11-06 08:03:21.400647] W [socket.c:593:__socket_rwv] 0-management:
>>> readv on 127.1.1.2:24007 failed (No data available)
>>> The message "I [MSGID: 106499]
>>> [glusterd-handler.c:4303:__glusterd_handle_status_volume] 0-management:
>>> Received status volume req for volume patchy" repeated 2 times between
>>> [2017-11-06 08:03:20.983906] and [2017-11-06 08:03:21.373432]
>>> [2017-11-06 08:03:21.400698] I [MSGID: 106004]
>>> [glusterd-handler.c:6284:__glusterd_peer_rpc_notify] 0-management: Peer
>>> <127.1.1.2> (<4a9ec683-6d08-47f3-960f-1ed53be2e230>), in state <Peer in
>>> Cluster>, has disconnected from glusterd.
>>> [2017-11-06 08:03:21.400811] W
>>> [glusterd-locks.c:675:glusterd_mgmt_v3_unlock]
>>> (-->/build/install/lib/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0x1e9b9)
>>> [0x7fafe000d9b9]
>>> -->/build/install/lib/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0x3231e)
>>> [0x7fafe002131e]
>>> -->/build/install/lib/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0xfdf37)
>>> [0x7fafe00ecf37] ) 0-management: Lock for vol patchy not held
>>> [2017-11-06 08:03:21.400827] W [MSGID: 106118]
>>> [glusterd-handler.c:6309:__glusterd_peer_rpc_notify] 0-management: Lock not
>>> released for patchy
>>> [2017-11-06 08:03:21.400851] C [MSGID: 106003]
>>> [glusterd-server-quorum.c:351:glusterd_do_volume_quorum_action]
>>> 0-management: Server quorum regained for volume patchy. Starting local
>>> bricks.
>>> [2017-11-06 08:03:21.403670]:++++++++++
>>> G_LOG:./tests/bugs/glusterd/bug-1345727-bricks-stop-on-no-quorum-validation.t:
>>> TEST: 30 kill_glusterd 3 ++++++++++
>>> [2017-11-06 08:03:21.415249]:++++++++++
>>> G_LOG:./tests/bugs/glusterd/bug-1345727-bricks-stop-on-no-quorum-validation.t:
>>> TEST: 33 0 brick_up_status_1 patchy 127.1.1.1 /d/backends/1/patchy1
>>> ++++++++++
>>> [2017-11-06 08:03:31.158076] E [socket.c:2369:socket_connect_finish]
>>> 0-management: connection to 127.1.1.2:24007 failed (Connection
>>> refused); disconnecting socket
>>> [2017-11-06 08:03:31.159513] I [MSGID: 106499]
>>> [glusterd-handler.c:4303:__glusterd_handle_status_volume] 0-management:
>>> Received status volume req for volume patchy
>>> [2017-11-06 08:03:33.151685] W [socket.c:593:__socket_rwv] 0-management:
>>> readv on 127.1.1.3:24007 failed (Connection reset by peer)
>>> [2017-11-06 08:03:33.151735] I [MSGID: 106004]
>>> [glusterd-handler.c:6284:__glusterd_peer_rpc_notify] 0-management: Peer
>>> <127.1.1.3> (<a9cc5688-219c-48e1-9e50-9ccb57b03631>), in state <Peer in
>>> Cluster>, has disconnected from glusterd.
>>> [2017-11-06 08:03:33.151828] W
>>> [glusterd-locks.c:686:glusterd_mgmt_v3_unlock]
>>> (-->/build/install/lib/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0x1e9b9)
>>> [0x7fafe000d9b9]
>>> -->/build/install/lib/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0x3231e)
>>> [0x7fafe002131e]
>>> -->/build/install/lib/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0xfdfe4)
>>> [0x7fafe00ecfe4] ) 0-management: Lock owner mismatch. Lock for vol patchy
>>> held by faa07524-55ba-46af-8359-0c6c87df5e86
>>> [2017-11-06 08:03:33.151850] W [MSGID: 106118]
>>> [glusterd-handler.c:6309:__glusterd_peer_rpc_notify] 0-management: Lock not
>>> released for patchy
>>> [2017-11-06 08:03:33.151873] C [MSGID: 106002]
>>> [glusterd-server-quorum.c:357:glusterd_do_volume_quorum_action]
>>> 0-management: Server quorum lost for volume patchy. Stopping local bricks.
>>> [2017-11-06 08:03:44.972007] I [MSGID: 106542]
>>> [glusterd-utils.c:8063:glusterd_brick_signal] 0-glusterd: sending signal 15
>>> to brick with pid 30706
>>>
>>> Thanks,
>>> Nithya
>>>
>>
>> --
- Atin (atinm)