[Gluster-devel] Regression failure: ./tests/bugs/glusterd/bug-1345727-bricks-stop-on-no-quorum-validation.t

Mon Nov 6 12:56:21 UTC 2017

On 6 November 2017 at 18:02, Atin Mukherjee <amukherj at redhat.com> wrote:

> Snippet from where the test failed (the one which failed is marked in
> bold):
>
> # Start the volume
> TEST $CLI_1 volume start $V0
>
> EXPECT_WITHIN $PROCESS_UP_TIMEOUT "1" brick_up_status_1 $V0 $H1 $B1/${V0}1
> EXPECT_WITHIN $PROCESS_UP_TIMEOUT "1" brick_up_status_1 $V0 $H2 $B2/${V0}2
> EXPECT_WITHIN $PROCESS_UP_TIMEOUT "1" brick_up_status_1 $V0 $H3 $B3/${V0}3
>
> # Bring down 2nd and 3rd glusterd
> TEST kill_glusterd 2
> TEST kill_glusterd 3
>
> # Server quorum is not met. Brick on 1st node must be down
> *EXPECT_WITHIN $PROCESS_DOWN_TIMEOUT "0" brick_up_status_1 $V0 $H1 $B1/${V0}1*
>
> 08:04:05 not ok 13 Got "" instead of "0", LINENUM:33
> 08:04:05 FAILED COMMAND: 0 brick_up_status_1 patchy 127.1.1.1 /d/backends/1/patchy1
>
> Nothing abnormal in the logs. The test failed because we expect the brick's
> up status to be 0 due to quorum loss, but the command
> "$CLI_1 volume status $vol $host:$brick --xml | sed -ne 's/.*<status>\([01]\)<\/status>/\1/p'"
> returned "". Is some parsing error the only way this command could return
> something other than an integer? As of now it's a mystery to me; still
> looking into it.
>
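An empty result is exactly what the sed filter produces when no input line matches: with -n, sed prints only the lines on which the substitution succeeded, so CLI output without a <status>0</status> or <status>1</status> element yields "" rather than an error. The sketch below is a hypothetical illustration, not the test framework's code: expect_within is a simplified stand-in for EXPECT_WITHIN, and the XML fragments are made up rather than taken from the regression run.

```shell
#!/bin/bash
# Hypothetical, simplified stand-in for the framework's EXPECT_WITHIN:
# re-run a command until it prints the expected value or the timeout
# (in seconds) expires, then report the last value seen, which may be empty.
expect_within() {
    local timeout=$1 expected=$2
    shift 2
    local deadline=$((SECONDS + timeout)) got
    while :; do
        got=$("$@")
        [ "$got" = "$expected" ] && return 0
        [ "$SECONDS" -ge "$deadline" ] && break
        sleep 1
    done
    echo "Got \"$got\" instead of \"$expected\""
    return 1
}

# The sed filter quoted above: -n suppresses default output and the
# trailing p prints only lines where the substitution matched.
extract_status() {
    sed -ne 's/.*<status>\([01]\)<\/status>/\1/p'
}

# Made-up XML fragments standing in for "volume status ... --xml" output.
brick_down() { printf '<node>\n  <status>0</status>\n</node>\n' | extract_status; }
no_status()  { printf '<cliOutput>\n  <opRet>-1</opRet>\n</cliOutput>\n' | extract_status; }

expect_within 1 "0" brick_down && echo "brick reported down"
expect_within 1 "0" no_status || echo "no <status> element, so the filter printed nothing"
```

The second call reproduces the shape of the failure message: the poll never sees a matching line, so the last value reported is the empty string.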

The funny thing is that it takes a very long time to send the SIGTERM to
the brick (I'm assuming it is the local brick). It also looks like the test
does not check that glusterd is down before it checks the brick status.
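In the G_LOG lines below, TEST 33 (the brick check) is logged about 12 ms after kill_glusterd 3. One way to express "make sure the daemon is down before checking" is a bounded poll on its pid. This is a hypothetical sketch: wait_for_exit is not a helper from the Gluster test framework, and a background sleep stands in for the glusterd being killed.

```shell
#!/bin/bash
# Hypothetical helper (not part of the Gluster test framework): poll until
# the process with the given pid is gone, giving up after a timeout in
# seconds. kill -0 sends no signal; it only checks that the pid exists.
wait_for_exit() {
    local pid=$1 timeout=$2
    local deadline=$((SECONDS + timeout))
    while kill -0 "$pid" 2>/dev/null; do
        [ "$SECONDS" -ge "$deadline" ] && return 1
        sleep 0.2
    done
    return 0
}

# A background sleep stands in for the glusterd that the test kills.
sleep 30 &
daemon_pid=$!

kill -TERM "$daemon_pid"

# Only move on to the brick-status check once the daemon is really down.
if wait_for_exit "$daemon_pid" 5; then
    echo "daemon $daemon_pid is down; brick status can be checked now"
else
    echo "daemon $daemon_pid still running after 5 s"
fi
```

Polling with a deadline rather than a fixed sleep keeps the check fast when the daemon exits quickly and still bounds the wait when it does not.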

[2017-11-06 08:03:21.403670]:++++++++++ G_LOG:./tests/bugs/glusterd/bug-1345727-bricks-stop-on-no-quorum-validation.t: TEST: 30 kill_glusterd 3 ++++++++++
*[2017-11-06 08:03:21.415249]*:++++++++++ G_LOG:./tests/bugs/glusterd/bug-1345727-bricks-stop-on-no-quorum-validation.t: TEST: 33 0 brick_up_status_1 patchy 127.1.1.1 /d/backends/1/patchy1 ++++++++++

...

*[2017-11-06 08:03:44.972007]* I [MSGID: 106542] [glusterd-utils.c:8063:glusterd_brick_signal] 0-glusterd: sending signal 15 to brick with pid 30706

This is more than 23 seconds after the check started, and PROCESS_DOWN_TIMEOUT is set to 5.
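The gap can be checked directly from the log timestamps. The sketch below converts same-day HH:MM:SS.ffffff stamps to seconds with awk; to_secs is a hypothetical helper, and the three timestamps are the ones in the logs quoted in this thread (start of TEST 33, "Server quorum lost", and "sending signal 15").

```shell
#!/bin/bash
# Hypothetical helper: convert a same-day "HH:MM:SS.ffffff" log timestamp
# into seconds since midnight.
to_secs() {
    awk -F: '{ printf "%.6f\n", $1 * 3600 + $2 * 60 + $3 }' <<< "$1"
}

# Timestamps from the glusterd log quoted in this thread.
check_start=$(to_secs 08:03:21.415249)   # G_LOG: TEST 33 (expects brick status "0")
quorum_lost=$(to_secs 08:03:33.151873)   # "Server quorum lost ... Stopping local bricks."
sigterm_sent=$(to_secs 08:03:44.972007)  # "sending signal 15 to brick with pid 30706"
PROCESS_DOWN_TIMEOUT=5                   # value given in the thread

awk -v t0="$check_start" -v q="$quorum_lost" -v s="$sigterm_sent" \
    -v limit="$PROCESS_DOWN_TIMEOUT" 'BEGIN {
    printf "quorum lost after %.3f s\n", q - t0    # 11.737 s
    printf "SIGTERM sent after %.3f s\n", s - t0   # 23.557 s
    printf "EXPECT_WITHIN window: %d s\n", limit
}'
```

By these numbers, even the "Server quorum lost" message is logged more than twice the 5-second window after the check began.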

Regards,
Nithya

> On Mon, Nov 6, 2017 at 3:06 PM, Nithya Balachandran <nbalacha at redhat.com> wrote:
>
>> Hi,
>>
>> Can someone take a look at https://build.gluster.org/job/centos6-regression/7231/ ?
>>
>>
>> From the logs:
>>
>> [2017-11-06 08:03:21.200177]:++++++++++ G_LOG:./tests/bugs/glusterd/bug-1345727-bricks-stop-on-no-quorum-validation.t: TEST: 26 1 brick_up_status_1 patchy 127.1.1.3 /d/backends/3/patchy3 ++++++++++
>> [2017-11-06 08:03:21.392027]:++++++++++ G_LOG:./tests/bugs/glusterd/bug-1345727-bricks-stop-on-no-quorum-validation.t: TEST: 29 kill_glusterd 2 ++++++++++
>> [2017-11-06 08:03:21.400647] W [socket.c:593:__socket_rwv] 0-management: readv on 127.1.1.2:24007 failed (No data available)
>> The message "I [MSGID: 106499] [glusterd-handler.c:4303:__glusterd_handle_status_volume] 0-management: Received status volume req for volume patchy" repeated 2 times between [2017-11-06 08:03:20.983906] and [2017-11-06 08:03:21.373432]
>> [2017-11-06 08:03:21.400698] I [MSGID: 106004] [glusterd-handler.c:6284:__glusterd_peer_rpc_notify] 0-management: Peer <127.1.1.2> (<4a9ec683-6d08-47f3-960f-1ed53be2e230>), in state <Peer in Cluster>, has disconnected from glusterd.
>> [2017-11-06 08:03:21.400811] W [glusterd-locks.c:675:glusterd_mgmt_v3_unlock] (-->/build/install/lib/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0x1e9b9) [0x7fafe000d9b9] -->/build/install/lib/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0x3231e) [0x7fafe002131e] -->/build/install/lib/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0xfdf37) [0x7fafe00ecf37] ) 0-management: Lock for vol patchy not held
>> [2017-11-06 08:03:21.400827] W [MSGID: 106118] [glusterd-handler.c:6309:__glusterd_peer_rpc_notify] 0-management: Lock not released for patchy
>> [2017-11-06 08:03:21.400851] C [MSGID: 106003] [glusterd-server-quorum.c:351:glusterd_do_volume_quorum_action] 0-management: Server quorum regained for volume patchy. Starting local bricks.
>> [2017-11-06 08:03:21.403670]:++++++++++ G_LOG:./tests/bugs/glusterd/bug-1345727-bricks-stop-on-no-quorum-validation.t: TEST: 30 kill_glusterd 3 ++++++++++
>> [2017-11-06 08:03:21.415249]:++++++++++ G_LOG:./tests/bugs/glusterd/bug-1345727-bricks-stop-on-no-quorum-validation.t: TEST: 33 0 brick_up_status_1 patchy 127.1.1.1 /d/backends/1/patchy1 ++++++++++
>> [2017-11-06 08:03:31.158076] E [socket.c:2369:socket_connect_finish] 0-management: connection to 127.1.1.2:24007 failed (Connection refused); disconnecting socket
>> [2017-11-06 08:03:31.159513] I [MSGID: 106499] [glusterd-handler.c:4303:__glusterd_handle_status_volume] 0-management: Received status volume req for volume patchy
>> [2017-11-06 08:03:33.151685] W [socket.c:593:__socket_rwv] 0-management: readv on 127.1.1.3:24007 failed (Connection reset by peer)
>> [2017-11-06 08:03:33.151735] I [MSGID: 106004] [glusterd-handler.c:6284:__glusterd_peer_rpc_notify] 0-management: Peer <127.1.1.3> (<a9cc5688-219c-48e1-9e50-9ccb57b03631>), in state <Peer in Cluster>, has disconnected from glusterd.
>> [2017-11-06 08:03:33.151828] W [glusterd-locks.c:686:glusterd_mgmt_v3_unlock] (-->/build/install/lib/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0x1e9b9) [0x7fafe000d9b9] -->/build/install/lib/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0x3231e) [0x7fafe002131e] -->/build/install/lib/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0xfdfe4) [0x7fafe00ecfe4] ) 0-management: Lock owner mismatch. Lock for vol patchy held by faa07524-55ba-46af-8359-0c6c87df5e86
>> [2017-11-06 08:03:33.151850] W [MSGID: 106118] [glusterd-handler.c:6309:__glusterd_peer_rpc_notify] 0-management: Lock not released for patchy
>> [2017-11-06 08:03:33.151873] C [MSGID: 106002] [glusterd-server-quorum.c:357:glusterd_do_volume_quorum_action] 0-management: Server quorum lost for volume patchy. Stopping local bricks.
>> [2017-11-06 08:03:44.972007] I [MSGID: 106542] [glusterd-utils.c:8063:glusterd_brick_signal] 0-glusterd: sending signal 15 to brick with pid 30706
>>
>> Thanks,
>> Nithya
>>