<div><br><div class="gmail_quote"><div dir="auto">On Mon, 6 Nov 2017 at 18:26, Nithya Balachandran <<a href="mailto:nbalacha@redhat.com">nbalacha@redhat.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div class="gmail_extra"><div class="gmail_quote">On 6 November 2017 at 18:02, Atin Mukherjee <span><<a href="mailto:amukherj@redhat.com" target="_blank">amukherj@redhat.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div>Snippet from where the test failed (the one which failed is marked in bold):<br><br># Start the volume <br>TEST $CLI_1 volume start $V0 <br> <br>EXPECT_WITHIN $PROCESS_UP_TIMEOUT "1" brick_up_status_1 $V0 $H1 $B1/${V0}1 <br>EXPECT_WITHIN $PROCESS_UP_TIMEOUT "1" brick_up_status_1 $V0 $H2 $B2/${V0}2 <br>EXPECT_WITHIN $PROCESS_UP_TIMEOUT "1" brick_up_status_1 $V0 $H3 $B3/${V0}3 <br> <br># Bring down 2nd and 3rd glusterd <br>TEST kill_glusterd 2 <br>TEST kill_glusterd 3 <br> <br># Server quorum is not met. Brick on 1st node must be down <br><b>EXPECT_WITHIN $PROCESS_DOWN_TIMEOUT "0" brick_up_status_1 $V0 $H1 $B1/${V0}1 <br><br></b><br><pre class="m_-7820079757997362905gmail-m_-8052746117449450608gmail-console-output"><span class="m_-7820079757997362905gmail-m_-8052746117449450608gmail-timestamp"><b>08:04:05</b> </span>not ok 13 Got "" instead of "0", LINENUM:33
<span class="m_-7820079757997362905gmail-m_-8052746117449450608gmail-timestamp"><b>08:04:05</b> </span>FAILED COMMAND: 0 brick_up_status_1 patchy 127.1.1.1 /d/backends/1/patchy1</pre><b></b><div><br>Nothing abnormal from the logs. The test failed as we expect the number of bricks to be up as 0 due to quorum loss but it returned "" from the command "$CLI_1 volume status $vol $host:$brick --xml | sed -ne 's/.*<status>\([01]\)<\/status>/\1/p'" . The only way this command to return a non integer number is some parsing error? As of now its a mystery to me, still looking into it.<br></div></div></blockquote><div><br></div></div></div></div><div><div class="gmail_extra"><div class="gmail_quote"><div>The funny thing is that it takes a very long time to send the sigterm to the brick (I'm assuming it is the local brick). It also looks like the test not check that glusterd is down before it checks the brick status.</div></div></div></div></blockquote><div dir="auto"><br></div><div dir="auto">Yes, we should check for a peer_count before checking for the brick status. But I’m not 100% sure if that’s the only issue as in that case I should have seen an integer number greater than 0 instead of blank.</div><div dir="auto"><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div class="gmail_extra"><div class="gmail_quote"><div></div></div></div></div><div><div class="gmail_extra"><div class="gmail_quote"><div><div style="font-size:12.8px"><br class="m_-7820079757997362905gmail-Apple-interchange-newline">[2017-11-06 08:03:21.403670]:++++++++++ G_LOG:./tests/bugs/glusterd/bug-1345727-bricks-stop-on-no-quorum-validation.t: TEST: 30 kill_glusterd 3 ++++++++++</div><div style="font-size:12.8px"><b>[2017-11-06 08:03:21.415249]</b>:++++++++++ G_LOG:./tests/bugs/glusterd/bug-1345727-bricks-stop-on-no-quorum-validation.t: TEST: 33 0 brick_up_status_1 patchy 127.1.1.1 /d/backends/1/patchy1 ++++++++++</div></div><div> </div></div></div></div><div><div class="gmail_extra"><div class="gmail_quote"><div>...</div></div></div></div><div><div class="gmail_extra"><div class="gmail_quote"><div><br></div><div>[<b>2017-11-06 08:03:44.972007]</b> I [MSGID: 106542] [glusterd-utils.c:8063:glusterd_brick_signal] 0-glusterd: sending signal 15 to brick with pid 30706 </div><div><br></div></div></div></div><div><div class="gmail_extra"><div class="gmail_quote"><div>This is nearly 25 seconds later and PROCESS_DOWN_TIMEOUT is set to 5.<br></div><div><br></div><div><br></div><div>Regards,</div><div>Nithya</div></div></div></div><div><div class="gmail_extra"><div class="gmail_quote"><div><br></div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class="m_-7820079757997362905gmail-HOEnZb"><div class="m_-7820079757997362905gmail-h5"><div class="gmail_extra"><div class="gmail_quote">On Mon, Nov 6, 2017 at 3:06 PM, Nithya Balachandran <span><<a href="mailto:nbalacha@redhat.com" target="_blank">nbalacha@redhat.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div>Hi,<div><br></div><div>Can someone take a look at : <a href="https://build.gluster.org/job/centos6-regression/7231/" target="_blank">https://build.gluster.org/job/centos6-regression/7231/</a></div><div>?</div><div><br></div><div><br></div><div>From the logs:</div><div><br></div><div><div>[2017-11-06 08:03:21.200177]:++++++++++ 

> [2017-11-06 08:03:21.403670]:++++++++++ G_LOG:./tests/bugs/glusterd/bug-1345727-bricks-stop-on-no-quorum-validation.t: TEST: 30 kill_glusterd 3 ++++++++++
> [2017-11-06 08:03:21.415249]:++++++++++ G_LOG:./tests/bugs/glusterd/bug-1345727-bricks-stop-on-no-quorum-validation.t: TEST: 33 0 brick_up_status_1 patchy 127.1.1.1 /d/backends/1/patchy1 ++++++++++
>
> ...
>
> [2017-11-06 08:03:44.972007] I [MSGID: 106542] [glusterd-utils.c:8063:glusterd_brick_signal] 0-glusterd: sending signal 15 to brick with pid 30706
>
> This is nearly 25 seconds later, and PROCESS_DOWN_TIMEOUT is set to 5.
>
> Regards,
> Nithya
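
Putting the timestamps from the logs side by side (the fuller excerpt is
in the quoted report below) makes the gap concrete:

    08:03:21.415  TEST 33, the failing brick_up_status_1 check, starts
    08:03:33.151  glusterd on node 1 sees the second peer drop; quorum lost
    08:03:44.972  SIGTERM finally sent to the brick (pid 30706)

So the brick is only signalled roughly 23.5 seconds after the check
begins, while the check itself stops polling after the 5 seconds of
PROCESS_DOWN_TIMEOUT.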

>> On Mon, Nov 6, 2017 at 3:06 PM, Nithya Balachandran <nbalacha@redhat.com> wrote:
>>
>>> Hi,
>>>
>>> Can someone take a look at
>>> https://build.gluster.org/job/centos6-regression/7231/ ?
>>>
>>> From the logs:
>>>
>>> [2017-11-06 08:03:21.200177]:++++++++++ G_LOG:./tests/bugs/glusterd/bug-1345727-bricks-stop-on-no-quorum-validation.t: TEST: 26 1 brick_up_status_1 patchy 127.1.1.3 /d/backends/3/patchy3 ++++++++++
>>> [2017-11-06 08:03:21.392027]:++++++++++ G_LOG:./tests/bugs/glusterd/bug-1345727-bricks-stop-on-no-quorum-validation.t: TEST: 29 kill_glusterd 2 ++++++++++
>>> [2017-11-06 08:03:21.400647] W [socket.c:593:__socket_rwv] 0-management: readv on 127.1.1.2:24007 failed (No data available)
>>> The message "I [MSGID: 106499] [glusterd-handler.c:4303:__glusterd_handle_status_volume] 0-management: Received status volume req for volume patchy" repeated 2 times between [2017-11-06 08:03:20.983906] and [2017-11-06 08:03:21.373432]
>>> [2017-11-06 08:03:21.400698] I [MSGID: 106004] [glusterd-handler.c:6284:__glusterd_peer_rpc_notify] 0-management: Peer <127.1.1.2> (<4a9ec683-6d08-47f3-960f-1ed53be2e230>), in state <Peer in Cluster>, has disconnected from glusterd.
>>> [2017-11-06 08:03:21.400811] W [glusterd-locks.c:675:glusterd_mgmt_v3_unlock] (-->/build/install/lib/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0x1e9b9) [0x7fafe000d9b9] -->/build/install/lib/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0x3231e) [0x7fafe002131e] -->/build/install/lib/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0xfdf37) [0x7fafe00ecf37] ) 0-management: Lock for vol patchy not held
>>> [2017-11-06 08:03:21.400827] W [MSGID: 106118] [glusterd-handler.c:6309:__glusterd_peer_rpc_notify] 0-management: Lock not released for patchy
>>> [2017-11-06 08:03:21.400851] C [MSGID: 106003] [glusterd-server-quorum.c:351:glusterd_do_volume_quorum_action] 0-management: Server quorum regained for volume patchy. Starting local bricks.
>>> [2017-11-06 08:03:21.403670]:++++++++++ G_LOG:./tests/bugs/glusterd/bug-1345727-bricks-stop-on-no-quorum-validation.t: TEST: 30 kill_glusterd 3 ++++++++++
>>> [2017-11-06 08:03:21.415249]:++++++++++ G_LOG:./tests/bugs/glusterd/bug-1345727-bricks-stop-on-no-quorum-validation.t: TEST: 33 0 brick_up_status_1 patchy 127.1.1.1 /d/backends/1/patchy1 ++++++++++
>>> [2017-11-06 08:03:31.158076] E [socket.c:2369:socket_connect_finish] 0-management: connection to 127.1.1.2:24007 failed (Connection refused); disconnecting socket
>>> [2017-11-06 08:03:31.159513] I [MSGID: 106499] [glusterd-handler.c:4303:__glusterd_handle_status_volume] 0-management: Received status volume req for volume patchy
>>> [2017-11-06 08:03:33.151685] W [socket.c:593:__socket_rwv] 0-management: readv on 127.1.1.3:24007 failed (Connection reset by peer)
>>> [2017-11-06 08:03:33.151735] I [MSGID: 106004] [glusterd-handler.c:6284:__glusterd_peer_rpc_notify] 0-management: Peer <127.1.1.3> (<a9cc5688-219c-48e1-9e50-9ccb57b03631>), in state <Peer in Cluster>, has disconnected from glusterd.
>>> [2017-11-06 08:03:33.151828] W [glusterd-locks.c:686:glusterd_mgmt_v3_unlock] (-->/build/install/lib/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0x1e9b9) [0x7fafe000d9b9] -->/build/install/lib/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0x3231e) [0x7fafe002131e] -->/build/install/lib/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0xfdfe4) [0x7fafe00ecfe4] ) 0-management: Lock owner mismatch. Lock for vol patchy held by faa07524-55ba-46af-8359-0c6c87df5e86
>>> [2017-11-06 08:03:33.151850] W [MSGID: 106118] [glusterd-handler.c:6309:__glusterd_peer_rpc_notify] 0-management: Lock not released for patchy
>>> [2017-11-06 08:03:33.151873] C [MSGID: 106002] [glusterd-server-quorum.c:357:glusterd_do_volume_quorum_action] 0-management: Server quorum lost for volume patchy. Stopping local bricks.
>>> [2017-11-06 08:03:44.972007] I [MSGID: 106542] [glusterd-utils.c:8063:glusterd_brick_signal] 0-glusterd: sending signal 15 to brick with pid 30706
>>>
>>> Thanks,
>>> Nithya
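
One more detail that may explain why the report says Got "" rather than
a stale 0 or 1: EXPECT_WITHIN keeps re-running the check until the
timeout expires and then reports whatever the last attempt printed.
Roughly like this (a simplified model only, not the harness's actual
implementation in tests/include.rc):

    # Simplified model of EXPECT_WITHIN; illustrative, not the real code.
    function expect_within_sketch {
            local timeout=$1 expected=$2
            shift 2
            local got="" deadline=$(( $(date +%s) + timeout ))
            while [ "$(date +%s)" -lt "$deadline" ]; do
                    got=$("$@")                # re-run the check command
                    [ "$got" = "$expected" ] && return 0
                    sleep 1
            done
            # On timeout, the value of the *last* poll is reported; a
            # single failed CLI call at that point yields "" here.
            echo "Got \"$got\" instead of \"$expected\""
            return 1
    }

Under that model, one failed volume-status call on the final poll is
enough to produce the blank in the TAP output.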

--
- Atin (atinm)