<div><br><div class="gmail_quote"><div dir="auto">On Mon, 6 Nov 2017 at 18:26, Nithya Balachandran &lt;<a href="mailto:nbalacha@redhat.com">nbalacha@redhat.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div class="gmail_extra"><div class="gmail_quote">On 6 November 2017 at 18:02, Atin Mukherjee <span>&lt;<a href="mailto:amukherj@redhat.com" target="_blank">amukherj@redhat.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div>Snippet from where the test failed (the one which failed is marked in bold):<br><br># Start the volume                                                              <br>TEST $CLI_1 volume start $V0                                                    <br>                                                                                <br>EXPECT_WITHIN $PROCESS_UP_TIMEOUT &quot;1&quot; brick_up_status_1 $V0 $H1 $B1/${V0}1      <br>EXPECT_WITHIN $PROCESS_UP_TIMEOUT &quot;1&quot; brick_up_status_1 $V0 $H2 $B2/${V0}2      <br>EXPECT_WITHIN $PROCESS_UP_TIMEOUT &quot;1&quot; brick_up_status_1 $V0 $H3 $B3/${V0}3      <br>                                                                                <br># Bring down 2nd and 3rd glusterd                                               <br>TEST kill_glusterd 2                                                            <br>TEST kill_glusterd 3                                                            <br>                                                                                <br># Server quorum is not met. 
Brick on 1st node must be down                      <br><b>EXPECT_WITHIN $PROCESS_DOWN_TIMEOUT &quot;0&quot; brick_up_status_1 $V0 $H1 $B1/${V0}1 <br><br></b><br><pre class="m_-7820079757997362905gmail-m_-8052746117449450608gmail-console-output"><span class="m_-7820079757997362905gmail-m_-8052746117449450608gmail-timestamp"><b>08:04:05</b> </span>not ok 13 Got &quot;&quot; instead of &quot;0&quot;, LINENUM:33
<span class="m_-7820079757997362905gmail-m_-8052746117449450608gmail-timestamp"><b>08:04:05</b> </span>FAILED COMMAND: 0 brick_up_status_1 patchy 127.1.1.1 /d/backends/1/patchy1</pre><b></b><div><br>Nothing abnormal in the logs. The test failed because we expect the number of bricks up to be 0 after quorum loss, but the command &quot;$CLI_1 volume status $vol $host:$brick --xml | sed -ne &#39;s/.*&lt;status&gt;\([01]\)&lt;\/status&gt;/\1/p&#39;&quot; returned &quot;&quot;. Is a parsing error the only way this command can return something other than an integer? As of now it&#39;s a mystery to me; I&#39;m still looking into it.<br></div></div></blockquote><div><br></div></div></div></div><div><div class="gmail_extra"><div class="gmail_quote"><div>The funny thing is that it takes a very long time to send the SIGTERM to the brick (I&#39;m assuming it is the local brick). It also looks like the test does not check that glusterd is down before it checks the brick status.</div></div></div></div></blockquote><div dir="auto"><br></div><div dir="auto">Yes, we should check peer_count before checking the brick status. 
But I’m not 100% sure that’s the only issue, because in that case I should have seen an integer greater than 0 instead of a blank.</div><div dir="auto"><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div class="gmail_extra"><div class="gmail_quote"><div></div></div></div></div><div><div class="gmail_extra"><div class="gmail_quote"><div><div style="font-size:12.8px"><br class="m_-7820079757997362905gmail-Apple-interchange-newline">[2017-11-06 08:03:21.403670]:++++++++++ G_LOG:./tests/bugs/glusterd/bug-1345727-bricks-stop-on-no-quorum-validation.t: TEST: 30 kill_glusterd 3 ++++++++++</div><div style="font-size:12.8px"><b>[2017-11-06 08:03:21.415249]</b>:++++++++++ G_LOG:./tests/bugs/glusterd/bug-1345727-bricks-stop-on-no-quorum-validation.t: TEST: 33 0 brick_up_status_1 patchy 127.1.1.1 /d/backends/1/patchy1 ++++++++++</div></div><div> </div></div></div></div><div><div class="gmail_extra"><div class="gmail_quote"><div>...</div></div></div></div><div><div class="gmail_extra"><div class="gmail_quote"><div><br></div><div><b>[2017-11-06 08:03:44.972007]</b> I [MSGID: 106542] [glusterd-utils.c:8063:glusterd_brick_signal] 0-glusterd: sending signal 15 to brick with pid 30706 </div><div><br></div></div></div></div><div><div class="gmail_extra"><div class="gmail_quote"><div>This is about 23.5 seconds after the brick status check started, and PROCESS_DOWN_TIMEOUT is set to 5.<br></div><div><br></div><div><br></div><div>Regards,</div><div>Nithya</div></div></div></div><div><div class="gmail_extra"><div class="gmail_quote"><div><br></div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class="m_-7820079757997362905gmail-HOEnZb"><div class="m_-7820079757997362905gmail-h5"><div class="gmail_extra"><div class="gmail_quote">On Mon, Nov 6, 2017 at 3:06 PM, Nithya Balachandran <span>&lt;<a href="mailto:nbalacha@redhat.com" 
target="_blank">nbalacha@redhat.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div>Hi,<div><br></div><div>Can someone take a look at : <a href="https://build.gluster.org/job/centos6-regression/7231/" target="_blank">https://build.gluster.org/job/centos6-regression/7231/</a></div><div>?</div><div><br></div><div><br></div><div>From the logs:</div><div><br></div><div><div>[2017-11-06 08:03:21.200177]:++++++++++ G_LOG:./tests/bugs/glusterd/bug-1345727-bricks-stop-on-no-quorum-validation.t: TEST: 26 1 brick_up_status_1 patchy 127.1.1.3 /d/backends/3/patchy3 ++++++++++</div><div>[2017-11-06 08:03:21.392027]:++++++++++ G_LOG:./tests/bugs/glusterd/bug-1345727-bricks-stop-on-no-quorum-validation.t: TEST: 29 kill_glusterd 2 ++++++++++</div><div>[2017-11-06 08:03:21.400647] W [socket.c:593:__socket_rwv] 0-management: readv on <a href="http://127.1.1.2:24007" target="_blank">127.1.1.2:24007</a> failed (No data available)</div><div>The message &quot;I [MSGID: 106499] [glusterd-handler.c:4303:__glusterd_handle_status_volume] 0-management: Received status volume req for volume patchy&quot; repeated 2 times between [2017-11-06 08:03:20.983906] and [2017-11-06 08:03:21.373432]</div><div>[2017-11-06 08:03:21.400698] I [MSGID: 106004] [glusterd-handler.c:6284:__glusterd_peer_rpc_notify] 0-management: Peer &lt;127.1.1.2&gt; (&lt;4a9ec683-6d08-47f3-960f-1ed53be2e230&gt;), in state &lt;Peer in Cluster&gt;, has disconnected from glusterd.</div><div>[2017-11-06 08:03:21.400811] W [glusterd-locks.c:675:glusterd_mgmt_v3_unlock] (--&gt;/build/install/lib/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0x1e9b9) [0x7fafe000d9b9] --&gt;/build/install/lib/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0x3231e) [0x7fafe002131e] --&gt;/build/install/lib/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0xfdf37) [0x7fafe00ecf37] ) 0-management: Lock for vol patchy not held</div><div>[2017-11-06 08:03:21.400827] W 
[MSGID: 106118] [glusterd-handler.c:6309:__glusterd_peer_rpc_notify] 0-management: Lock not released for patchy</div><div>[2017-11-06 08:03:21.400851] C [MSGID: 106003] [glusterd-server-quorum.c:351:glusterd_do_volume_quorum_action] 0-management: Server quorum regained for volume patchy. Starting local bricks.</div><div>[2017-11-06 08:03:21.403670]:++++++++++ G_LOG:./tests/bugs/glusterd/bug-1345727-bricks-stop-on-no-quorum-validation.t: TEST: 30 kill_glusterd 3 ++++++++++</div><div>[2017-11-06 08:03:21.415249]:++++++++++ G_LOG:./tests/bugs/glusterd/bug-1345727-bricks-stop-on-no-quorum-validation.t: TEST: 33 0 brick_up_status_1 patchy 127.1.1.1 /d/backends/1/patchy1 ++++++++++</div><div>[2017-11-06 08:03:31.158076] E [socket.c:2369:socket_connect_finish] 0-management: connection to <a href="http://127.1.1.2:24007" target="_blank">127.1.1.2:24007</a> failed (Connection refused); disconnecting socket</div><div>[2017-11-06 08:03:31.159513] I [MSGID: 106499] [glusterd-handler.c:4303:__glusterd_handle_status_volume] 0-management: Received status volume req for volume patchy</div><div>[2017-11-06 08:03:33.151685] W [socket.c:593:__socket_rwv] 0-management: readv on <a href="http://127.1.1.3:24007" target="_blank">127.1.1.3:24007</a> failed (Connection reset by peer)</div><div>[2017-11-06 08:03:33.151735] I [MSGID: 106004] [glusterd-handler.c:6284:__glusterd_peer_rpc_notify] 0-management: Peer &lt;127.1.1.3&gt; (&lt;a9cc5688-219c-48e1-9e50-9ccb57b03631&gt;), in state &lt;Peer in Cluster&gt;, has disconnected from glusterd.</div><div>[2017-11-06 08:03:33.151828] W [glusterd-locks.c:686:glusterd_mgmt_v3_unlock] (--&gt;/build/install/lib/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0x1e9b9) [0x7fafe000d9b9] --&gt;/build/install/lib/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0x3231e) [0x7fafe002131e] --&gt;/build/install/lib/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0xfdfe4) [0x7fafe00ecfe4] ) 0-management: Lock owner mismatch. 
Lock for vol patchy held by faa07524-55ba-46af-8359-0c6c87df5e86</div><div>[2017-11-06 08:03:33.151850] W [MSGID: 106118] [glusterd-handler.c:6309:__glusterd_peer_rpc_notify] 0-management: Lock not released for patchy</div><div>[2017-11-06 08:03:33.151873] C [MSGID: 106002] [glusterd-server-quorum.c:357:glusterd_do_volume_quorum_action] 0-management: Server quorum lost for volume patchy. Stopping local bricks.</div><div>[2017-11-06 08:03:44.972007] I [MSGID: 106542] [glusterd-utils.c:8063:glusterd_brick_signal] 0-glusterd: sending signal 15 to brick with pid 30706</div></div><div><br></div><div>Thanks,</div><div>Nithya</div></div>
</blockquote></div><br></div>
</div></div></blockquote></div></div></div></blockquote></div></div><div dir="ltr">-- <br></div><div class="gmail_signature" data-smartmail="gmail_signature">- Atin (atinm)</div>