<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Aug 2, 2018 at 11:43 AM, Xavi Hernandez <span dir="ltr">&lt;<a href="mailto:xhernandez@redhat.com" target="_blank">xhernandez@redhat.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><span class="gmail-"><div dir="ltr">On Thu, Aug 2, 2018 at 6:14 AM Atin Mukherjee &lt;<a href="mailto:amukherj@redhat.com" target="_blank">amukherj@redhat.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><br><br><div class="gmail_quote"><div dir="ltr">On Tue, Jul 31, 2018 at 10:11 PM Atin Mukherjee &lt;<a href="mailto:amukherj@redhat.com" target="_blank">amukherj@redhat.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">I just went through the nightly regression report of brick mux runs and here&#39;s what I can summarize.<br><br>==============================<wbr>==============================<wbr>==============================<wbr>==============================<wbr>==============================<wbr>===================<br>Fails only with brick-mux<br>==============================<wbr>==============================<wbr>==============================<wbr>==============================<wbr>==============================<wbr>===================<br>tests/bugs/core/bug-1432542-<wbr>mpx-restart-crash.t - Times out even after 400 secs. 
Refer <a href="https://fstat.gluster.org/failure/209?state=2&amp;start_date=2018-06-30&amp;end_date=2018-07-31&amp;branch=all" target="_blank">https://fstat.gluster.org/<wbr>failure/209?state=2&amp;start_<wbr>date=2018-06-30&amp;end_date=2018-<wbr>07-31&amp;branch=all</a>, specifically the latest report <a href="https://build.gluster.org/job/regression-test-burn-in/4051/consoleText" target="_blank">https://build.gluster.org/job/<wbr>regression-test-burn-in/4051/<wbr>consoleText</a> . It hasn&#39;t been timing out as frequently as it was until 12 July, but since 27 July it has timed out twice. I&#39;m beginning to believe commit 9400b6f2c8aa219a493961e0ab9770<wbr>b7f12e80d2 has added a delay and 400 secs is no longer sufficient (Mohit?)<br><br>tests/bugs/glusterd/add-brick-<wbr>and-validate-replicated-<wbr>volume-options.t (Ref - <a href="https://build.gluster.org/job/regression-test-with-multiplex/814/console" target="_blank">https://build.gluster.org/job/<wbr>regression-test-with-<wbr>multiplex/814/console</a>) - Test fails only in brick-mux mode, AI on Atin to look at it and get back.<br><br>tests/bugs/replicate/bug-<wbr>1433571-undo-pending-only-on-<wbr>up-bricks.t (<a href="https://build.gluster.org/job/regression-test-with-multiplex/813/console" target="_blank">https://build.gluster.org/<wbr>job/regression-test-with-<wbr>multiplex/813/console</a>) - Seems to have failed just twice in the last 30 days as per <a href="https://fstat.gluster.org/failure/251?state=2&amp;start_date=2018-06-30&amp;end_date=2018-07-31&amp;branch=all" target="_blank">https://fstat.gluster.org/<wbr>failure/251?state=2&amp;start_<wbr>date=2018-06-30&amp;end_date=2018-<wbr>07-31&amp;branch=all</a>.
Need help from AFR team.<br><br>tests/bugs/quota/bug-1293601.t (<a href="https://build.gluster.org/job/regression-test-with-multiplex/812/console" target="_blank">https://build.gluster.org/<wbr>job/regression-test-with-<wbr>multiplex/812/console</a>) - Hasn&#39;t failed after 26 July and earlier it was failing regularly. Did we fix this test through any patch (Mohit?)<br><br>tests/bitrot/bug-1373520.t - (<a href="https://build.gluster.org/job/regression-test-with-multiplex/811/console" target="_blank">https://build.gluster.org/<wbr>job/regression-test-with-<wbr>multiplex/811/console</a>)  - Hasn&#39;t failed after 27 July and earlier it was failing regularly. Did we fix this test through any patch (Mohit?)<br></div></blockquote><div><br></div><div>I see this has failed in day before yesterday&#39;s regression run as well (and I could reproduce it locally with brick mux enabled). The test fails in healing a file within a particular time period.</div><div><br></div><div><pre class="gmail-m_618998561617645212m_-6008408460366831088gmail-console-output"><span class="gmail-m_618998561617645212m_-6008408460366831088gmail-timestamp"><b>15:55:19</b> </span>not ok 25 Got &quot;0&quot; instead of &quot;512&quot;, LINENUM:55
<span class="gmail-m_618998561617645212m_-6008408460366831088gmail-timestamp"><b>15:55:19</b> </span>FAILED COMMAND: 512 path_size /d/backends/patchy5/FILE1</pre></div><div>Need EC dev&#39;s help here.<br></div></div></div></blockquote><div><br></div></span><div>I&#39;ll investigate this.</div><span class="gmail-"><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><div></div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><br>tests/bugs/glusterd/remove-<wbr>brick-testcases.t - Failed once with a core; not sure whether brick mux is the culprit here. Ref - <a href="https://build.gluster.org/job/regression-test-with-multiplex/806/console" target="_blank">https://build.gluster.org/job/<wbr>regression-test-with-<wbr>multiplex/806/console</a> . Seems to be a glustershd crash. Need help from AFR folks.<br><br>==============================<wbr>==============================<wbr>==============================<wbr>==============================<wbr>==============================<wbr>===================<br>Fails for non-brick mux case too<br>==============================<wbr>==============================<wbr>==============================<wbr>==============================<wbr>==============================<wbr>===================<br>tests/bugs/distribute/bug-<wbr>1122443.t - Seems to be failing on my setup very often, without brick mux as well. Refer <a href="https://build.gluster.org/job/regression-test-burn-in/4050/consoleText" target="_blank">https://build.gluster.org/job/<wbr>regression-test-burn-in/4050/<wbr>consoleText</a> . There&#39;s an email in gluster-devel and a BZ 1610240 for the same.
<br><br>tests/bugs/bug-1368312.t - Seems to be a recent failure (<a href="https://build.gluster.org/job/regression-test-with-multiplex/815/console" target="_blank">https://build.gluster.org/<wbr>job/regression-test-with-<wbr>multiplex/815/console</a>); however, I have seen this in a non-brick-mux case too - <a href="https://build.gluster.org/job/regression-test-burn-in/4039/consoleText" target="_blank">https://build.gluster.org/job/<wbr>regression-test-burn-in/4039/<wbr>consoleText</a> . Need some eyes from AFR folks.<br><br>tests/00-geo-rep/georep-basic-<wbr>dr-tarssh.t - this isn&#39;t specific to brick mux; I have seen this failing in multiple default regression runs. Refer <a href="https://fstat.gluster.org/failure/392?state=2&amp;start_date=2018-06-30&amp;end_date=2018-07-31&amp;branch=all" target="_blank">https://fstat.gluster.org/<wbr>failure/392?state=2&amp;start_<wbr>date=2018-06-30&amp;end_date=2018-<wbr>07-31&amp;branch=all</a> . We need help from the geo-rep devs to root-cause this sooner rather than later.<br><br>tests/00-geo-rep/georep-basic-<wbr>dr-rsync.t - this isn&#39;t specific to brick mux; I have seen this failing in multiple default regression runs. Refer <a href="https://fstat.gluster.org/failure/393?state=2&amp;start_date=2018-06-30&amp;end_date=2018-07-31&amp;branch=all" target="_blank">https://fstat.gluster.org/<wbr>failure/393?state=2&amp;start_<wbr>date=2018-06-30&amp;end_date=2018-<wbr>07-31&amp;branch=all</a> . We need help from the geo-rep devs to root-cause this sooner rather than later.<br></div></blockquote></div></div></blockquote></span></div></div></blockquote><div><br></div><div>I have posted patch [1] for the above two tests. This should handle connection timeouts without any logs. But I still see strange behaviour now and then<br></div><div>where one of the workers doesn&#39;t get started at all. I am debugging that with instrumentation patch [2].
I am not hitting it frequently, though.<br></div><div>I will continue to work on that. But patch [1] should reduce failures considerably.<br></div><div><br></div><div>[1] <a href="https://review.gluster.org/#/c/20601/">https://review.gluster.org/#/c/20601/</a></div><div>[2] <a href="https://review.gluster.org/#/c/20477/">https://review.gluster.org/#/c/20477/</a><br></div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><span class="gmail-"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><br>tests/bugs/glusterd/<wbr>validating-server-quorum.t (<a href="https://build.gluster.org/job/regression-test-with-multiplex/810/console" target="_blank">https://build.gluster.org/<wbr>job/regression-test-with-<wbr>multiplex/810/console</a>) - Fails for non-brick-mux cases too, <a href="https://fstat.gluster.org/failure/580?state=2&amp;start_date=2018-06-30&amp;end_date=2018-07-31&amp;branch=all" target="_blank">https://fstat.gluster.org/<wbr>failure/580?state=2&amp;start_<wbr>date=2018-06-30&amp;end_date=2018-<wbr>07-31&amp;branch=all</a> .
Atin has a patch <a href="https://review.gluster.org/20584" target="_blank">https://review.gluster.org/<wbr>20584</a> which resolves it but patch is failing regression for a different test which is unrelated.<br><br>tests/bugs/replicate/bug-<wbr>1586020-mark-dirty-for-entry-<wbr>txn-on-quorum-failure.t (Ref - <a href="https://build.gluster.org/job/regression-test-with-multiplex/809/console" target="_blank">https://build.gluster.org/job/<wbr>regression-test-with-<wbr>multiplex/809/console</a>) - fails for non brick mux case too - <a href="https://build.gluster.org/job/regression-test-burn-in/4049/consoleText" target="_blank">https://build.gluster.org/job/<wbr>regression-test-burn-in/4049/<wbr>consoleText</a> - Need some eyes from AFR folks.<br></div>
</blockquote></div></div>
</blockquote></span></div></div>
</blockquote></div><br><br clear="all"><br>-- <br><div class="gmail_signature"><div dir="ltr"><div>Thanks and Regards,<br></div>Kotresh H R<br></div></div>
</div></div>