<div dir="ltr"><br><br><div class="gmail_quote"><div dir="ltr">On Fri, Aug 3, 2018 at 2:12 PM Karthik Subrahmanya &lt;<a href="mailto:ksubrahm@redhat.com">ksubrahm@redhat.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><br><br><div class="gmail_quote"><div dir="ltr">On Thu, Aug 2, 2018 at 11:00 PM Karthik Subrahmanya &lt;<a href="mailto:ksubrahm@redhat.com" target="_blank">ksubrahm@redhat.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="auto"><div><br><br><div class="gmail_quote"><div dir="ltr">On Tue 31 Jul, 2018, 10:17 PM Atin Mukherjee, &lt;<a href="mailto:amukherj@redhat.com" target="_blank">amukherj@redhat.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">I just went through the nightly regression report of brick mux runs and here&#39;s what I can summarize.<br><br>=========================================================================================================================================================================<br>Fails only with brick-mux<br>=========================================================================================================================================================================<br>tests/bugs/core/bug-1432542-mpx-restart-crash.t - Times out even after 400 secs. Refer <a href="https://fstat.gluster.org/failure/209?state=2&amp;start_date=2018-06-30&amp;end_date=2018-07-31&amp;branch=all" rel="noreferrer" target="_blank">https://fstat.gluster.org/failure/209?state=2&amp;start_date=2018-06-30&amp;end_date=2018-07-31&amp;branch=all</a>, specifically the latest report <a href="https://build.gluster.org/job/regression-test-burn-in/4051/consoleText" rel="noreferrer" target="_blank">https://build.gluster.org/job/regression-test-burn-in/4051/consoleText</a> . Wasn&#39;t timing out as frequently as it was till 12 July. But since 27 July, it has timed out twice. Beginning to believe commit 9400b6f2c8aa219a493961e0ab9770b7f12e80d2 has added the delay and now 400 secs isn&#39;t sufficient enough (Mohit?)<br><br>tests/bugs/glusterd/add-brick-and-validate-replicated-volume-options.t (Ref - <a href="https://build.gluster.org/job/regression-test-with-multiplex/814/console" rel="noreferrer" target="_blank">https://build.gluster.org/job/regression-test-with-multiplex/814/console</a>) -  Test fails only in brick-mux mode, AI on Atin to look at and get back.<br><br>tests/bugs/replicate/bug-1433571-undo-pending-only-on-up-bricks.t (<a href="https://build.gluster.org/job/regression-test-with-multiplex/813/console" rel="noreferrer" target="_blank">https://build.gluster.org/job/regression-test-with-multiplex/813/console</a>) - Seems like failed just twice in last 30 days as per <a href="https://fstat.gluster.org/failure/251?state=2&amp;start_date=2018-06-30&amp;end_date=2018-07-31&amp;branch=all" rel="noreferrer" target="_blank">https://fstat.gluster.org/failure/251?state=2&amp;start_date=2018-06-30&amp;end_date=2018-07-31&amp;branch=all</a>. Need help from AFR team.<br><br>tests/bugs/quota/bug-1293601.t (<a href="https://build.gluster.org/job/regression-test-with-multiplex/812/console" rel="noreferrer" target="_blank">https://build.gluster.org/job/regression-test-with-multiplex/812/console</a>) - Hasn&#39;t failed after 26 July and earlier it was failing regularly. 
>>> Did we fix this test through any patch (Mohit?)
>>>
>>> tests/bitrot/bug-1373520.t (https://build.gluster.org/job/regression-test-with-multiplex/811/console) - Hasn't failed after 27 July; earlier it was failing regularly. Did we fix this test through any patch (Mohit?)
>>>
>>> tests/bugs/glusterd/remove-brick-testcases.t - Failed once with a core; not sure whether brick mux is the culprit here. Ref: https://build.gluster.org/job/regression-test-with-multiplex/806/console. Seems to be a glustershd crash. Need help from the AFR folks.
>>>
>>> ========================================================================
>>> Fails for the non-brick-mux case too
>>> ========================================================================
>>> tests/bugs/distribute/bug-1122443.t - Seems to be failing very often on my setup, without brick mux as well. Refer to https://build.gluster.org/job/regression-test-burn-in/4050/consoleText. There is an email on gluster-devel and BZ 1610240 for the same.
>>>
>>> tests/bugs/bug-1368312.t - Seems to be a new failure (https://build.gluster.org/job/regression-test-with-multiplex/815/console), but it has also been seen in a non-brick-mux run: https://build.gluster.org/job/regression-test-burn-in/4039/consoleText. Need some eyes from the AFR folks.
>>>
>>> tests/00-geo-rep/georep-basic-dr-tarssh.t - This isn't specific to brick mux; it has failed in multiple default regression runs. Refer to https://fstat.gluster.org/failure/392?state=2&start_date=2018-06-30&end_date=2018-07-31&branch=all. We need help from the geo-rep devs to root-cause this sooner rather than later.
>>>
>>> tests/00-geo-rep/georep-basic-dr-rsync.t - This isn't specific to brick mux; it has failed in multiple default regression runs. Refer to https://fstat.gluster.org/failure/393?state=2&start_date=2018-06-30&end_date=2018-07-31&branch=all.
>>> We need help from the geo-rep devs to root-cause this sooner rather than later.
>>>
>>> tests/bugs/glusterd/validating-server-quorum.t (https://build.gluster.org/job/regression-test-with-multiplex/810/console) - Fails for non-brick-mux cases too: https://fstat.gluster.org/failure/580?state=2&start_date=2018-06-30&end_date=2018-07-31&branch=all. Atin has a patch, https://review.gluster.org/20584, which resolves it, but that patch is failing regression for a different, unrelated test.
>>>
>>> tests/bugs/replicate/bug-1586020-mark-dirty-for-entry-txn-on-quorum-failure.t (ref: https://build.gluster.org/job/regression-test-with-multiplex/809/console) - Fails for the non-brick-mux case too: https://build.gluster.org/job/regression-test-burn-in/4049/consoleText. Need some eyes from the AFR folks.

>> I am looking at this. It is not reproducible locally. Trying to reproduce it on a Softserve machine.

> It is not failing on the Softserve machine either, at the point where the regression failed. But I found some other problem in the script.
> I will fix that and add some extra logs, so that it is easier to debug the next time it fails.

RCA for the tests/bugs/replicate/bug-1586020-mark-dirty-for-entry-txn-on-quorum-failure.t failure:

This test case completely fills 2 out of 3 bricks and provisions one brick with some extra space, so that entry creation succeeds fully on only one brick and fails on the other two.
On the 2 bricks that are full, the entry creation itself still succeeds, but the creation of the gfid hard link inside ".glusterfs" fails. This is a bug in the "posix" code for entry transactions: if the gfid link creation fails, we just log an error message and continue.
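For context, here is what that hard link is. A minimal illustrative sketch of the .glusterfs backend layout follows; the function name and signature are hypothetical, not the actual posix xlator code:

/*
 * Illustrative only -- not the actual posix xlator code. Every file on a
 * brick has a hard link under .glusterfs, keyed by its gfid:
 *     <brick>/.glusterfs/<gfid[0:2]>/<gfid[2:4]>/<full-gfid>
 * Self-heal (shd) resolves files through this link, which is why a
 * missing link leaves a file unhealable.
 */
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* 'gfid' is assumed to be the canonical 36-character string form. */
int
link_gfid_path(const char *brick_root, const char *entry_path,
               const char *gfid)
{
    char gfid_path[4096];

    snprintf(gfid_path, sizeof(gfid_path), "%s/.glusterfs/%.2s/%.2s/%s",
             brick_root, gfid, gfid + 2, gfid);

    /* On a brick that is already full, this is the call that fails
     * (typically with ENOSPC) even though the entry itself was created. */
    if (link(entry_path, gfid_path) != 0)
        return -errno;

    return 0;
}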
Since we depend on that gfid link, the entry should be deleted when the link creation fails. When the shd tries to heal those files, it sees that the gfid link is not present for them, and it fails to heal.

I will send a fix for this, which deletes the entry if it fails to create the link inside .glusterfs. The shape of the change is sketched below.
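Again a minimal, hypothetical sketch (reusing link_gfid_path() from the sketch above), not the actual patch:

/*
 * Illustrative only -- not the actual patch. The point is the cleanup:
 * if the gfid hard link cannot be created, undo the entry creation and
 * fail the operation, instead of logging and continuing.
 */
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

int link_gfid_path(const char *brick_root, const char *entry_path,
                   const char *gfid);   /* from the sketch above */

int
create_entry_with_gfid(const char *brick_root, const char *entry_path,
                       const char *gfid, mode_t mode)
{
    int fd = open(entry_path, O_CREAT | O_EXCL | O_WRONLY, mode);
    if (fd < 0)
        return -errno;              /* entry creation itself failed */
    close(fd);

    int ret = link_gfid_path(brick_root, entry_path, gfid);
    if (ret < 0) {
        /*
         * Today we only log here and carry on, leaving behind an entry
         * that shd can never heal. Deleting the entry instead keeps the
         * brick consistent: either both the entry and its gfid link
         * exist, or neither does, so heal can retry the whole create.
         */
        unlink(entry_path);
        return ret;
    }

    return 0;
}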
Regards,
Karthik

>> Regards,
>> Karthik
>
>>> _______________________________________________
>>> Gluster-devel mailing list
>>> Gluster-devel@gluster.org
>>> https://lists.gluster.org/mailman/listinfo/gluster-devel