<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Aug 3, 2018 at 4:01 PM, Kotresh Hiremath Ravishankar <span dir="ltr"><<a href="mailto:khiremat@redhat.com" target="_blank">khiremat@redhat.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div><div><div><div><div><div><div><div><div><div><div>Hi Du/Poornima,<br><br></div>I was analysing bitrot and geo-rep failures and I suspect there is a bug in some perf xlator<br></div>that was one of the causes. I saw the following behaviour in a few runs.<br><br></div>1. Geo-rep synced data to the slave. It creates an empty file and then rsync syncs the data.<br></div> But the test runs 'stat --format "%F" &lt;file&gt;' to confirm. If the file is empty, it returns<br></div> "regular empty file", else "regular file". I believe it kept getting "regular empty file"</div><div> instead of "regular file" until the timeout.<br></div></div></div></div></div></div></div></blockquote><div><br></div><div><a href="https://review.gluster.org/20549">https://review.gluster.org/20549</a> might be relevant.</div><div> <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div><div><div><div><div><div> <br></div>2. The other behaviour is with bitrot, with brick-mux: a file is deleted on the back end on one brick<br></div> and then a lookup is done. Which performance xlators need to be disabled to get the lookup/revalidate<br></div> on the brick where the file was deleted? Earlier, only md-cache was disabled and it used to work.<br></div> Now it's failing intermittently.</div></div></div></blockquote><div><br></div><div>You need to disable readdirplus in the entire stack. 
Refer to <a href="https://lists.gluster.org/pipermail/gluster-users/2017-March/030148.html">https://lists.gluster.org/pipermail/gluster-users/2017-March/030148.html</a></div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div><div><br></div><div>Are there any pending patches around these areas that need to be merged?</div><div>If there are, they could be affecting other tests as well.</div><div><br></div>Thanks,<br></div>Kotresh HR<br></div><div class="gmail_extra"><div><div class="gmail-h5"><br><div class="gmail_quote">On Fri, Aug 3, 2018 at 3:07 PM, Karthik Subrahmanya <span dir="ltr"><<a href="mailto:ksubrahm@redhat.com" target="_blank">ksubrahm@redhat.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><br><br><div class="gmail_quote"><div><div class="gmail-m_6640170070907462070h5"><div dir="ltr">On Fri, Aug 3, 2018 at 2:12 PM Karthik Subrahmanya <<a href="mailto:ksubrahm@redhat.com" target="_blank">ksubrahm@redhat.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><br><br><div class="gmail_quote"><div dir="ltr">On Thu, Aug 2, 2018 at 11:00 PM Karthik Subrahmanya <<a href="mailto:ksubrahm@redhat.com" target="_blank">ksubrahm@redhat.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="auto"><div><br><br><div class="gmail_quote"><div dir="ltr">On Tue 31 Jul, 2018, 10:17 PM Atin Mukherjee, <<a href="mailto:amukherj@redhat.com" target="_blank">amukherj@redhat.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">I just went through the 
nightly regression report of brick mux runs and here's what I can summarize.<br><br>==============================<wbr>==============================<wbr>==============================<wbr>==============================<wbr>==============================<wbr>===================<br>Fails only with brick-mux<br>==============================<wbr>==============================<wbr>==============================<wbr>==============================<wbr>==============================<wbr>===================<br>tests/bugs/core/bug-1432542-mp<wbr>x-restart-crash.t - Times out even after 400 secs. Refer <a href="https://fstat.gluster.org/failure/209?state=2&start_date=2018-06-30&end_date=2018-07-31&branch=all" rel="noreferrer" target="_blank">https://fstat.gluster.org/fail<wbr>ure/209?state=2&start_date=<wbr>2018-06-30&end_date=2018-07-<wbr>31&branch=all</a>, specifically the latest report <a href="https://build.gluster.org/job/regression-test-burn-in/4051/consoleText" rel="noreferrer" target="_blank">https://build.gluster.org/job/<wbr>regression-test-burn-in/4051/c<wbr>onsoleText</a> . Wasn't timing out as frequently as it was till 12 July. But since 27 July, it has timed out twice. 
I am beginning to believe commit 9400b6f2c8aa219a493961e0ab9770<wbr>b7f12e80d2 has added the delay and 400 secs is no longer sufficient (Mohit?)<br><br>tests/bugs/glusterd/add-brick-<wbr>and-validate-replicated-volume<wbr>-options.t (Ref - <a href="https://build.gluster.org/job/regression-test-with-multiplex/814/console" rel="noreferrer" target="_blank">https://build.gluster.org/job/<wbr>regression-test-with-multiplex<wbr>/814/console</a>) - Test fails only in brick-mux mode; AI on Atin to look at it and get back.<br><br>tests/bugs/replicate/bug-14335<wbr>71-undo-pending-only-on-up-<wbr>bricks.t (<a href="https://build.gluster.org/job/regression-test-with-multiplex/813/console" rel="noreferrer" target="_blank">https://build.gluster.org/job<wbr>/regression-test-with-multiple<wbr>x/813/console</a>) - Seems to have failed just twice in the last 30 days as per <a href="https://fstat.gluster.org/failure/251?state=2&start_date=2018-06-30&end_date=2018-07-31&branch=all" rel="noreferrer" target="_blank">https://fstat.gluster.org/fail<wbr>ure/251?state=2&start_date=<wbr>2018-06-30&end_date=2018-07-<wbr>31&branch=all</a>. Need help from the AFR team.<br><br>tests/bugs/quota/bug-1293601.t (<a href="https://build.gluster.org/job/regression-test-with-multiplex/812/console" rel="noreferrer" target="_blank">https://build.gluster.org/job<wbr>/regression-test-with-multiple<wbr>x/812/console</a>) - Hasn't failed after 26 July, although earlier it was failing regularly. Did we fix this test through any patch (Mohit?)<br><br>tests/bitrot/bug-1373520.t - (<a href="https://build.gluster.org/job/regression-test-with-multiplex/811/console" rel="noreferrer" target="_blank">https://build.gluster.org/job<wbr>/regression-test-with-multiple<wbr>x/811/console</a>) - Hasn't failed after 27 July, although earlier it was failing regularly. 
Did we fix this test through any patch (Mohit?)<br><br>tests/bugs/glusterd/remove-bri<wbr>ck-testcases.t - Failed once with a core; not sure whether brick mux is the culprit here. Ref - <a href="https://build.gluster.org/job/regression-test-with-multiplex/806/console" rel="noreferrer" target="_blank">https://build.gluster.org/job/<wbr>regression-test-with-multiplex<wbr>/806/console</a> . Seems to be a glustershd crash. Need help from AFR folks.<br><br>==============================<wbr>==============================<wbr>==============================<wbr>==============================<wbr>==============================<wbr>===================<br>Fails for non-brick mux case too<br>==============================<wbr>==============================<wbr>==============================<wbr>==============================<wbr>==============================<wbr>===================<br>tests/bugs/distribute/bug-1122<wbr>443.t - Seems to be failing on my setup very often, without brick mux as well. Refer <a href="https://build.gluster.org/job/regression-test-burn-in/4050/consoleText" rel="noreferrer" target="_blank">https://build.gluster.org/job/<wbr>regression-test-burn-in/4050/c<wbr>onsoleText</a> . There's an email in gluster-devel and a BZ 1610240 for the same. <br><br>tests/bugs/bug-1368312.t - Seems to be a recent failure (<a href="https://build.gluster.org/job/regression-test-with-multiplex/815/console" rel="noreferrer" target="_blank">https://build.gluster.org/job<wbr>/regression-test-with-multiple<wbr>x/815/console</a>); however, I have seen this for a non-brick-mux case too - <a href="https://build.gluster.org/job/regression-test-burn-in/4039/consoleText" rel="noreferrer" target="_blank">https://build.gluster.org/job/<wbr>regression-test-burn-in/4039/c<wbr>onsoleText</a> . 
Need some eyes from AFR folks.<br><br>tests/00-geo-rep/georep-basic-<wbr>dr-tarssh.t - this isn't specific to brick mux; I have seen this failing in multiple default regression runs. Refer <a href="https://fstat.gluster.org/failure/392?state=2&start_date=2018-06-30&end_date=2018-07-31&branch=all" rel="noreferrer" target="_blank">https://fstat.gluster.org/fail<wbr>ure/392?state=2&start_date=<wbr>2018-06-30&end_date=2018-07-<wbr>31&branch=all</a> . We need help from geo-rep devs to root-cause this sooner rather than later.<br><br>tests/00-geo-rep/georep-basic-<wbr>dr-rsync.t - this isn't specific to brick mux; I have seen this failing in multiple default regression runs. Refer <a href="https://fstat.gluster.org/failure/393?state=2&start_date=2018-06-30&end_date=2018-07-31&branch=all" rel="noreferrer" target="_blank">https://fstat.gluster.org/fail<wbr>ure/393?state=2&start_date=<wbr>2018-06-30&end_date=2018-07-<wbr>31&branch=all</a> . We need help from geo-rep devs to root-cause this sooner rather than later.<br><br>tests/bugs/glusterd/validating<wbr>-server-quorum.t (<a href="https://build.gluster.org/job/regression-test-with-multiplex/810/console" rel="noreferrer" target="_blank">https://build.gluster.org/job<wbr>/regression-test-with-multiple<wbr>x/810/console</a>) - Fails for non-brick-mux cases too, <a href="https://fstat.gluster.org/failure/580?state=2&start_date=2018-06-30&end_date=2018-07-31&branch=all" rel="noreferrer" target="_blank">https://fstat.gluster.org/fail<wbr>ure/580?state=2&start_date=<wbr>2018-06-30&end_date=2018-07-<wbr>31&branch=all</a> . 
Atin has a patch <a href="https://review.gluster.org/20584" rel="noreferrer" target="_blank">https://review.gluster.org/205<wbr>84</a> which resolves it, but the patch is failing regression on a different, unrelated test.<br><br>tests/bugs/replicate/bug-15860<wbr>20-mark-dirty-for-entry-txn-<wbr>on-quorum-failure.t (Ref - <a href="https://build.gluster.org/job/regression-test-with-multiplex/809/console" rel="noreferrer" target="_blank">https://build.gluster.org/job/<wbr>regression-test-with-multiplex<wbr>/809/console</a>) - fails for non brick mux case too - <a href="https://build.gluster.org/job/regression-test-burn-in/4049/consoleText" rel="noreferrer" target="_blank">https://build.gluster.org/job/<wbr>regression-test-burn-in/4049/c<wbr>onsoleText</a> - Need some eyes from AFR folks.<br></div></blockquote></div></div><div dir="auto">I am looking at this. It is not reproducible locally. Trying to do this on soft serve.</div></div></blockquote><div><br></div><div>On the soft serve machine too, it does not fail where the regression run failed. 
But I found some other problem in the script.</div><div>Will fix that and add some extra logs so that it is easier to debug when it fails next time.</div></div></div></blockquote><div> </div></div></div><div>RCA for <span style="font-size:small;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">tests/bugs/replicate/bug-</span><span style="font-size:small;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">15860<wbr>20-mark-dirty-for-entry-</span><span style="font-size:small;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">txn-on<wbr>-quorum-failure.t failure</span>:</div><div>This test case completely fills 2 out of 3 bricks and provisions one brick with some extra space, so that entry creation succeeds on only one brick and fails on the other bricks.</div><div>Since 2 of the bricks get filled, only the entry creation succeeds on those bricks, but the creation of the gfid hard link inside ".glusterfs" fails. This is a bug in the "posix" code with entry transactions.</div><div>If the gfid link creation fails, we just log an error message and continue. 
Since we depend on that gfid, the entry should be deleted if this fails.</div><div>When the shd tries to heal those files it sees that the gfid link is not present for those files and it fails to heal.</div><div><br></div><div>I will send a fix for this, which deletes the entry if it fails to create the link inside .glusterfs.</div><div><br></div><div>Regards,</div><div>Karthik</div><span><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="auto"><div dir="auto"><br></div><div dir="auto">Regards,</div><div dir="auto">Karthik</div><div dir="auto"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"></div>
______________________________<wbr>_________________<br>
Gluster-devel mailing list<br>
<a href="mailto:Gluster-devel@gluster.org" rel="noreferrer" target="_blank">Gluster-devel@gluster.org</a><br>
<a href="https://lists.gluster.org/mailman/listinfo/gluster-devel" rel="noreferrer noreferrer" target="_blank">https://lists.gluster.org/mail<wbr>man/listinfo/gluster-devel</a></blockquote></div></div></div>
</blockquote></div></div>
</blockquote></span></div></div>
<br></blockquote></div><br><br clear="all"><br></div></div><span class="gmail-">-- <br><div class="gmail-m_6640170070907462070gmail_signature"><div dir="ltr"><div>Thanks and Regards,<br></div>Kotresh H R<br></div></div>
</span></div>
</blockquote></div><br></div></div>