<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Aug 2, 2018 at 5:05 PM, Atin Mukherjee <span dir="ltr">&lt;<a href="mailto:atin.mukherjee83@gmail.com" target="_blank">atin.mukherjee83@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><br><br><div class="gmail_quote"><span class="gmail-"><div dir="ltr">On Thu, Aug 2, 2018 at 4:37 PM Kotresh Hiremath Ravishankar &lt;<a href="mailto:khiremat@redhat.com" target="_blank">khiremat@redhat.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Aug 2, 2018 at 3:49 PM, Xavi Hernandez <span dir="ltr">&lt;<a href="mailto:xhernandez@redhat.com" target="_blank">xhernandez@redhat.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><span><div dir="ltr">On Thu, Aug 2, 2018 at 6:14 AM Atin Mukherjee &lt;<a href="mailto:amukherj@redhat.com" target="_blank">amukherj@redhat.com</a>&gt; wrote:<br></div></span><span><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><br><br><div class="gmail_quote"><div dir="ltr">On Tue, Jul 31, 2018 at 10:11 PM Atin Mukherjee &lt;<a href="mailto:amukherj@redhat.com" target="_blank">amukherj@redhat.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">I just went through the nightly regression report of brick mux runs and here&#39;s what I can summarize.<br><br>==============================<wbr>==============================<wbr>==============================<wbr>==============================<wbr>==============================<wbr>===================<br>Fails only with brick-mux<br>==============================<wbr>==============================<wbr>==============================<wbr>==============================<wbr>==============================<wbr>===================<br>tests/bugs/core/bug-1432542-<wbr>mpx-restart-crash.t - Times out even after 400 secs. Refer <a href="https://fstat.gluster.org/failure/209?state=2&amp;start_date=2018-06-30&amp;end_date=2018-07-31&amp;branch=all" target="_blank">https://fstat.gluster.org/<wbr>failure/209?state=2&amp;start_<wbr>date=2018-06-30&amp;end_date=2018-<wbr>07-31&amp;branch=all</a>, specifically the latest report <a href="https://build.gluster.org/job/regression-test-burn-in/4051/consoleText" target="_blank">https://build.gluster.org/job/<wbr>regression-test-burn-in/4051/<wbr>consoleText</a> . Wasn&#39;t timing out as frequently as it was till 12 July. But since 27 July, it has timed out twice. 
Beginning to believe commit 9400b6f2c8aa219a493961e0ab9770<wbr>b7f12e80d2 has added the delay and now 400 secs isn&#39;t sufficient enough (Mohit?)<br><br>tests/bugs/glusterd/add-brick-<wbr>and-validate-replicated-<wbr>volume-options.t (Ref - <a href="https://build.gluster.org/job/regression-test-with-multiplex/814/console" target="_blank">https://build.gluster.org/job/<wbr>regression-test-with-<wbr>multiplex/814/console</a>) -  Test fails only in brick-mux mode, AI on Atin to look at and get back.<br><br>tests/bugs/replicate/bug-<wbr>1433571-undo-pending-only-on-<wbr>up-bricks.t (<a href="https://build.gluster.org/job/regression-test-with-multiplex/813/console" target="_blank">https://build.gluster.org/<wbr>job/regression-test-with-<wbr>multiplex/813/console</a>) - Seems like failed just twice in last 30 days as per <a href="https://fstat.gluster.org/failure/251?state=2&amp;start_date=2018-06-30&amp;end_date=2018-07-31&amp;branch=all" target="_blank">https://fstat.gluster.org/<wbr>failure/251?state=2&amp;start_<wbr>date=2018-06-30&amp;end_date=2018-<wbr>07-31&amp;branch=all</a>. Need help from AFR team.<br><br>tests/bugs/quota/bug-1293601.t (<a href="https://build.gluster.org/job/regression-test-with-multiplex/812/console" target="_blank">https://build.gluster.org/<wbr>job/regression-test-with-<wbr>multiplex/812/console</a>) - Hasn&#39;t failed after 26 July and earlier it was failing regularly. Did we fix this test through any patch (Mohit?)<br><br>tests/bitrot/bug-1373520.t - (<a href="https://build.gluster.org/job/regression-test-with-multiplex/811/console" target="_blank">https://build.gluster.org/<wbr>job/regression-test-with-<wbr>multiplex/811/console</a>)  - Hasn&#39;t failed after 27 July and earlier it was failing regularly. Did we fix this test through any patch (Mohit?)<br></div></blockquote><div><br></div><div>I see this has failed in day before yesterday&#39;s regression run as well (and I could reproduce it locally with brick mux enabled). The test fails in healing a file within a particular time period.</div><div><br></div><div><pre class="gmail-m_3899595027729854932m_-7168429344193105316m_5479165422519518122m_-6008408460366831088gmail-console-output"><span class="gmail-m_3899595027729854932m_-7168429344193105316m_5479165422519518122m_-6008408460366831088gmail-timestamp"><b>15:55:19</b> </span>not ok 25 Got &quot;0&quot; instead of &quot;512&quot;, LINENUM:55
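For reference, the check that fails there is the size verification the bitrot test does after triggering heal; it boils down to something like the sketch below (helper and timeout names follow the usual test-framework conventions from tests/volume.rc and include.rc, so treat this as illustrative rather than the exact test code):

    # path_size prints the on-disk size of the backend file;
    # EXPECT_WITHIN keeps retrying the check until it matches or the timeout expires.
    path_size () {
        stat -c %s "$1"
    }

    EXPECT_WITHIN $HEAL_TIMEOUT "512" path_size /d/backends/patchy5/FILE1

Getting "0" instead of "512" means the copy on that brick was never healed within the timeout.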
<span class="gmail-m_3899595027729854932m_-7168429344193105316m_5479165422519518122m_-6008408460366831088gmail-timestamp"><b>15:55:19</b> </span>FAILED COMMAND: 512 path_size /d/backends/patchy5/FILE1</pre></div><div>Need EC dev&#39;s help here.<br></div></div></div></blockquote><div><br></div></span><div>I&#39;m not sure where the problem is exactly. I&#39;ve seen that when the test fails, self-heal is attempting to heal the file, but when the file is accessed, an Input/Output error is returned, aborting heal. I&#39;ve checked that a heal is attempted every time the file is accessed, but it fails always. This error seems to come from bit-rot stub xlator.</div><div><br></div><div>When in this situation, if I stop and start the volume, self-heal immediately heals the files. It seems like an stale state that is kept by the stub xlator, preventing the file from being healed.</div><div><br></div><div>Adding bit-rot maintainers for help on this one.</div></div></div></blockquote><div><br></div><div>Bitrot-stub marks the file as corrupted in inode_ctx. But when the file and it&#39;s hardlink are deleted from that brick and a lookup is done</div><div>on the file, it cleans up the marker on getting ENOENT. This is part of recovery steps, and only md-cache is disabled during the process.<br></div><div>Is there any other perf xlators that needs to be disabled for this scenario to expect a lookup/revalidate on the brick where</div><div>the back end file is deleted?<br></div></div></div></div></blockquote><div><br></div></span><div>But the same test doesn&#39;t fail with brick multiplexing not enabled. Do we know why?</div></div></div></blockquote><div>Don&#39;t know, something to do with perf xlators I suppose. It&#39;s not repdroduced on my local system with brick-mux enabled as well. But it&#39;s happening on Xavis&#39; system.</div><div><br></div><div>Xavi,</div><div>Could you try with the patch [1] and let me know whether it fixes the issue.<br></div><div><br></div><div>[1] <a href="https://review.gluster.org/#/c/20619/1">https://review.gluster.org/#/c/20619/1</a><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><span class="gmail-"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><span class="gmail-m_3899595027729854932m_-7168429344193105316HOEnZb"><font color="#888888"><div><br></div><div>Xavi</div></font></span><span><div><br></div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><div></div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><br>tests/bugs/glusterd/remove-<wbr>brick-testcases.t - Failed once with a core, not sure if related to brick mux or not, so not sure if brick mux is culprit here or not. Ref - <a href="https://build.gluster.org/job/regression-test-with-multiplex/806/console" target="_blank">https://build.gluster.org/job/<wbr>regression-test-with-<wbr>multiplex/806/console</a> . Seems to be a glustershd crash. 
>> Bitrot-stub marks the file as corrupted in its inode_ctx. But when the file and its hardlink are deleted from that brick and a lookup is done on the file, it cleans up the marker on getting ENOENT. This is part of the recovery steps, and only md-cache is disabled during the process.
>> Are there any other perf xlators that need to be disabled for this scenario, so that a lookup/revalidate reaches the brick where the backend file was deleted?
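To make those recovery steps concrete, this is roughly the manual sequence; the volume name, mount path and option names are assumptions based on the usual test setup, so treat it as a sketch:

    # 1. make sure the next lookup actually reaches the brick: disable md-cache
    #    (and, if needed, the other perf xlators: quick-read, io-cache,
    #    read-ahead, open-behind, write-behind)
    gluster volume set patchy performance.stat-prefetch off

    # 2. on the bad brick, remove the corrupted file and its .glusterfs gfid hardlink
    GFID=$(getfattr -n glusterfs.gfid.string --only-values /mnt/patchy/FILE1)
    rm -f /d/backends/patchy5/FILE1
    rm -f /d/backends/patchy5/.glusterfs/${GFID:0:2}/${GFID:2:2}/${GFID}

    # 3. trigger a lookup from the client so bitrot-stub sees ENOENT, clears the
    #    corrupted marker, and lets self-heal recreate the file
    stat /mnt/patchy/FILE1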
> But the same test doesn't fail with brick multiplexing not enabled. Do we know why?

Don't know, something to do with the perf xlators I suppose. It doesn't reproduce on my local system even with brick-mux enabled, but it's happening on Xavi's system.

Xavi,
Could you try with the patch [1] and let me know whether it fixes the issue?

[1] https://review.gluster.org/#/c/20619/1
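If it helps, the change can be pulled straight from Gerrit; patchset 1 of change 20619 should map to the ref below, assuming the standard refs/changes layout:

    # fetch patchset 1 of change 20619 and apply it on top of the current branch
    git fetch https://review.gluster.org/glusterfs refs/changes/19/20619/1
    git cherry-pick FETCH_HEAD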
>>> Xavi
>>>
>>>>> tests/bugs/glusterd/remove-brick-testcases.t - Failed once with a core; not sure whether brick mux is the culprit here or not. Ref https://build.gluster.org/job/regression-test-with-multiplex/806/console. Seems to be a glustershd crash. Need help from AFR folks.
>>>>>
>>>>> ======================================================================
>>>>> Fails for non-brick-mux case too
>>>>> ======================================================================
>>>>> tests/bugs/distribute/bug-1122443.t - Seems to be failing at my setup very often, without brick mux as well. Refer https://build.gluster.org/job/regression-test-burn-in/4050/consoleText. There's an email on gluster-devel and BZ 1610240 for the same.
>>>>>
>>>>> tests/bugs/bug-1368312.t - Seems to be a new, recent failure (https://build.gluster.org/job/regression-test-with-multiplex/815/console); however, it has also been seen for a non-brick-mux case: https://build.gluster.org/job/regression-test-burn-in/4039/consoleText. Need some eyes from AFR folks.
>>>>>
>>>>> tests/00-geo-rep/georep-basic-dr-tarssh.t - This isn't specific to brick mux; it has been failing in multiple default regression runs. Refer https://fstat.gluster.org/failure/392?state=2&start_date=2018-06-30&end_date=2018-07-31&branch=all. We need help from the geo-rep devs to root-cause this sooner rather than later.
>>>>>
>>>>> tests/00-geo-rep/georep-basic-dr-rsync.t - This isn't specific to brick mux; it has been failing in multiple default regression runs. Refer https://fstat.gluster.org/failure/393?state=2&start_date=2018-06-30&end_date=2018-07-31&branch=all. We need help from the geo-rep devs to root-cause this sooner rather than later.
>>>>>
>>>>> tests/bugs/glusterd/validating-server-quorum.t (https://build.gluster.org/job/regression-test-with-multiplex/810/console) - Fails for non-brick-mux cases too, https://fstat.gluster.org/failure/580?state=2&start_date=2018-06-30&end_date=2018-07-31&branch=all. Atin has a patch https://review.gluster.org/20584 which resolves it, but the patch is failing regression for a different, unrelated test.
>>>>>
>>>>> tests/bugs/replicate/bug-1586020-mark-dirty-for-entry-txn-on-quorum-failure.t (ref: https://build.gluster.org/job/regression-test-with-multiplex/809/console) - Fails for the non-brick-mux case too: https://build.gluster.org/job/regression-test-burn-in/4049/consoleText. Need some eyes from AFR folks.
>>
>> --
>> Thanks and Regards,
>> Kotresh H R
>>
>> _______________________________________________
>> Gluster-devel mailing list
>> Gluster-devel@gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-devel

--
Thanks and Regards,
Kotresh H R