<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Aug 2, 2018 at 4:50 PM, Amar Tumballi <span dir="ltr">&lt;<a href="mailto:atumball@redhat.com" target="_blank">atumball@redhat.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote"><span class="">On Thu, Aug 2, 2018 at 4:37 PM, Kotresh Hiremath Ravishankar <span dir="ltr">&lt;<a href="mailto:khiremat@redhat.com" target="_blank">khiremat@redhat.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote"><span>On Thu, Aug 2, 2018 at 3:49 PM, Xavi Hernandez <span dir="ltr">&lt;<a href="mailto:xhernandez@redhat.com" target="_blank">xhernandez@redhat.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><span><div dir="ltr">On Thu, Aug 2, 2018 at 6:14 AM Atin Mukherjee &lt;<a href="mailto:amukherj@redhat.com" target="_blank">amukherj@redhat.com</a>&gt; wrote:<br></div></span><span><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><br><br><div class="gmail_quote"><div dir="ltr">On Tue, Jul 31, 2018 at 10:11 PM Atin Mukherjee &lt;<a href="mailto:amukherj@redhat.com" target="_blank">amukherj@redhat.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">I just went through the nightly regression report of brick mux runs and here&#39;s what I can summarize.<br><br>==============================<wbr>==============================<wbr>==============================<wbr>==============================<wbr>==============================<wbr>===================<br>Fails only with brick-mux<br>==============================<wbr>==============================<wbr>==============================<wbr>==============================<wbr>==============================<wbr>===================<br>tests/bugs/core/bug-1432542-mp<wbr>x-restart-crash.t - Times out even after 400 secs. Refer <a href="https://fstat.gluster.org/failure/209?state=2&amp;start_date=2018-06-30&amp;end_date=2018-07-31&amp;branch=all" target="_blank">https://fstat.gluster.org/fail<wbr>ure/209?state=2&amp;start_date=201<wbr>8-06-30&amp;end_date=2018-07-31&amp;<wbr>branch=all</a>, specifically the latest report <a href="https://build.gluster.org/job/regression-test-burn-in/4051/consoleText" target="_blank">https://build.gluster.org/job/<wbr>regression-test-burn-in/4051/c<wbr>onsoleText</a> . Wasn&#39;t timing out as frequently as it was till 12 July. But since 27 July, it has timed out twice. 
Beginning to believe commit 9400b6f2c8aa219a493961e0ab9770<wbr>b7f12e80d2 has added the delay and now 400 secs isn&#39;t sufficient enough (Mohit?)<br><br>tests/bugs/glusterd/add-brick-<wbr>and-validate-replicated-volume<wbr>-options.t (Ref - <a href="https://build.gluster.org/job/regression-test-with-multiplex/814/console" target="_blank">https://build.gluster.org/job/<wbr>regression-test-with-multiplex<wbr>/814/console</a>) -  Test fails only in brick-mux mode, AI on Atin to look at and get back.<br><br>tests/bugs/replicate/bug-14335<wbr>71-undo-pending-only-on-up-bri<wbr>cks.t (<a href="https://build.gluster.org/job/regression-test-with-multiplex/813/console" target="_blank">https://build.gluster.org/job<wbr>/regression-test-with-multiple<wbr>x/813/console</a>) - Seems like failed just twice in last 30 days as per <a href="https://fstat.gluster.org/failure/251?state=2&amp;start_date=2018-06-30&amp;end_date=2018-07-31&amp;branch=all" target="_blank">https://fstat.gluster.org/fail<wbr>ure/251?state=2&amp;start_date=201<wbr>8-06-30&amp;end_date=2018-07-31&amp;<wbr>branch=all</a>. Need help from AFR team.<br><br>tests/bugs/quota/bug-1293601.t (<a href="https://build.gluster.org/job/regression-test-with-multiplex/812/console" target="_blank">https://build.gluster.org/job<wbr>/regression-test-with-multiple<wbr>x/812/console</a>) - Hasn&#39;t failed after 26 July and earlier it was failing regularly. Did we fix this test through any patch (Mohit?)<br><br>tests/bitrot/bug-1373520.t - (<a href="https://build.gluster.org/job/regression-test-with-multiplex/811/console" target="_blank">https://build.gluster.org/job<wbr>/regression-test-with-multiple<wbr>x/811/console</a>)  - Hasn&#39;t failed after 27 July and earlier it was failing regularly. Did we fix this test through any patch (Mohit?)<br></div></blockquote><div><br></div><div>I see this has failed in day before yesterday&#39;s regression run as well (and I could reproduce it locally with brick mux enabled). The test fails in healing a file within a particular time period.</div><div><br></div><div><pre class="m_-6175418078655740000m_-8504224046762247175m_5479165422519518122m_-6008408460366831088gmail-console-output"><span class="m_-6175418078655740000m_-8504224046762247175m_5479165422519518122m_-6008408460366831088gmail-timestamp"><b>15:55:19</b> </span>not ok 25 Got &quot;0&quot; instead of &quot;512&quot;, LINENUM:55
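(For anyone poking at tests/bitrot/bug-1373520.t locally: the failing step is simply waiting for the healed copy on the brick to reach its expected size. A rough equivalent of the check is below; path_size is sketched here as a plain stat wrapper rather than the exact helper from the test, and the timeout variable is the usual one from the test harness, so treat both as illustrative.)

    # Rough equivalent of the failing check (helper body is illustrative).
    path_size () {
        # on-disk size of the backend file, 0 if it does not exist yet
        stat -c '%s' "$1" 2>/dev/null || echo 0
    }
    # The test expects the brick copy to grow to 512 bytes once self-heal
    # completes; in the failing runs it stays at 0 because the heal keeps
    # aborting with EIO.
    EXPECT_WITHIN $HEAL_TIMEOUT "512" path_size /d/backends/patchy5/FILE1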
<span class="m_-6175418078655740000m_-8504224046762247175m_5479165422519518122m_-6008408460366831088gmail-timestamp"><b>15:55:19</b> </span>FAILED COMMAND: 512 path_size /d/backends/patchy5/FILE1</pre></div><div>Need EC dev&#39;s help here.<br></div></div></div></blockquote><div><br></div></span><div>I&#39;m not sure where the problem is exactly. I&#39;ve seen that when the test fails, self-heal is attempting to heal the file, but when the file is accessed, an Input/Output error is returned, aborting heal. I&#39;ve checked that a heal is attempted every time the file is accessed, but it fails always. This error seems to come from bit-rot stub xlator.</div><div><br></div><div>When in this situation, if I stop and start the volume, self-heal immediately heals the files. It seems like an stale state that is kept by the stub xlator, preventing the file from being healed.</div><div><br></div><div>Adding bit-rot maintainers for help on this one.</div></div></div></blockquote><div><br></div></span><div>Bitrot-stub marks the file as corrupted in inode_ctx. But when the file and it&#39;s hardlink are deleted from that brick and a lookup is done</div><div>on the file, it cleans up the marker on getting ENOENT. This is part of recovery steps, and only md-cache is disabled during the process.<br></div><div>Is there any other perf xlators that needs to be disabled for this scenario to expect a lookup/revalidate on the brick where</div><div>the back end file is deleted?<br></div></div></div></div></blockquote><div><br></div></span><div>Can you make sure there are no perf xlators in bitrot stack while doing it? That may not be a good idea to keep it for internal &#39;validations&#39;.</div></div></div></div></blockquote><div><br></div><div>Ok, sending the patch in sometime. <br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class=""><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div></div><span><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><span class="m_-6175418078655740000m_-8504224046762247175HOEnZb"><font color="#888888"><div><br></div><div>Xavi</div></font></span><span><div><br></div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><div></div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><br>tests/bugs/glusterd/remove-bri<wbr>ck-testcases.t - Failed once with a core, not sure if related to brick mux or not, so not sure if brick mux is culprit here or not. Ref - <a href="https://build.gluster.org/job/regression-test-with-multiplex/806/console" target="_blank">https://build.gluster.org/job/<wbr>regression-test-with-multiplex<wbr>/806/console</a> . Seems to be a glustershd crash. 
>> Bitrot-stub marks the file as corrupted in its inode_ctx. But when the file and its hardlink are deleted from that brick and a lookup is done on the file, it cleans up the marker on getting ENOENT. This is part of the recovery steps, and only md-cache is disabled during the process. Are there any other perf xlators that need to be disabled in this scenario, so that a lookup/revalidate actually reaches the brick where the backend file is deleted?
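To make the recovery flow concrete, this is roughly the sequence that gets driven on the brick holding the corrupted copy; the paths, volume name, and gfid value below are placeholders, so take it as a sketch of the steps rather than the exact test code.

    # Illustrative recovery of a file flagged by bit-rot on one brick.
    BRICK=/d/backends/patchy5
    # gfid of FILE1, e.g. read from a good copy with:
    #   getfattr -n trusted.gfid -e hex <backend-path>
    GFID=00000000-0000-0000-0000-000000000001    # placeholder value
    rm -f "$BRICK/FILE1"
    rm -f "$BRICK/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID"    # the gfid hardlink
    # A lookup from a client now reaches the brick, gets ENOENT, and bitrot-stub
    # drops the "corrupted" marker from its inode_ctx so self-heal can recreate
    # the file. If a perf xlator serves the lookup from cache, the brick never
    # sees it and the stale marker survives -- which is the question above.
    stat /mnt/glusterfs/0/FILE1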
> Can you make sure there are no perf xlators in the bitrot stack while doing it? It may not be a good idea to keep them there for internal 'validations'.

Ok, sending the patch in sometime.
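Until the patch is up, a note for anyone reproducing this: only md-cache is disabled around the recovery lookup today, while the client stack normally carries the whole set of perf xlators. The options below are the usual client-side perf toggles (names from memory, so please cross-check with `gluster volume set help`); the actual fix is about keeping these xlators out of the internal bitrot stack rather than toggling volume options.

    # Illustrative: turn off the usual client-side perf xlators on a test volume.
    for opt in performance.stat-prefetch performance.quick-read \
               performance.io-cache performance.read-ahead \
               performance.readdir-ahead performance.write-behind \
               performance.open-behind; do
        gluster volume set patchy "$opt" off
    done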
>>> Xavi
>>>
>>>>> tests/bugs/glusterd/remove-brick-testcases.t - Failed once with a core; not sure whether brick mux is the culprit here. Ref - https://build.gluster.org/job/regression-test-with-multiplex/806/console. Seems to be a glustershd crash. Need help from the AFR folks.
>>>>>
>>>>> ======================================================================
>>>>> Fails for the non-brick-mux case too
>>>>> ======================================================================
>>>>> tests/bugs/distribute/bug-1122443.t - Seems to be failing very often on my setup, without brick mux as well. Refer to https://build.gluster.org/job/regression-test-burn-in/4050/consoleText. There's an email on gluster-devel and BZ 1610240 for the same.
>>>>>
>>>>> tests/bugs/bug-1368312.t - Seems to be a new failure (https://build.gluster.org/job/regression-test-with-multiplex/815/console); however, it has been seen in a non-brick-mux case too - https://build.gluster.org/job/regression-test-burn-in/4039/consoleText. Need some eyes from the AFR folks.
>>>>>
>>>>> tests/00-geo-rep/georep-basic-dr-tarssh.t - This isn't specific to brick mux; I have seen it failing in multiple default regression runs. Refer to https://fstat.gluster.org/failure/392?state=2&start_date=2018-06-30&end_date=2018-07-31&branch=all. We need help from the geo-rep devs to root-cause this sooner rather than later.
>>>>>
>>>>> tests/00-geo-rep/georep-basic-dr-rsync.t - This isn't specific to brick mux; I have seen it failing in multiple default regression runs. Refer to https://fstat.gluster.org/failure/393?state=2&start_date=2018-06-30&end_date=2018-07-31&branch=all. We need help from the geo-rep devs to root-cause this sooner rather than later.
>>>>>
>>>>> tests/bugs/glusterd/validating-server-quorum.t (https://build.gluster.org/job/regression-test-with-multiplex/810/console) - Fails for non-brick-mux cases too, https://fstat.gluster.org/failure/580?state=2&start_date=2018-06-30&end_date=2018-07-31&branch=all. Atin has a patch, https://review.gluster.org/20584, which resolves it, but the patch is failing regression on a different, unrelated test.
>>>>>
>>>>> tests/bugs/replicate/bug-1586020-mark-dirty-for-entry-txn-on-quorum-failure.t (Ref - https://build.gluster.org/job/regression-test-with-multiplex/809/console) - Fails for the non-brick-mux case too - https://build.gluster.org/job/regression-test-burn-in/4049/consoleText. Need some eyes from the AFR folks.
>>
>> --
>> Thanks and Regards,
>> Kotresh H R
<br></span><span class="">______________________________<wbr>_________________<br>
Gluster-devel mailing list<br>
<a href="mailto:Gluster-devel@gluster.org" target="_blank">Gluster-devel@gluster.org</a><br>
<a href="https://lists.gluster.org/mailman/listinfo/gluster-devel" rel="noreferrer" target="_blank">https://lists.gluster.org/mail<wbr>man/listinfo/gluster-devel</a><br></span></blockquote></div><span class="HOEnZb"><font color="#888888"><br><br clear="all"><div><br></div>-- <br><div class="m_-6175418078655740000gmail_signature" data-smartmail="gmail_signature"><div dir="ltr"><div><div dir="ltr"><div>Amar Tumballi (amarts)<br></div></div></div></div></div>
</font></span></div></div>
--
Thanks and Regards,
Kotresh H R