<div dir="ltr"><br><br><div class="gmail_quote"><div dir="ltr">On Mon, Jun 25, 2018 at 7:28 PM Amar Tumballi &lt;<a href="mailto:atumball@redhat.com" target="_blank">atumball@redhat.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class="gmail-m_6754421072619632943m_-8299479228837356013HOEnZb"><div class="gmail-m_6754421072619632943m_-8299479228837356013h5">

There are currently a few known issues:<br>

* Not collecting the entire logs (/var/log/glusterfs) from servers.<br></div></div></blockquote><div><br></div><div>If I look at the activities involved with regression failures, this can wait.</div></div></div></div></blockquote><div><br></div><div>Well, we can&#39;t debug the current failures without having the logs. So this has to be fixed first.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class="gmail-m_6754421072619632943m_-8299479228837356013HOEnZb"><div class="gmail-m_6754421072619632943m_-8299479228837356013h5">

* A few tests, such as the geo-rep tests, fail due to infra-related issues.<br></div></div></blockquote><div><br></div><div>Please open bugs for these, so we can track them and take them to closure.</div></div></div></div></blockquote><div><br></div><div>These are failing due to infra reasons, most likely subtle differences in the setup of these nodes vs. our normal nodes. We&#39;ll only be able to debug them once we get the logs. I know the geo-rep ones are easy to fix. The playbook for setting up geo-rep correctly just didn&#39;t make it over to the playbook used for these images.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class="gmail-m_6754421072619632943m_-8299479228837356013HOEnZb"><div class="gmail-m_6754421072619632943m_-8299479228837356013h5">

* Takes ~80 minutes with 7 distributed servers (targeting 60 minutes)<br></div></div></blockquote><div><br></div><div>The time can change as more tests are added. Also, please plan to allow the number of servers to range from 1 to n.</div></div></div></div></blockquote><div><br></div><div>While n is configurable, it will be fixed to a single-digit number for now. We will need to place *some* limit somewhere, or else we&#39;ll end up not being able to control our cloud bills.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class="gmail-m_6754421072619632943m_-8299479228837356013HOEnZb"><div class="gmail-m_6754421072619632943m_-8299479228837356013h5">

* We&#39;ve only tested plain regressions. ASAN and Valgrind are currently untested.<br></div></div></blockquote><div><br></div><div>It would be great to have these running, not &#39;per patch&#39;, but nightly or weekly to start with. </div></div></div></div></blockquote><div><br></div><div>This is not targeted until we phase out the current regressions.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class="gmail-m_6754421072619632943m_-8299479228837356013HOEnZb"><div class="gmail-m_6754421072619632943m_-8299479228837356013h5">

<br>

Before bringing it into production, we&#39;ll run this job nightly and<br>

watch it for a month to debug the other failures.<br>

<br></div></div></blockquote><div><br></div><div>I would say bring it to production sooner, say in 2 weeks, and also plan to keep the current regression as is, triggered by a special command like &#39;run regression in-one-machine&#39; in Gerrit (or something similar) with voting rights, so we can fall back to this method if something is broken in parallel testing.</div><div><br></div><div>I have seen that regardless of how long we keep scripts in testing, something breaks the day we move to production. So let that happen sooner rather than later, so that it helps the next release branching. We don&#39;t want to be stuck on branching due to infra failures.</div></div></div></div></blockquote></div><div><br></div><div>Having two regression jobs that can vote is going to cause more confusion than it&#39;s worth. There are a couple of intermittent memory issues with the test script that we need to debug and fix before I&#39;m comfortable making this job a voting job. We&#39;ve worked around these problems for now, but they still pop up now and again. The fact that things break often is not an excuse for failing to prevent avoidable failures. The one-month timeline was chosen with all these factors taken into consideration. The 2-week timeline is a no-go at this point.</div><div><br></div><div>When we are ready to make the switch, we won&#39;t be switching 100% of the jobs at once. We&#39;ll start with a sliding scale so that we can monitor failures and machine creation adequately.<br></div><div><br></div>-- <br><div dir="ltr" class="gmail-m_6754421072619632943gmail_signature"><div dir="ltr">nigelb<br></div></div></div>