<div dir="ltr">Hi,<div><br></div><div>I&#39;ve run some experiments [1] on the time the CentOS regression takes to complete. After some changes, a full regression run now takes between 2.5 and 3.5 hours (depending on the run time of 2 tests, see below).</div><div><br></div><div>Basically, the changes are related to delays manually introduced in some places (sleeps in test files or even in the code, or delays in timer events). I&#39;ve replaced some sleeps with better ways to detect the condition being waited for, and I&#39;ve kept the delays in other places but reduced their duration. The values I used are probably not the best ones in all cases, but this highlights that we should seriously consider how we detect things instead of simply waiting for some amount of time (and hoping it&#39;s enough). The total test time is more than 2 hours shorter with these changes, which means that more than 2 hours of the whole regression time is currently spent waiting unnecessarily.</div><div><br></div><div>There are still some issues that I&#39;ve been unable to solve. Probably the most critical is the time taken by a couple of tests:</div><div><ul><li>tests/bugs/nfs/bug-1053579.t</li><li>tests/bugs/fuse/many-groups-for-acl.t</li></ul></div><div>When they run normally, these tests take around a minute (~60 and ~45 seconds respectively), but sometimes they take much longer (~45 and ~30 minutes) without failing. The difference lies in the time it takes to create some system groups and users.</div><div><br></div><div>For example, one of the things the first test does is create 200 groups. This takes ~25 seconds in fast cases and ~15 minutes in slow cases. In other words, creating each group sometimes takes more than 4 seconds (~4.5 s), while other times it takes around 125 milliseconds: a difference of more than 30x.</div><div><br></div><div>I&#39;m not sure what causes this. 
If the slaves are connected to some external Kerberos or LDAP source, maybe there are occasional network issues (or service unavailability) that cause timeouts or delays. On my local system (Fedora 27) I see high CPU usage by the sssd_be process during group creation. I&#39;m not sure why, or whether it also happens on the slaves, but it seems a good candidate. However, on my system creating the groups always seems to take about 25 seconds.</div><div><br></div><div>Even after the changes, the tests are full of sleeps. There&#39;s one of 180 seconds (bugs/shard/parallel-truncate-read.t), and I&#39;m not sure it&#39;s really necessary. There are also many more with smaller delays of between 1 and 60 seconds. Assuming that each sleep is executed only once, the total time spent in sleeps is still 15 minutes.</div><div><br></div><div>I still need to fix some tests that seem to fail often after the changes.</div><div><br></div><div>Xavi</div><div><br></div><div>[1] <a href="https://review.gluster.org/19254" target="_blank">https://review.gluster.org/19254</a></div></div>
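<div><br></div><div>To illustrate the general idea of replacing a fixed sleep with a condition check, here is a minimal sketch in Python. It is not taken from the patch in [1], and the helper name <code>wait_for</code> and its <code>timeout</code>/<code>interval</code> parameters are hypothetical. Instead of always sleeping for the worst case, it polls a condition and returns as soon as the condition holds, so the full timeout is only spent when something is actually wrong.</div>

```python
import time
from typing import Callable


def wait_for(condition: Callable[[], bool], timeout: float, interval: float = 0.1) -> bool:
    """Hypothetical helper: poll `condition` until it is True or `timeout` expires.

    Returns True as soon as the condition holds, or False if the deadline
    passes first. Unlike a fixed sleep, the caller only waits as long as needed.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # Wait one polling interval, but never sleep past the deadline.
        time.sleep(min(interval, remaining))


if __name__ == "__main__":
    start = time.monotonic()
    ready_at = start + 0.3  # simulated condition that becomes true after ~0.3 s

    ok = wait_for(lambda: time.monotonic() >= ready_at, timeout=5.0)
    elapsed = time.monotonic() - start
    # A fixed 5 second sleep would always cost 5 s; polling returns after ~0.3 s.
    print(f"ok={ok} elapsed={elapsed:.1f}s")
```

<div>The sketch uses <code>time.monotonic()</code> rather than wall-clock time so that the deadline is not affected by system clock adjustments. The polling interval trades a little CPU for reaction time; the timeout still bounds the worst case, just as the original sleep did.</div>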