[Gluster-devel] Regression tests time

Xavi Hernandez jahernan at redhat.com
Wed Jan 24 17:37:24 UTC 2018


On Wed, Jan 24, 2018 at 3:11 PM, Jeff Darcy <jeff at pl.atyp.us> wrote:

>
>
>
> On Tue, Jan 23, 2018, at 12:58 PM, Xavi Hernandez wrote:
>
> I've made some experiments [1] with the time that centos regression takes
> to complete. After some changes, the time needed to run a full regression
> has dropped to between 2.5 and 3.5 hours (depending on the run time of 2
> tests, see below).
>
> Basically the changes are related to delays introduced manually in some
> places (sleeps in test files or even in the code, or delays in timer
> events). I've replaced some sleeps with better ways to detect the relevant
> condition, and I've kept the delays in other places but with reduced times.
> The values used are probably not the best ones in all cases, but this
> highlights that we should seriously consider how we detect things instead
> of simply waiting for some amount of time (and hoping it's enough). The
> total test time is more than 2 hours shorter with these changes, so this
> means that >2 hours of the whole regression time is spent waiting
> unnecessarily.
>
>
> We should definitely try to detect specific conditions instead of just
> sleeping for a fixed amount of time. That said, sometimes it would take
> significant additional effort to add a marker for a condition plus code to
> check for it. We need to be *really* careful about changing timeouts in
> these cases. It's easy to come up with something that works on one
> development system and then causes spurious failures for others.
>

That happens when we use arbitrary delays. If we use an explicit check, it
will work on all systems. Additionally, using specific checks makes it
possible to define bigger timeouts to handle corner cases, because in the
normal case we'll continue as soon as the check is satisfied, which is almost
always. Only when something really fails will that particular case take some
time to detect, which is fine, because this way we allow enough time for
"normal" delays.

> One of the biggest problems I had to deal with when I implemented
> multiplexing was these kinds of timing dependencies in tests, and I had to
> go through it all again when I came to Facebook. While I applaud the effort
> to reduce single-test times, I believe that parallelizing tests will
> long-term be a more effective (and definitely safer) route to reducing
> overall latency.
>

I agree that parallelizing tests is the way to go, but if we cut the total
serial time in half, the parallelized tests will also take half the time.
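
As a rough illustration of how the two things multiply, a toy driver like
the one below (not our actual run-tests.sh, and it ignores the shared
ports/mounts/peers problems that make real parallel runs hard) would finish
in roughly serial_time / workers, so any reduction in the serial time carries
over directly to the parallel run:

    #!/bin/bash
    # Toy parallel driver, for illustration only: fan the .t files out to a
    # few workers. Wall-clock time is roughly (serial time / JOBS), so
    # halving the serial time also halves the parallel run.
    JOBS=4
    find tests -name '*.t' -print0 |
        xargs -0 -n1 -P"$JOBS" prove -v > parallel-regression.log 2>&1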

Additionally, reducing the time each test takes is a good way to detect
corner cases. If we always sleep at certain points, we could be missing
failures that only happen when there's no sleep (and users can issue the
same requests we do, but without sleeping).

