[Gluster-devel] POC- Distributed regression testing framework

Mon Jun 25 13:58:04 UTC 2018

On Mon, Jun 25, 2018 at 7:17 PM, Deepshikha Khandelwal <dkhandel at redhat.com>
wrote:

> Hello folks,
>
> For the last few months, I've been working on bringing distributed
> regression testing to production. Our regression framework takes more
> than 4 hours to run on a single machine. To reduce the waiting time,
> Facebook contributed a distributed test runner.
>
> The solution supports the following:
>
> 1) Shares a worker pool across different testers.
> 2) Retries a failing test up to 3 times, on 3 different machines,
>    before calling it a failure.
> 3) Supports running ASAN, Valgrind, ASAN without leaks.
> 4) Stores the failed test logs on a centralized server[1].
>
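
The retry policy in item 2 can be illustrated with a minimal sketch. This
is not the code of the contributed test runner; the function name
run_with_retries, the run_on callback, and the machine names are all
hypothetical and exist only to show the idea of retrying a failing test on
distinct machines before declaring it failed.

```python
from typing import Callable, Sequence


def run_with_retries(
    test: str,
    machines: Sequence[str],
    run_on: Callable[[str, str], bool],
    max_attempts: int = 3,
) -> bool:
    """Return True if `test` passes on any of up to `max_attempts` machines.

    `run_on(test, machine)` is a hypothetical callback that runs one test
    on one machine and returns True on success.
    """
    # Drop duplicate machine names (keeping order) so that every attempt
    # really happens on a different machine.
    distinct = list(dict.fromkeys(machines))
    # If fewer than `max_attempts` distinct machines exist, fewer attempts
    # are made.
    for machine in distinct[:max_attempts]:
        if run_on(test, machine):
            return True
    # Every attempt failed: only now is the test reported as a failure.
    return False


if __name__ == "__main__":
    # Placeholder runner: pretend the test only passes on "builder3".
    def fake_runner(test: str, machine: str) -> bool:
        return machine == "builder3"

    ok = run_with_retries(
        "tests/example.t", ["builder1", "builder2", "builder3"], fake_runner
    )
    print("passed" if ok else "failed")
```

A test that fails for infra reasons on one machine gets a second and third
chance elsewhere, at the cost of extra run time for genuinely broken tests.
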
> distributed-regression[2] is the Jenkins job that runs this
> distributed regression testing.
>
> There are currently a few known issues:
> * The complete logs (/var/log/glusterfs) are not collected from the
>   servers.
>

Considering what is usually involved in debugging regression failures,
this can wait.

> * A few tests, such as the geo-rep tests, fail due to infra-related
>   issues.
>

Please open bugs for these, so we can track them and take them to closure.

> * Takes ~80 minutes with 7 distributed servers (targeting 60 minutes)
>

The run time will change as more tests are added. Also, please plan for
the number of servers to be configurable, anywhere from 1 to n.
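
As a rough illustration of supporting anywhere from 1 to n servers, the
sketch below splits a test list across a configurable number of servers.
It is only an assumption about how such a split could work, not how the
distributed runner actually schedules tests: the function partition_tests,
the per-test durations, and the test names are hypothetical. It uses a
simple greedy rule (longest test first, onto the least loaded server).

```python
import heapq
from typing import Dict, List


def partition_tests(durations: Dict[str, float], n_servers: int) -> List[List[str]]:
    """Split tests into `n_servers` buckets with roughly equal total time.

    `durations` maps a test name to its (hypothetical) run time in seconds.
    """
    if n_servers < 1:
        raise ValueError("n_servers must be at least 1")
    # Min-heap of (accumulated_seconds, server_index): the least loaded
    # server is always popped first.
    heap = [(0.0, i) for i in range(n_servers)]
    buckets: List[List[str]] = [[] for _ in range(n_servers)]
    # Place the longest tests first; ties are broken by name so the result
    # is deterministic.
    for name, secs in sorted(durations.items(), key=lambda kv: (-kv[1], kv[0])):
        load, idx = heapq.heappop(heap)
        buckets[idx].append(name)
        heapq.heappush(heap, (load + secs, idx))
    return buckets


if __name__ == "__main__":
    # Placeholder test names and timings, for illustration only.
    sample = {"a.t": 30, "b.t": 20, "c.t": 10, "d.t": 10}
    for n in (1, 2):
        print(n, partition_tests(sample, n))
```

With n_servers set to 1 this degenerates to a single-machine run, so the
same code path would cover both the distributed and the fallback case.
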

> * We've only tested plain regressions. ASAN and Valgrind are currently
> untested.
>

It would be great to have these running, not 'per patch', but nightly or
weekly to start with.

>
> Before bringing it into production, we'll run this job nightly and
> watch it for a month to debug the other failures.
>
>
I would say, bring it to production sooner, say in 2 weeks. Also plan to
keep the current regression job as is, triggered by a special command in
gerrit like 'run regression in-one-machine' (or something similar) with
voting rights, so we can fall back to this method if something is broken
in parallel testing.

I have seen that regardless of how much time we spend testing scripts,
something breaks the day we move to production. So let that happen sooner
rather than later; it would help when the next release branches out. We
don't want branching to be stuck due to infra failures.

Regards,
Amar

> Please let us know if you find any issues.
>
> [1] https://ci-logs.gluster.org
> [2] https://build.gluster.org/job/distributed-regression
>
> Regards,
> Deepshikha Khandelwal

-- 
Amar Tumballi (amarts)