[Gluster-infra] POC- Distributed regression testing framework
dkhandel at redhat.com
Mon Jun 25 12:27:35 UTC 2018
For the last few months, I've been working on bringing distributed
regression testing to production. Our regression framework takes more
than 4 hours to run on a single machine. To reduce the waiting time,
Facebook contributed a distributed test runner.
The solution supports the following:
1) Shares a worker pool across different testers.
2) Retries a failing test up to 3 times, on 3 different machines, before calling it a failure.
3) Supports running with ASAN, Valgrind, and ASAN without leaks.
4) Stores the failed test logs on a centralized server.
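The retry behaviour in item 2 can be illustrated with a minimal Python sketch. This is not the contributed runner's code; the function name run_with_retries, the run_on callback, the worker names, and the test path are all hypothetical and exist only to show the idea of declaring a failure after the test has failed on 3 distinct machines.

```python
from typing import Callable, List, Sequence, Tuple


def run_with_retries(
    test: str,
    workers: Sequence[str],
    run_on: Callable[[str, str], bool],
    max_attempts: int = 3,
) -> Tuple[bool, List[str]]:
    """Run `test` on up to `max_attempts` distinct workers.

    Returns (passed, tried), where `tried` lists the workers used, in order.
    `run_on(worker, test)` is a hypothetical callback that returns True on pass.
    """
    # Drop duplicate workers while preserving order, then cap the attempts.
    distinct = list(dict.fromkeys(workers))[:max_attempts]
    tried: List[str] = []
    for worker in distinct:
        tried.append(worker)
        if run_on(worker, test):
            return True, tried  # one pass is enough
    # Failed on every machine we were allowed to try.
    return False, tried


if __name__ == "__main__":
    # Hypothetical runner: the test passes only on worker "w3".
    def fake_run(worker: str, test: str) -> bool:
        return worker == "w3"

    passed, tried = run_with_retries(
        "tests/basic/example.t", ["w1", "w2", "w3", "w4"], fake_run
    )
    print(passed, tried)  # True ['w1', 'w2', 'w3']
```

Trying a different machine on each attempt, rather than rerunning on the same one, helps separate genuine test failures from problems specific to one worker.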
distributed-regression is the Jenkins job for this.
There are currently a few known issues:
* The complete logs (/var/log/glusterfs) are not yet collected from the servers.
* A few tests, such as the geo-rep tests, fail due to infra-related issues.
* A run takes ~80 minutes with 7 distributed servers (targeting 60 minutes).
* We've only tested plain regressions. ASAN and Valgrind are currently untested.
Before bringing it into production, we'll run this job nightly and
watch it for a month to debug the other failures.
Please let us know if you find any issues.