[Gluster-devel] Regression tests and improvement ideas

Wed Jun 17 19:47:19 UTC 2015

On Wed, Jun 17, 2015 at 04:26:13PM +0530, Raghavendra Talur wrote:
> Hi,
> 
> 
> MSV Bhat and I had presented in Gluster Design Summit some ideas about
> improving our testing infrastructure.
> 
> Here is the link to the slides: http://redhat.slides.com/rtalur/distaf#
> 
> Here are the same suggestions,
> 
> 1. *A .t file for a bug*
> When a community user discovers a bug in Gluster, they contact us over irc
> or email and eventually end up filing a bug in bugzilla.
> Many times it so happens that we find a bug which we don't know the
> fix for, or which is not in our module, and also end up filing a bug
> in bugzilla.
> 
> If we could rather write a .t test to reproduce the bug and add it to
> say /tests/bug/yet-to-be-fixed/ folder in gluster repo it would be
> more helpful. As part of bug-triage we could try doing the same for bugs
> filed by community users.
> 
> *What do we get?*
> 
> a. very easy for a new developer to pick up that bug and fix it.
> If .t passes then the bug is fixed.
> 
> b. The regression on daily patch sets would skip this folder; but on a
> nightly basis we could run a test on this folder to see if any of these
> tests got fixed while we were fixing some other tests. Yay!

This is surely a nice addition. When do you think something like this
could be made available?
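
To make the nightly idea a bit more concrete, here is a minimal sketch of
such a job, not something that exists in the repo today. It assumes the
tests live under the proposed tests/bug/yet-to-be-fixed/ folder and can be
run one by one with "prove"; the function name and the runner parameter are
made up for illustration.

```python
import subprocess
from pathlib import Path


def find_fixed_tests(test_dir, runner=("prove", "-v")):
    """Run every .t file under test_dir and return the ones that pass.

    A passing test in the yet-to-be-fixed folder suggests that the bug it
    reproduces was fixed as a side effect of some other change.
    """
    fixed = []
    for test in sorted(Path(test_dir).rglob("*.t")):
        # Each test runs on its own, so one failure does not stop the rest.
        result = subprocess.run([*runner, str(test)], capture_output=True)
        if result.returncode == 0:
            fixed.append(str(test))
    return fixed


if __name__ == "__main__":
    # Hypothetical location, taken from the proposal above.
    for path in find_fixed_tests("tests/bug/yet-to-be-fixed"):
        print("now passing, candidate to move out of the folder:", path)
```

Anything the job reports would still need a human to check that the bug
is really gone before the .t is moved into the regular regression set.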

> 2. *New gerrit/review work flow*
> 
> Our gerrit setup currently has a 2 hour average for regression run.
> Due to the long queue of commits the turnaround time is around 4-6 hours.
> 
> Kaushal has proposed how to reduce the turnaround time further in this thread
> http://www.spinics.net/lists/gluster-devel/msg15798.html.

I'll try to respond to that email later :)

> 3. *Make sure tests can be done in docker and run in parallel*
> 
> To reduce time for one test run from 2 hours we can look at running
> tests in parallel. I did a prototype and got test time down to 40 mins
> on a 16 GB RAM and 4 core VM.
> 
> Currently blocked at:
> Some of the tests fail in docker while they pass in a VM.
> Note that it is .t failing, Gluster works fine in docker.
> Need some help on this. More on this in a mail I will be sending later today
> at gluster-devel.

So, this parallelisation does not help us with the speed up on NetBSD
(no docker there). Because it does not help to get to a quicker
end-result, I do not see a high priority for introducing docker.

The VMs we use have 2GB of RAM. RAM is expensive in the cloud, so we
would need to upgrade the VMs we have to be able to run multiple docker
containers. A VM with 1GB of RAM results in many spurious failures; I
don't know how much RAM we should give a VM for docker runs.

I also do not think all developers run the regression tests on their
systems; there are regular compile errors caught in the smoke and
regression tests... There is also a tendency for rebasing changes often,
even for cases where there is no need. These rebases add to the job
queue in Jenkins for little advantage. Updating a commit message to
trigger a regression results in 2 smoke jobs, 2 regression jobs and a
number of other (rpmbuild, bug-check, ...) jobs. Educating developers
to test before posting and only retrigger the needed jobs would help a
lot too.

My strong preference would be to split the gigantic regression test into
smaller pieces. We have already started that by placing the .t files in
their own component directories. It should be easy to set up Jenkins jobs
for each directory (or groups of dirs) and run multiple tests in
parallel.
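
As a rough sketch of that split (an illustration only, not an existing
script): group the .t files by the directory they live in, and let each
group become one Jenkins job on its own slave. The function name and the
example paths are hypothetical.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def group_by_component(test_paths, root="tests"):
    """Map each component directory below root to its sorted .t files."""
    groups = defaultdict(list)
    for path in test_paths:
        parts = PurePosixPath(path).relative_to(root).parts
        # tests/bugs/nfs/a.t belongs to "bugs/nfs"; tests/a.t belongs to "."
        component = "/".join(parts[:-1]) or "."
        groups[component].append(path)
    return {name: sorted(paths) for name, paths in sorted(groups.items())}


if __name__ == "__main__":
    example = [
        "tests/basic/mount.t",
        "tests/bugs/nfs/a.t",
        "tests/bugs/nfs/b.t",
    ]
    for component, tests in group_by_component(example).items():
        print(component, "->", len(tests), "test(s)")
```

Small directories could be merged into one job so that the jobs take
roughly the same time; that balancing is left out here.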

Going different routes (docker vs VM) for different operating systems
does not sound like a good plan to me. I prefer to keep things as equal
as possible. Additional docker tests would be cool, but I have doubts
about replacing the VM tests with them.

Once we have achieved parallelism for the VM tests, we could look into
having more VMs. VMs in the cloud cost money while they are running, and
our Jenkins slaves are online 24x7. There is a Jenkins plugin that makes
it possible to power a VM on only when it is needed and off again
afterwards. This could potentially save us a lot of money, and make it
possible to use those savings for additional VMs (that are only running
when needed).

> *what do we get?*
> Running 4 docker containers on our Laptops itself can reduce time
> taken by test runs down to 90 mins. Running them on powerful machines,
> it is down to 40 mins as seen in the prototype.

If developers would run docker tests, sure, it would be a nice
improvement over the very few developers that run regression tests for
their changes.

> 4. *Test definitions for every .t*
> 
> Maybe the time has come to upgrade our test infra to have tests with
> test definitions. Every .t file could have a corresponding .def file
> which is
> 	A JSON/YAML/XML config
> 	Defines the requirements of test
> 	    Type of volume
> 	    Special knowledge of brick size required?
> 	    Which repo source folders should trigger this test
> 	    Running time
> 	    Test RUN level
> 
> *what do we get?*
> a. Run a partial set of tests on a commit based on git log and test
> definitions and run complete regression as nightly.
> b. Order test run based on run times. This combined with fail on first test
> setting we have, we will fail as early as possible.
> c. Order tests based on functionality level, which means a mount.t basic
> test should run before a complex DHT test that makes use of FUSE mount.
> Again, this will help us to fail as early as possible in failure scenarios.
> d. With knowledge of type of volume required and number of bricks required,
> we can re-use volumes that are created for subsequent tests.
> Even the cleanup() function we have takes time.  DiSTAF already has a
> function equivalent to use_existing_else_create_new.

I'm not sure how well this would work with the parallel testing. But
yes, it seems like a good suggestion, even if it forces developers to
think about which volumes their tests really need. There should be
little need for a complex volume if it is only used for simple mount
testing or such.
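
To illustrate points a, b and c of the proposal, here is a small sketch.
It assumes a definition carries a run level, an expected running time and
the source folders that should trigger the test. The class, the field
names and the example values are all made up; no .def format has been
agreed on.

```python
from dataclasses import dataclass, field


@dataclass
class Definition:
    """Hypothetical contents of a .def file for one .t test."""
    test: str
    volume_type: str
    run_level: int       # lower runs first: basic before complex
    run_time: float      # expected running time in seconds
    triggers: list = field(default_factory=list)  # source folder prefixes


def select_and_order(definitions, changed_files):
    """Pick the tests triggered by changed_files, cheapest and simplest first."""
    selected = [
        d for d in definitions
        if any(f.startswith(prefix) for f in changed_files for prefix in d.triggers)
    ]
    # Basic, quick tests come first so that a broken patch fails early.
    return sorted(selected, key=lambda d: (d.run_level, d.run_time))


if __name__ == "__main__":
    defs = [
        Definition("tests/bugs/dht/x.t", "distribute", 2, 300.0,
                   ["xlators/cluster/dht/", "xlators/mount/"]),
        Definition("tests/basic/mount.t", "distribute", 0, 20.0,
                   ["xlators/mount/"]),
    ]
    for d in select_and_order(defs, ["xlators/mount/fuse/src/fuse-bridge.c"]):
        print(d.test)
```

The changed files would come from git log for the commit under review,
and the nightly run would simply pass every definition.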

> 5. *Testing GFAPI*
> We don't have a good test framework for gfapi as of today.
> 
> However, with the recent design proposal at https://docs.google.com/document/d/1yuRLRbdccx_0V0UDAxqWbz4g983q5inuINHgM1YO040/edit?usp=sharing

Yes, this seems like a helpful testing tool. There is still the need for
writing small .c files that test certain functions in libgfapi.
Unfortunately it is not trivial to include the compilation of these
tests while running the regression cases. I think we should provide an
easy to use (build) framework and example to get those done correctly.
Building a test .c file against the libgfapi version under test, with
all the correct (pkg-config) flags and paths, isn't straightforward.
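
A minimal sketch of what such a helper could do, under a few assumptions:
the pkg-config module is called "glusterfs-api", PKG_CONFIG_PATH already
points at the build under test, and "cc" is the compiler. The function
name and its parameters are made up for illustration.

```python
import shlex
import subprocess


def gfapi_build_command(source, output, flags=None, pkg="glusterfs-api", cc="cc"):
    """Return the compiler command for building one libgfapi test program.

    When flags is None, pkg-config is asked for them. It honours
    PKG_CONFIG_PATH, which is how the build under test would be selected
    instead of a system-wide installation.
    """
    if flags is None:
        flags = subprocess.check_output(
            ["pkg-config", "--cflags", "--libs", pkg], text=True
        )
    # Libraries go after the source file so the linker can resolve symbols.
    return [cc, "-o", output, source, *shlex.split(flags)]


if __name__ == "__main__":
    # Flags are passed in here, so this runs without glusterfs installed.
    print(gfapi_build_command("gfapi-test.c", "gfapi-test", flags="-lgfapi"))
```

The returned list can be handed to subprocess.run() before the .t starts
the compiled binary.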

> 
> and
> 
> Craig Cabrey from Facebook developing a set of coreutils using
> GFAPI as mentioned here
> http://www.spinics.net/lists/gluster-devel/msg15753.html

These won't be targeting the testing of libgfapi, but rather should
provide easy access to Gluster volumes for users and maybe some
applications. I think we should see Craig's tools just like the Qemu,
Samba and NFS-Ganesha use-cases that should get included in automated
testing in the future too.

Thanks,
Niels