[Gluster-devel] [Gluster-infra] Progress report for regression tests in Rackspace

Niels de Vos ndevos at redhat.com
Thu May 15 14:27:26 UTC 2014


On Thu, May 15, 2014 at 06:05:00PM +0530, Vijay Bellur wrote:
> On 04/30/2014 07:03 PM, Justin Clift wrote:
> >Hi us,
> >
> >Was trying out the GlusterFS regression tests in Rackspace VMs last
> >night for each of the release-3.4, release-3.5, and master branches.
> >
> >The regression test is just a run of "run-tests.sh", from a git
> >checkout of the appropriate branch.
> >
> >The good news is we're adding a lot of testing code with each release:
> >
> >  * release-3.4 -  6303 lines  (~30 mins to run test)
> >  * release-3.5 -  9776 lines  (~85 mins to run test)
> >  * master      - 11660 lines  (~90 mins to run test)
> >
> >(lines counted using:
> >  $ find tests -type f -iname "*.t" -exec cat {} >> a \;; wc -l a; rm -f a)
> >
> >The bad news is the tests only "kind of" pass now.  I say kind of because
> >although the regression run *can* pass for each of these branches, it's
> >inconsistent. :(
> >
> >Results from testing overnight:
> >
> >  * release-3.4 - 20 runs, 17 PASS, 3 FAIL. 85% success.
> >    * bug-857330/normal.t failed in one run
> >    * bug-887098-gmount-crash.t failed in one run
> >    * bug-857330/normal.t failed in one run
> >
> >  * release-3.5 - 20 runs, 18 PASS, 2 FAIL. 90% success.
> >    * bug-857330/xml.t failed in one run
> >    * bug-1004744.t failed in another run (same vm for both failures)
> >
> >  * master - 20 runs, 6 PASS, 14 FAIL. 30% success.
> >    * bug-1070734.t failed in one run
> >    * bug-1087198.t & bug-860663.t failed in one run (same vm as bug-1070734.t failure above)
> >    * bug-1087198.t & bug-857330/normal.t failed in one run (new vm, a subsequent run on same vm passed)
> >    * bug-1087198.t & bug-948686.t failed in one run (new vm)
> >    * bug-1070734.t & bug-1087198.t failed in one run (new vm)
> >    * bug-860663.t failed in one run
> >    * bug-1023974.t & bug-1087198.t & bug-948686.t failed in one run (new vm)
> >    * bug-1004744.t & bug-1023974.t & bug-1087198.t & bug-948686.t failed in one run (new vm)
> >    * bug-948686.t failed in one run (new vm)
> >    * bug-1070734.t failed in one run (new vm)
> >    * bug-1023974.t failed in one run (new vm)
> >    * bug-1087198.t & bug-948686.t failed in one run (new vm)
> >    * bug-1070734.t failed in one run (new vm)
> >    * bug-1087198.t failed in one run (new vm)
> >
> >The occasionally failing tests aren't completely random, suggesting
> >something is going on.  Race conditions, possibly? (no idea).
> >
> >  * 8 failures - bug-1087198.t
> >  * 5 failures - bug-948686.t
> >  * 4 failures - bug-1070734.t
> >  * 3 failures - bug-1023974.t
> >  * 3 failures - bug-857330/normal.t
> >  * 2 failures - bug-860663.t
> >  * 2 failures - bug-1004744.t
> >  * 1 failure  - bug-857330/xml.t
> >  * 1 failure  - bug-887098-gmount-crash.t
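
A tally like the one above could be reproduced from a plain list of failed
runs. This is only a rough sketch: it assumes one failed run per line with
test names separated by "&" (as in the per-run lists earlier), and the
function name tally_failures is made up for illustration, it is not part of
the test suite.

```shell
#!/bin/sh
# Hypothetical helper: count how often each test shows up in a list of
# failed runs. Reads stdin, one failed run per line, names separated by "&".
tally_failures() {
    tr '&' '\n' |                                   # one test name per line
    sed 's/^[[:space:]]*//; s/[[:space:]]*$//' |    # trim surrounding blanks
    grep -v '^$' |                                  # drop empty lines
    sort | uniq -c |                                # count each test name
    awk '{ printf "%d %s\n", $1, $2 }' |            # normalise to "count name"
    sort -k1,1nr -k2,2                              # most frequent first
}
```

For example, feeding it a file with the failed runs (failed-runs.txt is a
hypothetical name) would be "tally_failures < failed-runs.txt".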
> >
> >Anyone have suggestions on how to make this work reliably?
> 
> 
> 
> I think it would be a good idea to arrive at a list of test cases that
> are failing at random and assign owners to address them (default owner
> being the submitter of the test case). In addition to these, I have
> also seen tests like bd.t and xml.t fail pretty regularly.
> 
> Justin - can we publish a consolidated list of regression tests that
> fail and owners for them on an etherpad or similar?
> 
> Fixing these test cases will enable us to bring in more Jenkins
> instances for parallel regression runs etc. and will also provide more
> determinism for our regression tests. Your help to address the
> regression test suite problems will be greatly appreciated!

Indeed, getting the regression tests stable seems like a blocker before
we can move to a scalable Jenkins solution. Unfortunately, it may not be
trivial to debug these test cases... Any suggestions on capturing useful
data that would help figure out why the test cases don't pass?

Thanks,
Niels
