[Gluster-devel] Spurious regression failure analysis from runs over the wkend

Justin Clift justin at gluster.org
Mon Feb 23 18:58:32 UTC 2015


Short version:

75% of the Jenkins regression test runs we do in Rackspace (on
glusterfs master branch) fail due to spurious errors.

This is why we're having capacity problems with our Jenkins
slave nodes... on average we need to run our tests about 4x
for each CR just to get a potentially valid result. :/


Longer version:

Ran 20 regression test runs on git master head over the
weekend, to better understand our spurious failure situation.

75% of the regression runs (15 of 20) failed in various ways.  Oops. ;)
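
For reference, the "4x" figure in the short version follows from the
25% pass rate if each run is treated as an independent trial with the
same failure probability. That independence is an assumption for
illustration only (real spurious failures may well be correlated), and
the function names below are made up for this sketch:

```python
def expected_runs_until_pass(failure_rate: float) -> float:
    """Mean number of runs needed to see the first pass.

    Assumes independent runs with a constant failure rate, i.e. a
    geometric distribution with success probability (1 - failure_rate).
    """
    if not 0.0 <= failure_rate < 1.0:
        raise ValueError("failure_rate must be in [0, 1)")
    return 1.0 / (1.0 - failure_rate)


def chance_of_pass_within(failure_rate: float, runs: int) -> float:
    """Probability that at least one of `runs` attempts passes."""
    return 1.0 - failure_rate ** runs


total_runs = 20   # regression runs done over the weekend
failed_runs = 15  # 75% of 20
failure_rate = failed_runs / total_runs

print(expected_runs_until_pass(failure_rate))            # 4.0
print(round(chance_of_pass_within(failure_rate, 4), 3))  # 0.684
```

Under the same assumption, even 4 runs only give roughly a 68% chance
of seeing one pass, so 4x is an average rather than a guarantee.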

The failures:

  * 5 x tests/bugs/fuse/bug-1126048.t
        Failed test:  10

  * 3 x tests/bugs/quota/bug-1087198.t
        Failed test:  18

  * 3 x tests/performance/open-behind.t
        Failed test:  17

  * 2 x tests/bugs/geo-replication/bug-877293.t
        Failed test:  11

  * 2 x tests/basic/afr/split-brain-heal-info.t
        Failed tests:  20-41

  * 1 x tests/bugs/distribute/bug-1117851.t
        Failed test:  15

  * 1 x tests/basic/uss.t
        Failed test:  26

  * 1 x hung on tests/bugs/posix/bug-1113960.t
        No idea which test number it hung on.  Left it running
        for several hours, then killed the VM along with the rest.

4 of the regression runs also created coredumps.  Uploaded the
archived_builds and logs here:

    http://mirror.salasaga.org/gluster/

(are those useful?)

We should probably concentrate on fixing the most common
spurious failures soon, and look into the less common ones
later on.
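
As a rough way to pick where to start, the counts from the failure
list can be ranked with a running share of all listed occurrences.
This is only a sketch: the counts are copied from the list above, the
hung run is counted as one occurrence, and the helper name is made up
for illustration:

```python
# (test script, number of occurrences), copied from the failure list.
failures = [
    ("tests/bugs/fuse/bug-1126048.t", 5),
    ("tests/bugs/quota/bug-1087198.t", 3),
    ("tests/performance/open-behind.t", 3),
    ("tests/bugs/geo-replication/bug-877293.t", 2),
    ("tests/basic/afr/split-brain-heal-info.t", 2),
    ("tests/bugs/distribute/bug-1117851.t", 1),
    ("tests/basic/uss.t", 1),
    ("tests/bugs/posix/bug-1113960.t", 1),  # the hung run
]


def cumulative_share(counts):
    """Return (name, count, cumulative share) rows, most frequent first."""
    total = sum(n for _, n in counts)
    # sorted() is stable, so tests with equal counts keep their list order.
    ranked = sorted(counts, key=lambda item: item[1], reverse=True)
    rows = []
    running = 0
    for name, n in ranked:
        running += n
        rows.append((name, n, running / total))
    return rows


for name, n, share in cumulative_share(failures):
    print(f"{n} x {name:<42} cumulative {share:.0%}")
```

By this count the top three tests account for 11 of the 18 listed
occurrences (about 61%).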

I'll do some runs on release-3.6 soon too, as I suspect that'll
be useful.

+ Justin

--
GlusterFS - http://www.gluster.org

An open source, distributed file system scaling to several
petabytes, and handling thousands of clients.

My personal twitter: twitter.com/realjustinclift


