[Gluster-devel] NetBSD Regression Failures for 2 weeks

Jeff Darcy jdarcy at redhat.com
Tue Aug 9 14:07:37 UTC 2016


> > *96* of *247* regressions failed
> 
> That is huge.

Agreed.

I think there's an experiment we should do, which I've discussed with a couple of others: redefine EXPECT_WITHIN on NetBSD to double or triple the time given, and see if it makes a difference.  Why?  Because NetBSD (or perhaps the instances we run it on) often just seems slow - particularly for things that hit the local filesystem.  For example (from recent runs):

  split-brain-favorite-child-policy.t
  Linux: 602 seconds
  NetBSD: 703 seconds

  nuke.t
  Linux: 79 seconds
  NetBSD: 181 seconds

  heald.t
  Linux: 145 seconds
  NetBSD: 157 seconds
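
To put those numbers on a common scale, here is a small sketch that computes the NetBSD/Linux ratio for each test. The timings are copied from the list above; the `timings` dict and the `slowdown` helper are names made up for this illustration, not anything in our test harness.

```python
# Timings in seconds (Linux, NetBSD), copied from the runs quoted above.
timings = {
    "split-brain-favorite-child-policy.t": (602, 703),
    "nuke.t": (79, 181),
    "heald.t": (145, 157),
}


def slowdown(linux_seconds, netbsd_seconds):
    """Return how many times longer the NetBSD run took than the Linux run."""
    return netbsd_seconds / linux_seconds


# Hypothetical helper output: one ratio per test name.
ratios = {name: slowdown(*pair) for name, pair in timings.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
```

On these three samples the ratio ranges from roughly 1.08x to 2.29x, so the slowdown is uneven from test to test. Three runs are far too few to generalize from.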

Many of our tests are very timing-sensitive, running "close to the edge" in the sense of whether they'll pass consistently with the timeouts we use.  If such a test has a 90% chance of passing on Linux, it might have only a 50% chance of passing on NetBSD.  It doesn't take many tests like that before we start to see a scary number of NetBSD regression failures.
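
A rough sketch of why this compounds so quickly, using the illustrative 90% and 50% figures above. It assumes the flaky tests fail independently of one another and that a single failed test fails the whole regression run; both are simplifications, and `clean_run_probability` is a made-up name for this example.

```python
def clean_run_probability(pass_rate, flaky_tests):
    """Probability that every one of `flaky_tests` tests passes.

    Assumes each test passes independently with the same `pass_rate`.
    """
    return pass_rate ** flaky_tests


for n in (1, 3, 5):
    linux = clean_run_probability(0.9, n)
    netbsd = clean_run_probability(0.5, n)
    print(f"{n} flaky tests: Linux {linux:.1%}, NetBSD {netbsd:.1%}")
```

Under those assumptions, three such tests leave a Linux run with about a 72.9% chance of passing cleanly and a NetBSD run with 12.5%; at five tests it is about 59.0% vs. 3.1%.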

I have no doubt that some of these failures are also real, most often race conditions exposed by differences between Linux and NetBSD process/thread scheduling.  Nonetheless, I think it would be useful to know how many are problems in the code we ship vs. in tests that are overly optimistic about how long various operations take.

