[Gluster-infra] [Gluster-devel] NetBSD tests not running to completion.

Fri Jan 8 06:29:42 UTC 2016

> On 01/07/2016 02:39 PM, Emmanuel Dreyfus wrote:
> > On Wed, Jan 06, 2016 at 05:49:04PM +0530, Ravishankar N wrote:
> >> I re-triggered NetBSD regressions for
> >> http://review.gluster.org/#/c/13041/3
> >> but they are being run in silent mode and are not completing. Can someone
> >> from the infra team take a look? The last 22 tests in
> >> https://build.gluster.org/job/rackspace-netbsd7-regression-triggered/ have
> >> failed. It is highly unlikely that something is wrong with all those patches.
> > I note your latest test completed with an error in mount-nfs-auth.t:
> > https://build.gluster.org/job/rackspace-netbsd7-regression-triggered/13260/consoleFull
> >
> > Would you have the Jenkins build that did not complete so that I can have a
> > look at it?
> >
> > Generally speaking, I have to point out that NetBSD regression does shed
> > light on generic bugs; we had a recent example with quota-nfs.t. For now
> > there are no other well-supported platforms, but if you want glusterfs to
> > be really portable, removing mandatory NetBSD regression is not a good
> > idea: portability bugs will crop up.
> >
> > Even a daily or weekly regression run seems a bad idea to me. If you do not
> > prevent integration of patches that break NetBSD regression, they will get
> > in, and tests will break one by one over time. I have first-hand
> > experience of this situation, from when I was trying to catch up with
> > NetBSD regression. Many times I reached something reliable enough to become
> > mandatory, and it got broken by a new patch before it actually became
> > mandatory.
> >
> > IMO, relaxing the NetBSD regression requirement means the project drops the
> > goal of being portable.
> >
> Hi Emmanuel,
>           This Sunday I have some time I can spend helping to make
> tests better for NetBSD. I have seen bugs that are caught only by NetBSD
> regression just recently, so I see value in making NetBSD more reliable.

+1. As Manu and Ravi's conversation pointed out, it's better to take a call based on data (how many tests are failing, how many are spurious). As my recent work on quota-nfs.t shows, I was actively trying to find a reproducer for the write-behind issue, but it proved elusive: we were able to hit the bug only very inconsistently. Couple that with the pressure to bring things to closure, and a tendency to sweep things under the carpet creeps in.

Having said that, you can find some of my commits where NetBSD results were skipped (or where I did not wait for the NetBSD runs to complete). Knowing that the infra is stable and that there are fewer false positives (of bugs) will shift responsibility onto developers to own the issue and fix it.

> Please let me know what things we can work on. It would help if
> you give me something specific to glusterfs to make it more valuable in
> the short term. Over time I would like to learn enough to share the load
> with you, however little it may be (please bear with me, I sometimes go
> quiet). Here are the initial things I would like to know to begin with:

I can try to help out here too, but mostly on a best-effort basis, as there are other responsibilities on which I am evaluated directly.

> 
> 1) How to set up NetBSD VMs on my laptop that are the exact same version as
> the ones run on the build systems.
> 2) How to prevent NetBSD machines from hanging when things crash? (At least I
> used to see the machines hang when fuse crashed; not sure if
> this is still the case.) This failure needs manual intervention at the
> moment on NetBSD regressions; if we make it report failures and pick
> the next job, that would be the best way forward.
> 3) We should come up with a list of known problems and how to
> troubleshoot them when things are not going smoothly on NetBSD.
> Again, we really need to make things automatic; this should be the last
> resort. Our top goal should be to make NetBSD machines report failures
> and go on to execute the next job.
> 4) How can we make debugging better on NetBSD? In the worst case we can
> make all tests execute in trace/debug mode on NetBSD.
> 
> I really appreciate the fine job you have done so far in making
> sure glusterfs is stable on NetBSD.

++1. I appreciate Emmanuel's effort and support over such a long time and will try to chip in to whatever extent I can.

> 
> Infra team,
>         I think we need to make some improvements to our infra. We need
> to get information about the health of Linux and NetBSD regression builds.
> 1) Something like: in the last 100 builds, how many builds succeeded on
> Linux and how many succeeded on NetBSD.
> 2) Which tests failed in the last 100 builds, and how many
> times, on both Linux and NetBSD. (I actually wrote parts of this,
> but the whole command output has changed, making my scripts stale.)
> Any other ideas you guys have?
> 3) Which components have the highest number of spurious failures.
> 4) How many builds did not complete, were manually aborted, etc.
> 
> Once we start measuring these things, the next step is to set up a process
> to get the health of the project stable and keep it that way.
> 
> Please let me know if anyone wants to volunteer to make things better in
> this infra part. Most of the code will be in python.
> 
> Pranith
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel
>
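
The build-health numbers requested in the quoted message (items 1, 2 and 4) could be tallied roughly as follows. This is only a minimal sketch under stated assumptions, not an existing Gluster infra script: the record layout (`platform`, `result`, `failed_tests`), the result strings and the function name `summarize_builds` are all hypothetical, and fetching the records from the build system is left out entirely. Spurious failures per component (item 3) are not covered, since deciding what counts as spurious needs data that is not described in this thread.

```python
from collections import Counter, defaultdict


def summarize_builds(builds, last_n=100):
    """Summarize the most recent `last_n` builds per platform.

    `builds` is assumed (hypothetically) to be a list of dicts ordered
    oldest to newest, each shaped like:
        {"platform": "netbsd", "result": "FAILURE",
         "failed_tests": ["mount-nfs-auth.t"]}
    Any result other than "SUCCESS" or "FAILURE" is counted as incomplete
    (for example an aborted or hung run).
    """
    per_platform = defaultdict(list)
    for build in builds:
        per_platform[build["platform"]].append(build)

    summary = {}
    for platform, records in per_platform.items():
        recent = records[-last_n:]  # keep only the newest last_n builds
        results = Counter(r["result"] for r in recent)
        failing = Counter(
            test for r in recent for test in r.get("failed_tests", [])
        )
        succeeded = results["SUCCESS"]
        failed = results["FAILURE"]
        summary[platform] = {
            "total": len(recent),
            "succeeded": succeeded,
            "failed": failed,
            "incomplete": len(recent) - succeeded - failed,
            # (test name, failure count) pairs, most frequent first
            "failing_tests": failing.most_common(),
        }
    return summary
```

Calling `summarize_builds(records)` returns one entry per platform, so the Linux and NetBSD figures can be compared side by side; passing a smaller `last_n` narrows the window to more recent builds.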