[Gluster-infra] [Gluster-devel] NetBSD tests not running to completion.

Fri Jan 8 10:29:24 UTC 2016

On 01/08/2016 03:57 PM, Emmanuel Dreyfus wrote:
> On Fri, Jan 08, 2016 at 05:11:22AM -0500, Jeff Darcy wrote:
>> [08:45:57] ./tests/basic/afr/arbiter-statfs.t ..
>> [08:43:03] ./tests/basic/afr/arbiter-statfs.t ..
>> [08:40:06] ./tests/basic/afr/arbiter-statfs.t ..
>> [08:08:51] ./tests/basic/afr/arbiter-statfs.t ..
>> [08:06:44] ./tests/basic/afr/arbiter-statfs.t ..

I'm guessing that all of these are test #5.

./tests/basic/afr/arbiter-statfs.t (Wstat: 256 Tests: 5 Failed: 1)
   Failed test:  5
   Non-zero exit status: 1
   Parse errors: Bad plan.  You planned 22 tests but ran 5.

Atin had just pinged me on IRC with one such run:
https://build.gluster.org/job/rackspace-netbsd7-regression-triggered/13292/consoleFull
The reason is the same. The test exits early if setting up the loopback device fails:

+++ vnconfig -l
vnconfig: VNDIOCGET: Bad file descriptor
++ vnd=
++ '[' x = x ']'
++ echo 'no more vnd'
no more vnd
++ return 1

This issue has been hit before:
http://comments.gmane.org/gmane.comp.file-systems.gluster.devel/13192
The slave eventually had to be rebooted. I don't want to add the test to the bad tests just because test #5 failed.

>> [08:00:54] ./tests/basic/afr/self-heal.t ..
>> [07:59:56] ./tests/basic/afr/entry-self-heal.t ..
>> [18:05:23] ./tests/basic/quota-anon-fd-nfs.t ..
>> [18:06:37] ./tests/basic/quota-nfs.t ..
>> [18:49:32] ./tests/basic/quota-anon-fd-nfs.t ..
>> [18:51:46] ./tests/basic/quota-nfs.t ..
>> [14:25:37] ./tests/basic/quota-anon-fd-nfs.t ..
>> [14:26:44] ./tests/basic/quota-nfs.t ..
>> [14:45:13] ./tests/basic/tier/record-metadata-heat.t ..
> That is 6 tests; they could be disabled or ignored.
>
>> So some of us *have* done that work, in a repeatable way.  Note that the
>> list doesn't include tests which *hang* instead of failing cleanly,
>> which has recently been causing the entire NetBSD queue to get stuck
>> until someone manually stops those jobs.  What I find disturbing is the
>> idea that a feature with no consistently-available owner or identifiable
>> users can be allowed to slow or block every release unless every
>> developer devotes extra time to its maintenance.  Even if NetBSD itself
>> is worth it, I think that's an unhealthy precedent to set for the
>> project as a whole.
> On that point, we could start the regression script with:
> ( sleep 7200 && /sbin/reboot -n ) &
>
> And end it with:
> kill %1
>
> Does that seem reasonable? That way nothing can hang for more than 2 hours.
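
A minimal sketch of that watchdog idea as two shell functions, in case it helps the discussion. The names start_watchdog, stop_watchdog and watchdog_pid are hypothetical and are not part of the existing regression script. It cancels the timer by PID ($!) instead of the job spec %1, on the assumption that job specs may not be usable in a non-interactive script.

```shell
#!/bin/sh
# Watchdog sketch (hypothetical helper names, not from the regression script).

# start_watchdog SECONDS COMMAND [ARGS...]
# Runs COMMAND in the background after SECONDS, unless stop_watchdog
# is called first.
start_watchdog() {
    wd_timeout=$1
    shift
    ( sleep "$wd_timeout" && "$@" ) &
    watchdog_pid=$!          # remember the subshell's PID for later
}

# stop_watchdog
# Cancels the pending action by killing the background subshell.
# The orphaned sleep may keep running, but nothing follows it.
stop_watchdog() {
    if [ -n "${watchdog_pid:-}" ]; then
        kill "$watchdog_pid" 2>/dev/null
        wait "$watchdog_pid" 2>/dev/null
        watchdog_pid=
    fi
    return 0
}

# Intended use, mirroring the proposal above (left commented out so that
# sourcing this file never reboots anything):
#   start_watchdog 7200 /sbin/reboot -n
#   ... run the regression tests ...
#   stop_watchdog
```

Calling stop_watchdog from an EXIT trap would also cancel the reboot when the script ends early on an error, but whether that is wanted here is a judgment call: a script that fails fast may not need the slave rebooted at all.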