[Gluster-infra] NetBSD regression fixes

Niels de Vos ndevos at redhat.com
Sat Jan 16 20:49:11 UTC 2016


On Sat, Jan 16, 2016 at 06:55:49PM +0100, Emmanuel Dreyfus wrote:
> Hello all
> 
> Here are the problems identified in NetBSD regression so far:
> 
> 1) Before starting regression, the slave complains about "vnconfig:
> VNDIOCGET: Bad file descriptor" and fails the run.
> 
> This will be fixed by these changes:
> http://review.gluster.org/13204
> http://review.gluster.org/13205
> 
> 
> 2) Spurious failures
> I added a retry-failed-test-once feature so that we get fewer regression
> failures caused by spurious test failures. It is not used right now
> because it does not play nicely with the bad tests blacklist.
> 
> This will be fixed by these changes:
> http://review.gluster.org/13245
> http://review.gluster.org/13247
> 
> I have been running regressions in a loop without failures for a while
> with that trick.

Nice, thanks for these improvements!
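
To illustrate the retry-once idea from point 2 and how it has to
cooperate with the bad tests blacklist, here is a minimal sketch. It is
not the code from the reviews above: the function names and the
BAD_TESTS variable are hypothetical, and it assumes each test can be run
directly as a command.

```shell
# Hypothetical sketch, not the actual run-tests.sh logic.
# BAD_TESTS: space-separated list of blacklisted tests (assumed name).

# Succeed if $1 is on the blacklist.
is_bad_test() {
    case " ${BAD_TESTS:-} " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Run a test command; on failure, run it one more time.
# Blacklisted tests are skipped entirely, so they are never retried.
run_with_retry() {
    if is_bad_test "$1"; then
        echo "skipping known-bad test: $1"
        return 0
    fi
    if "$@"; then
        return 0
    fi
    echo "retrying once: $1"
    "$@"
}
```

The blacklist is checked before the first attempt, so a known-bad test
cannot end up in the retry path.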

> 3) Stale state from previous regression
> We sometimes have processes stuck from a previous regression, awaiting
> vnode locks for destroyed NFS filesystems. This causes the cleanup
> scripts to hang before the regression starts, and we get a timeout.
> 
> I modified the slave's /opt/qa/regression.sh to check for stuck processes
> and reboot the system if we find them. That will fail the current
> regression run, but at least the next ones coming after reboot will be
> safe.
> 
> This fix is not deployed yet; I am waiting for the fixes from point 2
> to be merged.

Could you send a pull request for the regression.sh script on
https://github.com/gluster/glusterfs-patch-acceptance-tests/ ? Or, if
you don't use GitHub, send the patch by email and we'll take care of
pushing it for you.
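
For readers following along, the stuck-process check from point 3 could
look roughly like the sketch below. The original message does not say
how stuck processes are detected; treating a process as stuck when it is
in uninterruptible wait ("D" in the ps STAT column) and has "gluster" in
its command name is an assumption, and find_stuck and check_stuck are
hypothetical names.

```shell
# Hypothetical sketch, not the actual /opt/qa/regression.sh change.

# Read "PID STAT COMMAND" lines on stdin and print the PIDs of
# gluster processes whose state starts with D (uninterruptible wait).
find_stuck() {
    awk '$2 ~ /^D/ && $3 ~ /gluster/ { print $1 }'
}

# Return non-zero and report on stderr if any stuck process is found.
check_stuck() {
    stuck=$(ps -ax -o pid= -o stat= -o comm= | find_stuck)
    if [ -n "$stuck" ]; then
        echo "stuck processes from a previous run:" $stuck >&2
        return 1
    fi
    return 0
}
```

A caller could then do something like "check_stuck || shutdown -r now"
before starting the run, which fails the current regression but leaves
the slave clean for the next one.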

> 4) Jenkins starts concurrent runs on the same slave
> We observed that Jenkins sometimes runs two jobs on the same slave at once,
> which of course can only lead to horrible failure.
> 
> I modified the slave's /opt/qa/regression.sh to add a lock file so that this
> situation is detected early and reported. The second regression will
> fail, but the idea is to get a better understanding of how that can
> occur.
> 
> This fix is not deployed yet; I am waiting for the fixes from point 2
> to be merged.

Hmm, I have not seen that before, but it surely is something to be
concerned about :-/
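
As an illustration of the lock file approach in point 4, here is a
minimal sketch. It is not the actual regression.sh change: the LOCKFILE
path and the function names are hypothetical. It relies on the shell's
noclobber option (set -C) so that creating the lock fails if the file
already exists.

```shell
# Hypothetical sketch, not the actual /opt/qa/regression.sh change.
LOCKFILE=${LOCKFILE:-/tmp/regression.lock}   # assumed path

# Try to create the lock file; fail if it already exists.
# With noclobber set, ">" refuses to overwrite an existing file.
acquire_lock() {
    if ( set -C; echo "$$" > "$LOCKFILE" ) 2>/dev/null; then
        return 0
    fi
    echo "lock $LOCKFILE held by pid $(cat "$LOCKFILE" 2>/dev/null)" >&2
    return 1
}

# Remove the lock file at the end of the run.
release_lock() {
    rm -f "$LOCKFILE"
}
```

A second regression started on the same slave would fail in
acquire_lock and report the PID of the first one, which should help in
tracking down how the concurrent runs come about.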

Thanks,
Niels