[Gluster-devel] NetBSD regression spurious failures

Emmanuel Dreyfus manu at netbsd.org
Mon Jan 12 12:54:10 UTC 2015


Hello

NetBSD regression tests have been running for a while with voting
disabled. Results can be seen here:
http://build.gluster.org/job/rackspace-netbsd7-regression-triggered/

Unfortunately there are still a few spurious failures. Help would be
welcome to fix them. Here are the worst offenders:


1) tests/basic/afr/self-heald.t test 29

This test checks that files undergoing I/O are not reported as being
healed. It often fails on NetBSD, with glfs indeed reporting the files as
"Possibly undergoing heal".

A previous change set modified the test to ignore such situations, but
that seems inappropriate: http://review.gluster.org/9074 (NB: the part
that modifies glfs-heal.c is now obsolete, as it has been merged by
another change).

As I understand it, failure to acquire a lock on the file on the bricks
is enough to consider it as "Possibly undergoing heal", which means it is
easy to get false positives. The question is why NetBSD would fail to
lock more often than Linux. I wonder whether the lack of asynchronous I/O
support could be an explanation.


2) tests/basic/afr/self-heald.t test 67

#METADATA
TEST $CLI volume set $V0 cluster.metadata-self-heal off
EXPECT "off" volume_option $V0 cluster.metadata-self-heal
kill_multiple_bricks $V0 $H0 $B0
TEST chmod 777 $M0/f
EXPECT 1 afr_get_pending_heal_count $V0

Further testing shows that after kill_multiple_bricks $V0 $H0 $B0,
glfs-heal starts reporting this kind of thing for all bricks
that were not killed:

Brick nbslave70.cloud.gluster.org:/d/backends/patchy1/
/a/a/a/a/a/a/a/a
/a/a/a/a/a/a/a
/a/a/a/a/a/a
/a/a/a/a/a
/a/a/a/a
/a/a/a
/a/a
/a
/
Number of entries: 9
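
To illustrate why output like the above makes the EXPECT 1 check fail,
here is a minimal sketch. It assumes the pending heal count is obtained
by summing the "Number of entries:" lines of the heal info output; the
function names sum_heal_entries and sample_heal_info are hypothetical
and are not part of the test framework.

```shell
#!/bin/sh
# Hypothetical helper (an assumption, not the framework's code):
# sum every "Number of entries: N" line read from stdin.
sum_heal_entries() {
    awk '/^Number of entries:/ { sum += $4 } END { print sum + 0 }'
}

# Sample input: the heal info output quoted above for one surviving brick.
sample_heal_info() {
    cat <<'EOF'
Brick nbslave70.cloud.gluster.org:/d/backends/patchy1/
/a/a/a/a/a/a/a/a
/a/a/a/a/a/a/a
/a/a/a/a/a/a
/a/a/a/a/a
/a/a/a/a
/a/a/a
/a/a
/a
/
Number of entries: 9
EOF
}

# Prints 9, where the test expects 1.
sample_heal_info | sum_heal_entries
```

With a single surviving brick reporting the whole directory chain, the
sum is already 9, so the check cannot pass until those extra entries
disappear from the listing.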

3) tests/basic/ec/ec-12-4.t
A rarer failure; Xavier has started looking at it. Here is an example:
http://build.gluster.org/job/rackspace-netbsd7-regression-triggered/581/console

4) tests/basic/afr/data-self-heal.t test 85
Rare, not investigated yet. Example:
http://build.gluster.org/job/rackspace-netbsd7-regression-triggered/584/console

5) tests/encryption/crypt.t test 19
Rare, not investigated yet. Example:
http://build.gluster.org/job/rackspace-netbsd7-regression-triggered/560/console

NB: tests/basic/quota.t test 24 is supposed to be fixed now.

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
manu at netbsd.org

