[Bugs] [Bug 1360574] New: multiple failures of tests/bugs/disperse/bug-1236065.t
bugzilla at redhat.com
Wed Jul 27 05:13:53 UTC 2016
https://bugzilla.redhat.com/show_bug.cgi?id=1360574
Bug ID: 1360574
Summary: multiple failures of tests/bugs/disperse/bug-1236065.t
Product: GlusterFS
Version: 3.8.1
Component: disperse
Keywords: Triaged
Assignee: bugs at gluster.org
Reporter: pkarampu at redhat.com
CC: aspandey at redhat.com, atalur at redhat.com,
bugs at gluster.org, ndevos at redhat.com,
pkarampu at redhat.com, xhernandez at datalab.es
Depends On: 1332054
+++ This bug was initially created as a clone of Bug #1332054 +++
Description of problem:
tests/bugs/disperse/bug-1236065.t failed several times on different Jenkins
slaves:
* https://build.gluster.org/job/rackspace-regression-2GB-triggered/20316/console
* https://build.gluster.org/job/rackspace-regression-2GB-triggered/20320/console
* https://build.gluster.org/job/rackspace-regression-2GB-triggered/20321/console
Version-Release number of selected component (if applicable):
current master branch
How reproducible:
way too often
Steps to Reproduce:
1. Run tests/bugs/disperse/bug-1236065.t as a regression test on Jenkins
Actual results:
Sometimes test 24 fails, sometimes test 25.
13:25:28 [20:25:28] Running tests in file ./tests/bugs/disperse/bug-1236065.t
13:26:13 cp: accessing `13.o': Input/output error
13:26:13 cp: accessing `14.o': Input/output error
13:26:13 cp: accessing `15.o': Input/output error
13:26:13 cp: accessing `16.o': Input/output error
13:26:13 cp: accessing `17.o': Input/output error
13:26:14 cp: accessing `18.o': Input/output error
13:26:14 cp: accessing `19.o': Input/output error
13:26:14 cp: accessing `1.o': Input/output error
13:26:14 cp: accessing `2.o': Input/output error
13:26:15 cp: accessing `3.o': Input/output error
13:26:15 cp: accessing `4.o': Input/output error
13:26:15 cp: accessing `5.o': Input/output error
13:26:15 cp: accessing `6.o': Input/output error
13:26:15 cp: accessing `7.o': Input/output error
13:26:16 cp: accessing `8.o': Input/output error
13:26:16 cp: accessing `9.o': Input/output error
13:27:28 tar: Removing leading `/' from member names
13:27:28 ./tests/bugs/disperse/bug-1236065.t ..
13:27:28 1..41
13:27:28 ok 1, LINENUM:28
13:27:28 ok 2, LINENUM:29
13:27:28 ok 3, LINENUM:30
13:27:28 ok 4, LINENUM:31
13:27:28 ok 5, LINENUM:32
13:27:28 ok 6, LINENUM:33
13:27:28 ok 7, LINENUM:36
13:27:28 ok 8, LINENUM:39
13:27:28 ok 9, LINENUM:42
13:27:28 ok 10, LINENUM:43
13:27:28 ok 11, LINENUM:44
13:27:28 ok 12, LINENUM:46
13:27:28 ok 13, LINENUM:47
13:27:28 ok 14, LINENUM:50
13:27:28 ok 15, LINENUM:51
13:27:28 ok 16, LINENUM:54
13:27:28 ok 17, LINENUM:55
13:27:28 ok 18, LINENUM:56
13:27:28 ok 19, LINENUM:58
13:27:28 ok 20, LINENUM:59
13:27:28 ok 21, LINENUM:62
13:27:28 ok 22, LINENUM:63
13:27:28 ok 23, LINENUM:64
13:27:28 ok 24, LINENUM:66
13:27:28 not ok 25 , LINENUM:67
13:27:28 FAILED COMMAND: ec_test_make
13:27:28 ok 26, LINENUM:69
13:27:28 ok 27, LINENUM:72
13:27:28 ok 28, LINENUM:73
13:27:28 ok 29, LINENUM:76
13:27:28 ok 30, LINENUM:77
13:27:28 ok 31, LINENUM:78
13:27:28 ok 32, LINENUM:80
13:27:28 ok 33, LINENUM:81
13:27:28 ok 34, LINENUM:83
13:27:28 ok 35, LINENUM:84
13:27:28 ok 36, LINENUM:85
13:27:28 ok 37, LINENUM:86
13:27:28 ok 38, LINENUM:90
13:27:28 ok 39, LINENUM:91
13:27:28 ok 40, LINENUM:92
13:27:28 ok 41, LINENUM:93
13:27:28 Failed 1/41 subtests
13:27:28
13:27:28 Test Summary Report
13:27:28 -------------------
13:27:28 ./tests/bugs/disperse/bug-1236065.t (Wstat: 0 Tests: 41 Failed: 1)
13:27:28 Failed test: 25
13:27:28 Files=1, Tests=41, 120 wallclock secs ( 0.03 usr 0.00 sys + 5.74 cusr 2.62 csys = 8.39 CPU)
13:27:28 Result: FAIL
13:27:28 End of test ./tests/bugs/disperse/bug-1236065.t
13:27:28
================================================================================
13:27:28
13:27:28
13:27:28 Run complete
13:27:28
================================================================================
13:27:28 Number of tests found: 177
13:27:28 Number of tests selected for run based on pattern: 177
13:27:28 Number of tests skipped as they were marked bad: 7
13:27:28 Number of tests skipped because of known_issues: 1
13:27:28 Number of tests that were run: 169
13:27:28
13:27:28 1 test(s) failed
13:27:28 ./tests/bugs/disperse/bug-1236065.t
13:27:28
13:27:28 0 test(s) generated core
--- Additional comment from Niels de Vos on 2016-05-01 16:47:18 EDT ---
Adding the 'tracking' keyword so that our bug-status-check-script does not
trip over it. Please remove the keyword when progress on this bug is made.
--- Additional comment from Vijay Bellur on 2016-05-01 16:53:02 EDT ---
REVIEW: http://review.gluster.org/14138 (disperse: mark bug-1236065.t as
bad_test) posted (#1) for review on master by Niels de Vos (ndevos at redhat.com)
--- Additional comment from Xavier Hernandez on 2016-05-02 04:56:07 EDT ---
I'm unable to reproduce the problem. However, the logs seem to indicate that
healing operations are still running after the test
'EXPECT_WITHIN $HEAL_TIMEOUT "0" get_pending_heal_count $V0' completes
successfully. Since additional bricks are killed after this test finishes, some
files might get damaged because more bricks than the redundancy count will be
bad, causing the I/O errors.
Most probably the root cause is that EXPECT_WITHIN uses a regular expression
and a simple "0" matches many values, for example "10". This means that if
exactly 10 files still need to be healed when the test is run, the test will
finish successfully, but self-healing won't have finished yet.
I'll post a patch to solve this problem.
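
The matching behaviour described in this comment can be illustrated with a
small sketch. It is written in Python rather than in the test framework's own
shell, and it assumes that the comparison is an unanchored regular-expression
search, as the comment suggests. The helper names unanchored_match and
anchored_match are hypothetical and are not part of the GlusterFS test suite.

```python
import re


def unanchored_match(pattern: str, value: str) -> bool:
    """Return True if the pattern matches anywhere inside the value."""
    return re.search(pattern, value) is not None


def anchored_match(pattern: str, value: str) -> bool:
    """Return True only if the pattern matches the whole value."""
    return re.fullmatch(pattern, value) is not None


# Compare both behaviours for a few pending-heal counts.
for count in ["0", "10", "105", "7"]:
    print(count, unanchored_match("0", count), anchored_match("0", count))
# 0 True True
# 10 True False
# 105 True False
# 7 False False
```

With an unanchored search, any count that merely contains the digit 0 is
accepted, so a check for "0" can pass while 10 entries are still pending.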
--- Additional comment from Vijay Bellur on 2016-05-02 05:04:11 EDT ---
REVIEW: http://review.gluster.org/14145 (cluster/ec: Fix spurious failure of
test bug-1236065.t) posted (#1) for review on master by Xavier Hernandez
(xhernandez at datalab.es)
--- Additional comment from Vijay Bellur on 2016-05-02 07:42:53 EDT ---
COMMIT: http://review.gluster.org/14138 committed in master by Jeff Darcy
(jdarcy at redhat.com)
------
commit 70a889489d79c41edfed52fdbdfa6869869906aa
Author: Niels de Vos <ndevos at redhat.com>
Date: Sun May 1 22:49:57 2016 +0200
disperse: mark bug-1236065.t as bad_test
tests/bugs/disperse/bug-1236065.t failed several times on different
Jenkins slaves:
* https://build.gluster.org/job/rackspace-regression-2GB-triggered/20316/console
* https://build.gluster.org/job/rackspace-regression-2GB-triggered/20320/console
* https://build.gluster.org/job/rackspace-regression-2GB-triggered/20321/console
BUG: 1332054
Change-Id: Ie1934f09f843c2089c187e9295288c16c01913d2
Signed-off-by: Niels de Vos <ndevos at redhat.com>
Reviewed-on: http://review.gluster.org/14138
Reviewed-by: Susant Palai <spalai at redhat.com>
Smoke: Gluster Build System <jenkins at build.gluster.com>
NetBSD-regression: NetBSD Build System <jenkins at build.gluster.org>
Reviewed-by: Vijay Bellur <vbellur at redhat.com>
CentOS-regression: Gluster Build System <jenkins at build.gluster.com>
--- Additional comment from Pranith Kumar K on 2016-05-02 09:09:18 EDT ---
(In reply to Xavier Hernandez from comment #3)
> I'm unable to reproduce the problem. However, the logs seem to indicate that
> healing operations are still running after the test
> 'EXPECT_WITHIN $HEAL_TIMEOUT "0" get_pending_heal_count $V0' completes
> successfully. Since additional bricks are killed after this test finishes,
> some files might get damaged because more bricks than the redundancy count
> will be bad, causing the I/O errors.
>
> Most probably the root cause is that EXPECT_WITHIN uses a regular expression
> and a simple "0" matches many values, for example "10". This means that if
> exactly 10 files still need to be healed when the test is run, the test will
> finish successfully, but self-healing won't have finished yet.
>
> I'll post a patch to solve this problem.
Good catch! It could very well be this issue.
--- Additional comment from Vijay Bellur on 2016-07-22 05:27:03 EDT ---
REVIEW: http://review.gluster.org/14985 (tests: Fix pending-heal-count checks)
posted (#1) for review on master by Pranith Kumar Karampuri
(pkarampu at redhat.com)
--- Additional comment from Vijay Bellur on 2016-07-22 13:01:50 EDT ---
COMMIT: http://review.gluster.org/14985 committed in master by Jeff Darcy
(jdarcy at redhat.com)
------
commit c5bf5d98594a4237a72cf0d3c72925d5a5aa0f69
Author: Pranith Kumar K <pkarampu at redhat.com>
Date: Fri Jul 22 13:58:22 2016 +0530
tests: Fix pending-heal-count checks
EXPECT_WITHIN takes a regular expression to match the count,
so even when there are, say, 10 entries to heal, it would
think that the heal is complete. Fixed the pending heal count
checks to use the correct regex.
Thanks to Xavi for finding this problem.
Change-Id: Ic593d22468b2b586bfca864962ffa0eda96b1d1f
BUG: 1332054
Signed-off-by: Pranith Kumar K <pkarampu at redhat.com>
Reviewed-on: http://review.gluster.org/14985
Smoke: Gluster Build System <jenkins at build.gluster.org>
Reviewed-by: Xavier Hernandez <xhernandez at datalab.es>
NetBSD-regression: NetBSD Build System <jenkins at build.gluster.org>
CentOS-regression: Gluster Build System <jenkins at build.gluster.org>
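
As a rough illustration of the idea in the commit message above, the sketch
below polls a probe until its output matches a pattern in full, or until a
timeout expires. It is a Python approximation, not the shell implementation
used by the test framework, and it is not taken from the patch itself. The
names expect_within and fake_heal_count, and the interval parameter, are
assumptions made for this example.

```python
import re
import time
from typing import Callable


def expect_within(timeout: float, pattern: str, probe: Callable[[], str],
                  interval: float = 0.01) -> bool:
    """Poll probe() until its whole output matches pattern or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        # fullmatch requires the entire string to match, so "10" never
        # satisfies the pattern "0".
        if re.fullmatch(pattern, probe().strip()):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Hypothetical probe standing in for a pending-heal counter: 10, then 3, then 0.
counts = iter(["10", "3", "0"])


def fake_heal_count() -> str:
    return next(counts, "0")


print(expect_within(1.0, "0", fake_heal_count))  # True, on the third poll
```

Because the match covers the whole string, the first poll ("10") is rejected
and the helper keeps waiting until the count is really 0.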
--- Additional comment from Vijay Bellur on 2016-07-25 10:34:41 EDT ---
REVIEW: http://review.gluster.org/15006 (tests: Fix get_pending_heal_count
check in ec) posted (#1) for review on master by Ravishankar N
(ravishankar at redhat.com)
--- Additional comment from Vijay Bellur on 2016-07-27 00:48:45 EDT ---
REVIEW: http://review.gluster.org/15006 (tests: Fix get_pending_heal_count
check in ec) posted (#2) for review on master by Ravishankar N
(ravishankar at redhat.com)
Referenced Bugs:
https://bugzilla.redhat.com/show_bug.cgi?id=1332054
[Bug 1332054] multiple failures of tests/bugs/disperse/bug-1236065.t