[Bugs] [Bug 1149118] New: Spurious failure on disperse tests (bad file size on brick)
bugzilla at redhat.com
Fri Oct 3 09:14:51 UTC 2014
https://bugzilla.redhat.com/show_bug.cgi?id=1149118
Bug ID: 1149118
Summary: Spurious failure on disperse tests (bad file size on brick)
Product: GlusterFS
Version: 3.6.0
Component: disperse
Assignee: gluster-bugs at redhat.com
Reporter: xhernandez at datalab.es
CC: bugs at gluster.org
+++ This bug was initially created as a clone of Bug #1144108 +++
Description of problem:
Sometimes, especially on NetBSD, the ec test scripts fail because a file on
one of the bricks has an incorrect size.
Version-Release number of selected component (if applicable): master
Additional info:
This is caused by a side effect of reading a file through the mount point
while a brick is down. The read may update the file's access time, and the
brick that is down misses that update, which leaves it in an invalid state.
Self-heal repairs this correctly, but the script was not giving self-heal
enough time to complete before checking the bricks.
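The timing problem described above can be illustrated with a small polling
helper. This is only a sketch of the general "wait for a condition with a
timeout" pattern, not the code used by the GlusterFS test scripts. The names
wait_until and brick_file_has_size, the timeout values, and the idea of
passing in an expected size for the brick's copy are all assumptions made for
illustration.

```python
import os
import time
from typing import Callable


def wait_until(condition: Callable[[], bool],
               timeout: float = 20.0,
               interval: float = 0.5) -> bool:
    """Poll condition() until it returns True or `timeout` seconds elapse.

    Hypothetical helper: returns True if the condition was met in time,
    False otherwise. The condition is always evaluated at least once.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def brick_file_has_size(path: str, expected_size: int) -> bool:
    """Return True when `path` exists and its size equals `expected_size`.

    `expected_size` is whatever size the test expects for that brick's
    copy of the file (an assumption of this sketch).
    """
    try:
        return os.stat(path).st_size == expected_size
    except FileNotFoundError:
        return False
```

A test would then call something like
wait_until(lambda: brick_file_has_size(path, size)) after restarting the
brick, and only inspect the brick contents once it returns True. Polling with
a deadline avoids both a fixed sleep that is too short on slow systems and an
unbounded wait when self-heal never completes.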
--- Additional comment from Anand Avati on 2014-09-18 18:47:42 CEST ---
REVIEW: http://review.gluster.org/8771 (test/ec: Let self-heal repair files
before accessing bricks) posted (#1) for review on master by Xavier Hernandez
(xhernandez at datalab.es)
--- Additional comment from Anand Avati on 2014-09-30 18:46:39 CEST ---
REVIEW: http://review.gluster.org/8892 (test/ec: Fix spurious failures caused
by self-heal) posted (#1) for review on master by Xavier Hernandez
(xhernandez at datalab.es)
--- Additional comment from Anand Avati on 2014-10-01 11:27:44 CEST ---
REVIEW: http://review.gluster.org/8892 (test/ec: Fix spurious failures caused
by self-heal) posted (#2) for review on master by Xavier Hernandez
(xhernandez at datalab.es)
--- Additional comment from Anand Avati on 2014-10-03 11:01:30 CEST ---
COMMIT: http://review.gluster.org/8892 committed in master by Vijay Bellur
(vbellur at redhat.com)
------
commit a97ad9b69bb17f2351c59512fa9c6cb25d82b4da
Author: Xavier Hernandez <xhernandez at datalab.es>
Date: Thu Sep 18 18:42:34 2014 +0200
test/ec: Fix spurious failures caused by self-heal
The sha1sum of a file may update the access time of that file.
If this happens while a brick is down, as it is forced in the
test, that brick doesn't get the update, getting out of sync.
When the brick is restarted, self-heal repairs the file, but
the test shouldn't access brick contents until self-heal finishes.
If this is combined with a kill of another brick before self-heal
has finished repairing the file, the volume could become inaccessible.
Since the purpose of these tests is only to check ec functionality
(there is another test that checks self-heal), the test that corrupts
the file has been removed.
Additional checks to validate the state of the volume have been added
to avoid some timing issues.
BUG: 1144108
Change-Id: Ibd9288de519914663998a1fbc4321ec92ed6082c
Signed-off-by: Xavier Hernandez <xhernandez at datalab.es>
Reviewed-on: http://review.gluster.org/8892
Reviewed-by: Emmanuel Dreyfus <manu at netbsd.org>
Tested-by: Emmanuel Dreyfus <manu at netbsd.org>
Tested-by: Gluster Build System <jenkins at build.gluster.com>
Reviewed-by: Dan Lambright <dlambrig at redhat.com>
Reviewed-by: Vijay Bellur <vbellur at redhat.com>