[Bugs] [Bug 1613807] New: Fix spurious failures in tests/basic/afr/granular-esh/replace-brick.t
bugzilla at redhat.com
Wed Aug 8 10:29:06 UTC 2018
https://bugzilla.redhat.com/show_bug.cgi?id=1613807
Bug ID: 1613807
Summary: Fix spurious failures in tests/basic/afr/granular-esh/replace-brick.t
Product: GlusterFS
Version: mainline
Component: tests
Assignee: bugs at gluster.org
Reporter: pkarampu at redhat.com
CC: bugs at gluster.org
Description of problem:
Shd keeps running heals in a loop as long as it healed at least one entry in
the previous run. A heal counts as successful only if it completes both the
metadata heal and the entry/data heal, i.e. the entry must be healed completely
by that one healer. In the tests/basic/afr/granular-esh/replace-brick.t test,
brick-0 is the old brick and brick-1 is the new one. After replace-brick, only
root-gfid is present in brick-0's index:
1) The shd thread corresponding to brick-0 performs the metadata heal, which
   creates root-gfid in brick-0's 'dirty' index.
2) The healer threads of both brick-0 and brick-1 now try to heal root-gfid,
   and brick-1 gets the heal-domain lock. brick-0's shd thread fails and goes
   back to waiting for 10 minutes (cluster.heal-timeout).
When brick-1's healer thread finishes healing root-gfid, it creates 5 files,
which create indices in brick-0, so those entries are not healed until brick-0's
healer triggers one more heal. $HEAL_TIMEOUT, the time the test waits for heals
to complete, is set to 120 seconds, which is less than cluster.heal-timeout, so
the test gives up before that next heal starts. The fix decreases
cluster.heal-timeout to 5 seconds so that the next heal is triggered in time and
heals the remaining entries.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1.
2.
3.
Actual results:
Expected results:
Additional info: