[Bugs] [Bug 1613807] Fix spurious failures in tests/basic/afr/granular-esh/ replace-brick.t

Thu Aug 9 11:32:41 UTC 2018

https://bugzilla.redhat.com/show_bug.cgi?id=1613807

Worker Ant <bugzilla-bot at gluster.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |MODIFIED

--- Comment #2 from Worker Ant <bugzilla-bot at gluster.org> ---
COMMIT: https://review.gluster.org/20681 committed in master by "Atin
Mukherjee" <amukherj at redhat.com> with a commit message- tests: Set heal-timeout
to 5 seconds

Shd keeps doing heals in a loop until it heals at least one entry in the
previous run. A heal is termed successful only if it heals both metadata and
entry/data heal i.e. the entry needs to be completely healed by just that
healer.

In tests/basic/afr/granular-esh/replace-brick.t test, brick-0 is old and
brick-1
is new. After replace-brick only root-gfid will be present in brick-0's index
1) shd-thread corresponding to brick-0 does metadata heal, this creates
root-gfid in brick-0's 'dirty' index.
2) Both healer threads corresponding to brick-0 and brick-1 now try to heal
root-gfid and brick-1 gets the heal-domain lock. brick-0's shd-thread will
experience a failure and it goes back to waiting for 10 minutes
(cluster.heal-timeout).
3) When brick-1's healer-thread completes healing root-gfid it creates 5 files
which create indices in brick-0, so until brick-0 doesn't trigger one more
heal, heal won't happen. $HEAL_TIMEOUT is set at 120 seconds, which is lesser
than cluster.heal-timeout, so decreasing this to 5 seconds so that the next
heal is triggered which will do the heals.

fixes bz#1613807
Change-Id: I881133fc28880d8615fbc4558a0dfa0dc63d7798
Signed-off-by: Pranith Kumar K <pkarampu at redhat.com>

-- 
You are receiving this mail because:
You are on the CC list for the bug.
You are the assignee for the bug.