[Bugs] [Bug 1613807] New: Fix spurious failures in tests/basic/afr/granular-esh/replace-brick.t

Wed Aug 8 10:29:06 UTC 2018

https://bugzilla.redhat.com/show_bug.cgi?id=1613807

            Bug ID: 1613807
           Summary: Fix spurious failures in
                    tests/basic/afr/granular-esh/replace-brick.t
           Product: GlusterFS
           Version: mainline
         Component: tests
          Assignee: bugs at gluster.org
          Reporter: pkarampu at redhat.com
                CC: bugs at gluster.org

Description of problem:
Shd keeps running heals in a loop as long as it healed at least one entry in
the previous run. A heal counts as successful only if that healer completes
both the metadata heal and the entry/data heal, i.e. the entry must be
completely healed by that healer alone.

In tests/basic/afr/granular-esh/replace-brick.t, brick-0 is old and brick-1 is
new. After replace-brick, only root-gfid is present in brick-0's index.

1) The shd thread corresponding to brick-0 does the metadata heal, which
   creates root-gfid in brick-0's 'dirty' index.
2) The healer threads corresponding to brick-0 and brick-1 both try to heal
   root-gfid, and brick-1's thread gets the heal-domain lock. brick-0's shd
   thread sees a failure and goes back to waiting for 10 minutes
   (cluster.heal-timeout).

When brick-1's healer thread finishes healing root-gfid, it creates 5 files,
which in turn create indices in brick-0. These are not healed until brick-0's
healer triggers one more heal. $HEAL_TIMEOUT is set to 120 seconds, which is
less than cluster.heal-timeout, so the test times out before that heal runs.
The fix is to decrease cluster.heal-timeout to 5 seconds so that the next heal
is triggered in time and does the pending heals.
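
The timing argument above can be illustrated with a rough model. This is only
a sketch, not the actual shd code: the function names and the 1-second
failure time are hypothetical, and it assumes the 5-second value is applied to
cluster.heal-timeout while $HEAL_TIMEOUT stays at 120 seconds.

```python
# Simplified timing model of the race described above; NOT the real shd code.
# All names and the FAILED_AT value are hypothetical, for illustration only.


def next_crawl_time(failed_at: float, heal_timeout: float) -> float:
    """Time at which a healer that failed at `failed_at` wakes up again."""
    return failed_at + heal_timeout


def pending_heals_done_in_time(
    failed_at: float, heal_timeout: float, test_wait: float
) -> bool:
    """True if brick-0's healer crawls again before the test stops waiting."""
    return next_crawl_time(failed_at, heal_timeout) <= test_wait


FAILED_AT = 1.0    # seconds after the test starts waiting (assumed)
TEST_WAIT = 120.0  # $HEAL_TIMEOUT used by the test

# cluster.heal-timeout at 10 minutes: the next crawl comes too late.
print(pending_heals_done_in_time(FAILED_AT, 600.0, TEST_WAIT))  # False
# cluster.heal-timeout lowered to 5 seconds: the next crawl fits easily.
print(pending_heals_done_in_time(FAILED_AT, 5.0, TEST_WAIT))    # True
```

In this model the test can only pass when the healer's wait interval is
shorter than the time the test is willing to wait, which is why lowering the
interval removes the spurious failure.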

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
