[Bugs] [Bug 1608158] New: split brain resolution regression tests fail sporadically

bugzilla at redhat.com
Wed Jul 25 04:58:47 UTC 2018


https://bugzilla.redhat.com/show_bug.cgi?id=1608158

            Bug ID: 1608158
           Summary: split brain resolution regression tests fail
                    sporadically
           Product: GlusterFS
           Version: mainline
         Component: replicate
          Assignee: bugs at gluster.org
          Reporter: rgowdapp at redhat.com
                CC: bugs at gluster.org



Description of problem:
I was trying to debug regression failures on [1] and observed that
split-brain-resolution.t was failing consistently.

=========================
TEST 45 (line 88): 0 get_pending_heal_count patchy
./tests/basic/afr/split-brain-resolution.t .. 45/45 RESULT 45: 1
./tests/basic/afr/split-brain-resolution.t .. Failed 17/45 subtests

Test Summary Report
-------------------
./tests/basic/afr/split-brain-resolution.t (Wstat: 0 Tests: 45 Failed: 17)
  Failed tests:  24-26, 28-36, 41-45


On probing deeper, I observed a curious fact: in most of the failures, stat was
not served from md-cache but was instead wound down to afr, which failed it
with EIO because the file was in split brain. So I ran another test:
* disabled md-cache
* mounted glusterfs with attribute-timeout 0 and entry-timeout 0
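
The two changes above can be sketched roughly as the commands below. This is
an illustration, not the exact invocation from the test run: the host and
mount point are placeholders, the volume name patchy is taken from the test
output, and it assumes md-cache is turned off through the
performance.stat-prefetch volume option. The script only prints the commands;
it does not run them.

```shell
#!/bin/sh
# Sketch only: prints the commands for the two changes; nothing is executed.
# HOST and MOUNTPOINT are placeholder values, not taken from the test run.
VOLUME=${VOLUME:-patchy}
HOST=${HOST:-localhost}
MOUNTPOINT=${MOUNTPOINT:-/mnt/glusterfs/0}

repro_commands() {
    # Assumption: performance.stat-prefetch is the option that toggles md-cache.
    echo "gluster volume set $VOLUME performance.stat-prefetch off"
    # Zero timeouts so the kernel does not cache attributes or entries.
    echo "glusterfs --volfile-server=$HOST --volfile-id=$VOLUME --attribute-timeout=0 --entry-timeout=0 $MOUNTPOINT"
}

repro_commands
```

With both caches out of the way, every stat issued by the test should reach
afr.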

Now the test always fails. So I think the test relied on stat requests being
absorbed by either the kernel attribute cache or md-cache. When that does not
happen, the stats reach afr and cause commands such as getfattr to fail.
Thoughts?

[1] https://review.gluster.org/#/c/20549/
tests/basic/afr/split-brain-resolution.t
tests/bugs/bug-1368312.t
tests/bugs/replicate/bug-1238398-split-brain-resolution.t
tests/bugs/replicate/bug-1417522-block-split-brain-resolution.t

Discussion on this topic can be found on gluster-devel under the subject
"regression failures on afr/split-brain-resolution".

