[Gluster-devel] regression failures on afr/split-brain-resolution

Raghavendra Gowdappa rgowdapp at redhat.com
Tue Jul 24 09:26:39 UTC 2018


I was trying to debug regression failures on [1] and observed that
split-brain-resolution.t was failing consistently.

TEST 45 (line 88): 0 get_pending_heal_count patchy
./tests/basic/afr/split-brain-resolution.t .. 45/45 RESULT 45: 1
./tests/basic/afr/split-brain-resolution.t .. Failed 17/45 subtests

Test Summary Report
./tests/basic/afr/split-brain-resolution.t (Wstat: 0 Tests: 45 Failed: 17)
  Failed tests:  24-26, 28-36, 41-45

On probing deeper, I observed something curious: in most of the failures, stat
was not served from md-cache but was instead wound down to afr, which
failed it with EIO because the file was in split-brain. So I ran another test:
* disabled md-cache
* mounted glusterfs with attribute-timeout 0 and entry-timeout 0
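
The two steps above can be sketched roughly as follows. This is an assumption about the commands, not a record of what was actually run: it supposes md-cache is turned off through the performance.stat-prefetch volume option, and that the volume, host and mount point are the regression framework defaults (patchy, localhost, /mnt/glusterfs/0). The run helper and the DRY_RUN variable are hypothetical and only there so the commands can be inspected before they are executed.

```shell
#!/bin/sh
# Hypothetical reproduction helper. Volume, host and mount point are assumed
# values; override them through the environment if your setup differs.
VOL="${VOL:-patchy}"
HOST="${HOST:-localhost}"
MNT="${MNT:-/mnt/glusterfs/0}"
DRY_RUN="${DRY_RUN:-1}"   # 1: only print the commands, 0: execute them

# Print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Assumption: md-cache is disabled via the performance.stat-prefetch option.
disable_md_cache() {
    run gluster volume set "$VOL" performance.stat-prefetch off
}

# Mount with zero attribute and entry timeouts so the kernel does not
# answer stat/lookup from its own caches.
mount_without_kernel_caching() {
    run mount -t glusterfs -o attribute-timeout=0,entry-timeout=0 \
        "$HOST:/$VOL" "$MNT"
}

disable_md_cache
mount_without_kernel_caching
```

With DRY_RUN left at 1 the script only prints the two commands; setting DRY_RUN=0 on a test machine would apply them before rerunning split-brain-resolution.t.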

With this setup the test always fails. So I think the test relies on stat
requests being absorbed by either the kernel attribute cache or md-cache. When
that does not happen, the stats reach afr and commands such as getfattr fail
as a result. Thoughts?

[1] https://review.gluster.org/#/c/20549/
