[Gluster-devel] regression failures on afr/split-brain-resolution

Raghavendra Gowdappa rgowdapp at redhat.com
Tue Jul 24 15:15:15 UTC 2018


On Tue, Jul 24, 2018 at 8:36 PM, Raghavendra Gowdappa <rgowdapp at redhat.com>
wrote:

>
>
> On Tue, Jul 24, 2018 at 8:35 PM, Raghavendra Gowdappa <rgowdapp at redhat.com>
> wrote:
>
>>
>>
>> On Tue, Jul 24, 2018 at 6:30 PM, Ravishankar N <ravishankar at redhat.com>
>> wrote:
>>
>>>
>>>
>>> On 07/24/2018 02:56 PM, Raghavendra Gowdappa wrote:
>>>
>>> All,
>>>
>>> I was trying to debug regression failures on [1] and observed that
>>> split-brain-resolution.t was failing consistently.
>>>
>>> =========================
>>> TEST 45 (line 88): 0 get_pending_heal_count patchy
>>> ./tests/basic/afr/split-brain-resolution.t .. 45/45 RESULT 45: 1
>>> ./tests/basic/afr/split-brain-resolution.t .. Failed 17/45 subtests
>>>
>>> Test Summary Report
>>> -------------------
>>> ./tests/basic/afr/split-brain-resolution.t (Wstat: 0 Tests: 45 Failed: 17)
>>>   Failed tests:  24-26, 28-36, 41-45
>>>
>>>
>>> On probing deeper, I observed a curious fact: in most of the failures,
>>> stat was not served from md-cache but was instead wound down to afr, which
>>> failed the stat with EIO because the file was in split-brain. So I ran
>>> another test:
>>> * disabled md-cache
>>> * mounted glusterfs with attribute-timeout 0 and entry-timeout 0
>>>
>>> Now the test always fails. So I think the test relied on stat requests
>>> being absorbed by either the kernel attribute cache or md-cache. When that
>>> does not happen, the stats reach afr and cause commands such as getfattr
>>> to fail.
>>>
>>>
>>> This indeed seems to be the case.  Is there any way we can avoid the
>>> stat? When a getfattr is performed on the mount, aren't lookup + getfattr
>>> the only fops that need to hit gluster?
>>>
>>
>> It's a black box to me how the kernel decides whether to do a lookup or a
>> stat. But I guess that if only a stat is needed and it's not available in
>> the cache, it would do a stat.
>>
>
> Another thing you can do is mount with a higher value of
> attribute-timeout. Let us know whether it works.
>

I tried higher values of attribute-timeout and it's not helping. Are there
any other similar split-brain related tests? Can I mark these tests bad for
the time being, as the md-cache patch has a deadline?
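
For anyone who wants to reproduce this outside the regression harness, here
is a rough sketch of the setup described in the quoted text below (md-cache
disabled, mount with attribute-timeout 0 and entry-timeout 0). It only prints
the commands. The volume name "patchy" comes from the test output; the
hostname and mount point are placeholders, and I am assuming that md-cache is
toggled through the performance.stat-prefetch volume option.

```shell
# Dry-run sketch: prints the commands instead of executing them.
# Assumptions: md-cache is controlled by performance.stat-prefetch;
# "server1" and "/mnt/patchy" are hypothetical placeholders.
repro_cmds() {
    vol=$1
    host=$2
    mnt=$3
    # Disable md-cache so that stat is no longer answered from its cache.
    echo "gluster volume set $vol performance.stat-prefetch off"
    # Mount with kernel attribute and entry caching turned off.
    echo "mount -t glusterfs -o attribute-timeout=0,entry-timeout=0 $host:/$vol $mnt"
}

repro_cmds patchy server1 /mnt/patchy
```

With both caches out of the way, every stat should reach afr, which is the
condition under which the test fails every time for me.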


>
>> -Ravi
>>>
>>> Thoughts?
>>>
>>> [1] https://review.gluster.org/#/c/20549/
>>>
>>>
>>> _______________________________________________
>>> Gluster-devel mailing list
>>> Gluster-devel at gluster.org
>>> https://lists.gluster.org/mailman/listinfo/gluster-devel
>>>
>>>
>>>
>>
>