[Gluster-devel] [Gluster-Maintainers] Release 5: Master branch health report (Week of 30th July)

Kotresh Hiremath Ravishankar khiremat at redhat.com
Thu Aug 2 11:42:02 UTC 2018


On Thu, Aug 2, 2018 at 5:05 PM, Atin Mukherjee <atin.mukherjee83 at gmail.com>
wrote:

>
>
> On Thu, Aug 2, 2018 at 4:37 PM Kotresh Hiremath Ravishankar <
> khiremat at redhat.com> wrote:
>
>>
>>
>> On Thu, Aug 2, 2018 at 3:49 PM, Xavi Hernandez <xhernandez at redhat.com>
>> wrote:
>>
>>> On Thu, Aug 2, 2018 at 6:14 AM Atin Mukherjee <amukherj at redhat.com>
>>> wrote:
>>>
>>>>
>>>>
>>>> On Tue, Jul 31, 2018 at 10:11 PM Atin Mukherjee <amukherj at redhat.com>
>>>> wrote:
>>>>
>>>>> I just went through the nightly regression report of brick mux runs
>>>>> and here's what I can summarize.
>>>>>
>>>>> =========================================================================
>>>>> Fails only with brick-mux
>>>>> =========================================================================
>>>>> tests/bugs/core/bug-1432542-mpx-restart-crash.t - Times out even
>>>>> after 400 secs. Refer https://fstat.gluster.org/failure/209?state=2&start_date=2018-06-30&end_date=2018-07-31&branch=all,
>>>>> specifically the latest report
>>>>> https://build.gluster.org/job/regression-test-burn-in/4051/consoleText
>>>>> . Wasn't timing out as frequently as it was till 12 July, but since
>>>>> 27 July it has timed out twice. Beginning to believe commit
>>>>> 9400b6f2c8aa219a493961e0ab9770b7f12e80d2 has added the delay and 400
>>>>> secs is no longer sufficient (Mohit?)
>>>>>
>>>>> tests/bugs/glusterd/add-brick-and-validate-replicated-volume-options.t
>>>>> (Ref - https://build.gluster.org/job/regression-test-with-multiplex/814/console)
>>>>> - Test fails only in brick-mux mode. AI on Atin to look at and get
>>>>> back.
>>>>>
>>>>> tests/bugs/replicate/bug-1433571-undo-pending-only-on-up-bricks.t
>>>>> (https://build.gluster.org/job/regression-test-with-multiplex/813/console)
>>>>> - Seems to have failed just twice in the last 30 days as per
>>>>> https://fstat.gluster.org/failure/251?state=2&start_date=2018-06-30&end_date=2018-07-31&branch=all.
>>>>> Need help from AFR team.
>>>>>
>>>>> tests/bugs/quota/bug-1293601.t
>>>>> (https://build.gluster.org/job/regression-test-with-multiplex/812/console)
>>>>> - Hasn't failed after 26 July and earlier it was failing regularly.
>>>>> Did we fix this test through any patch (Mohit?)
>>>>>
>>>>> tests/bitrot/bug-1373520.t
>>>>> (https://build.gluster.org/job/regression-test-with-multiplex/811/console)
>>>>> - Hasn't failed after 27 July and earlier it was failing regularly.
>>>>> Did we fix this test through any patch (Mohit?)
>>>>>
>>>>
>>>> I see this has failed in the day before yesterday's regression run as
>>>> well (and I could reproduce it locally with brick mux enabled). The
>>>> test fails because a file isn't healed within the expected time.
>>>>
>>>> 15:55:19 not ok 25 Got "0" instead of "512", LINENUM:55
>>>> 15:55:19 FAILED COMMAND: 512 path_size /d/backends/patchy5/FILE1
>>>>
>>>> Need EC dev's help here.
>>>>
>>>
>>> I'm not sure where the problem is exactly. I've seen that when the test
>>> fails, self-heal is attempting to heal the file, but when the file is
>>> accessed, an Input/Output error is returned, aborting heal. I've checked
>>> that a heal is attempted every time the file is accessed, but it fails
>>> always. This error seems to come from bit-rot stub xlator.
>>>
>>> When in this situation, if I stop and start the volume, self-heal
>>> immediately heals the files. It seems like a stale state kept by the
>>> stub xlator is preventing the file from being healed.
>>>
>>> Adding bit-rot maintainers for help on this one.
>>>
>>
>> Bitrot-stub marks the file as corrupted in inode_ctx. But when the file
>> and its hardlink are deleted from that brick and a lookup is done
>> on the file, it cleans up the marker on getting ENOENT. This is part of
>> the recovery steps, and only md-cache is disabled during the process.
>> Are there any other perf xlators that need to be disabled for this
>> scenario, so that a lookup/revalidate reaches the brick where
>> the back-end file is deleted?
>>
>
> But the same test doesn't fail with brick multiplexing not enabled. Do we
> know why?
>
Don't know; something to do with perf xlators, I suppose. It's not
reproduced on my local system even with brick-mux enabled, but it's
happening on Xavi's system.

Xavi,
Could you try with the patch [1] and let me know whether it fixes the issue?

[1] https://review.gluster.org/#/c/20619/1
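To make the suspicion concrete, here's a toy Python model of the interaction being discussed. This is not Gluster code; `Brick`, `BitrotStub`, and `LookupCache` are made-up stand-ins for the brick backend, the bitrot-stub xlator, and a hypothetical perf xlator that serves repeat lookups from cache. The point it illustrates: the corrupted marker is cleared only when a lookup actually reaches the brick and returns ENOENT, so a cached positive lookup keeps the stale marker alive and every heal attempt keeps failing with EIO.

```python
# Toy model (not Gluster code) of the suspected interaction between the
# bitrot-stub "corrupted" marker and a caching perf xlator above it.
import errno

class Brick:
    """Pretend brick backend: just tracks which paths exist on disk."""
    def __init__(self, files):
        self.files = set(files)

    def lookup(self, path):
        return 0 if path in self.files else -errno.ENOENT

class BitrotStub:
    """Keeps a per-inode 'corrupted' flag, cleared on an ENOENT lookup."""
    def __init__(self, brick):
        self.brick = brick
        self.corrupted = set()

    def lookup(self, path):
        ret = self.brick.lookup(path)
        if ret == -errno.ENOENT:
            self.corrupted.discard(path)   # recovery path: marker cleanup
        return ret

    def read(self, path):
        # A corrupted file is never served, which also aborts self-heal.
        return -errno.EIO if path in self.corrupted else 0

class LookupCache:
    """Hypothetical perf xlator: answers repeat lookups from its cache,
    so the ENOENT never reaches bitrot-stub underneath it."""
    def __init__(self, stub):
        self.stub = stub
        self.cache = {}

    def lookup(self, path):
        if path not in self.cache:
            self.cache[path] = self.stub.lookup(path)
        return self.cache[path]

brick = Brick({"FILE1"})
stub = BitrotStub(brick)
cached = LookupCache(stub)

stub.corrupted.add("FILE1")     # scrubber flagged the file
cached.lookup("FILE1")          # cached while the file still existed
brick.files.remove("FILE1")     # file + hardlink deleted for recovery

cached.lookup("FILE1")          # served from cache; stub never sees ENOENT
print(stub.read("FILE1"))       # still -EIO, heal keeps failing

stub.lookup("FILE1")            # a lookup that actually hits the brick
print(stub.read("FILE1"))       # marker cleared, heal can proceed
```

In this model the fix is exactly what's being proposed on the list: either disable the caching layer during recovery or force a lookup that bypasses it, so the brick's ENOENT can clear the marker.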

>
>
>>
>>> Xavi
>>>
>>>
>>>
>>>>
>>>>> tests/bugs/glusterd/remove-brick-testcases.t - Failed once with a
>>>>> core; not sure whether brick mux is the culprit here. Ref -
>>>>> https://build.gluster.org/job/regression-test-with-multiplex/806/console
>>>>> . Seems to be a glustershd crash. Need help from AFR folks.
>>>>>
>>>>> =========================================================================
>>>>> Fails for non-brick mux case too
>>>>> =========================================================================
>>>>> tests/bugs/distribute/bug-1122443.t - Seems to be failing at my
>>>>> setup very often, without brick mux as well. Refer
>>>>> https://build.gluster.org/job/regression-test-burn-in/4050/consoleText
>>>>> . There's an email in gluster-devel and a BZ 1610240 for the same.
>>>>>
>>>>> tests/bugs/bug-1368312.t - Seems to be a new failure
>>>>> (https://build.gluster.org/job/regression-test-with-multiplex/815/console),
>>>>> however seen for a non-brick-mux case too -
>>>>> https://build.gluster.org/job/regression-test-burn-in/4039/consoleText
>>>>> . Need some eyes from AFR folks.
>>>>>
>>>>> tests/00-geo-rep/georep-basic-dr-tarssh.t - this isn't specific to
>>>>> brick mux; have seen this failing at multiple default regression runs.
>>>>> Refer https://fstat.gluster.org/failure/392?state=2&start_date=2018-06-30&end_date=2018-07-31&branch=all
>>>>> . We need help from geo-rep devs to root-cause this sooner rather
>>>>> than later.
>>>>>
>>>>> tests/00-geo-rep/georep-basic-dr-rsync.t - this isn't specific to
>>>>> brick mux; have seen this failing at multiple default regression runs.
>>>>> Refer https://fstat.gluster.org/failure/393?state=2&start_date=2018-06-30&end_date=2018-07-31&branch=all
>>>>> . We need help from geo-rep devs to root-cause this sooner rather
>>>>> than later.
>>>>>
>>>>> tests/bugs/glusterd/validating-server-quorum.t
>>>>> (https://build.gluster.org/job/regression-test-with-multiplex/810/console)
>>>>> - Fails for non-brick-mux cases too,
>>>>> https://fstat.gluster.org/failure/580?state=2&start_date=2018-06-30&end_date=2018-07-31&branch=all
>>>>> . Atin has a patch https://review.gluster.org/20584 which resolves
>>>>> it, but the patch is failing regression for a different, unrelated
>>>>> test.
>>>>>
>>>>> tests/bugs/replicate/bug-1586020-mark-dirty-for-entry-txn-on-quorum-failure.t
>>>>> (Ref - https://build.gluster.org/job/regression-test-with-multiplex/809/console)
>>>>> - fails for non brick mux case too -
>>>>> https://build.gluster.org/job/regression-test-burn-in/4049/consoleText
>>>>> - Need some eyes from AFR folks.
>>>>>
>>>>
>>
>>
>> --
>> Thanks and Regards,
>> Kotresh H R
>> _______________________________________________
>> Gluster-devel mailing list
>> Gluster-devel at gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-devel
>
>


-- 
Thanks and Regards,
Kotresh H R

