[Gluster-Maintainers] [Gluster-devel] Release 5: Master branch health report (Week of 30th July)

Karthik Subrahmanya ksubrahm at redhat.com
Fri Aug 3 11:37:02 UTC 2018


On Fri, Aug 3, 2018 at 3:07 PM Karthik Subrahmanya <ksubrahm at redhat.com>
wrote:

>
>
> On Fri, Aug 3, 2018 at 2:12 PM Karthik Subrahmanya <ksubrahm at redhat.com>
> wrote:
>
>>
>>
>> On Thu, Aug 2, 2018 at 11:00 PM Karthik Subrahmanya <ksubrahm at redhat.com>
>> wrote:
>>
>>>
>>>
>>> On Tue 31 Jul, 2018, 10:17 PM Atin Mukherjee, <amukherj at redhat.com>
>>> wrote:
>>>
>>>> I just went through the nightly regression report of brick mux runs and
>>>> here's what I can summarize.
>>>>
>>>>
>>>> ========================================================================
>>>> Fails only with brick-mux
>>>> ========================================================================
>>>> tests/bugs/core/bug-1432542-mpx-restart-crash.t - Times out even after
>>>> 400 secs. Refer
>>>> https://fstat.gluster.org/failure/209?state=2&start_date=2018-06-30&end_date=2018-07-31&branch=all,
>>>> specifically the latest report
>>>> https://build.gluster.org/job/regression-test-burn-in/4051/consoleText
>>>> . It is not timing out as frequently as it did until 12 July, but since
>>>> 27 July it has timed out twice. Beginning to believe commit
>>>> 9400b6f2c8aa219a493961e0ab9770b7f12e80d2 has added the delay and now 400
>>>> secs isn't sufficient (Mohit?)
>>>>
>>>> tests/bugs/glusterd/add-brick-and-validate-replicated-volume-options.t
>>>> (Ref -
>>>> https://build.gluster.org/job/regression-test-with-multiplex/814/console)
>>>> - Test fails only in brick-mux mode; AI (action item) on Atin to look
>>>> at it and get back.
>>>>
>>>> tests/bugs/replicate/bug-1433571-undo-pending-only-on-up-bricks.t (
>>>> https://build.gluster.org/job/regression-test-with-multiplex/813/console)
>>>> - Seems to have failed just twice in the last 30 days, as per
>>>> https://fstat.gluster.org/failure/251?state=2&start_date=2018-06-30&end_date=2018-07-31&branch=all.
>>>> Need help from AFR team.
>>>>
>>>> tests/bugs/quota/bug-1293601.t (
>>>> https://build.gluster.org/job/regression-test-with-multiplex/812/console)
>>>> - Hasn't failed since 26 July, though earlier it was failing regularly.
>>>> Did we fix this test through any patch (Mohit?)
>>>>
>>>> tests/bitrot/bug-1373520.t (
>>>> https://build.gluster.org/job/regression-test-with-multiplex/811/console)
>>>> - Hasn't failed since 27 July, though earlier it was failing regularly.
>>>> Did we fix this test through any patch (Mohit?)
>>>>
>>>> tests/bugs/glusterd/remove-brick-testcases.t - Failed once with a core;
>>>> not sure whether brick mux is the culprit here. Ref -
>>>> https://build.gluster.org/job/regression-test-with-multiplex/806/console
>>>> . Seems to be a glustershd crash. Need help from AFR folks.
>>>>
>>>>
>>>> ========================================================================
>>>> Fails for non-brick mux case too
>>>> ========================================================================
>>>> tests/bugs/distribute/bug-1122443.t - Seems to be failing on my setup
>>>> very often, without brick mux as well. Refer
>>>> https://build.gluster.org/job/regression-test-burn-in/4050/consoleText
>>>> . There's an email on gluster-devel and BZ 1610240 for the same.
>>>>
>>>> tests/bugs/bug-1368312.t - Seems to be a new failure (
>>>> https://build.gluster.org/job/regression-test-with-multiplex/815/console)
>>>> - however, it has been seen for a non-brick-mux case too
>>>> -
>>>> https://build.gluster.org/job/regression-test-burn-in/4039/consoleText
>>>> . Need some eyes from AFR folks.
>>>>
>>>> tests/00-geo-rep/georep-basic-dr-tarssh.t - This isn't specific to
>>>> brick mux; I have seen it fail in multiple default regression runs.
>>>> Refer
>>>> https://fstat.gluster.org/failure/392?state=2&start_date=2018-06-30&end_date=2018-07-31&branch=all
>>>> . We need help from the geo-rep devs to root-cause this sooner rather
>>>> than later.
>>>>
>>>> tests/00-geo-rep/georep-basic-dr-rsync.t - This isn't specific to brick
>>>> mux; I have seen it fail in multiple default regression runs. Refer
>>>> https://fstat.gluster.org/failure/393?state=2&start_date=2018-06-30&end_date=2018-07-31&branch=all
>>>> . We need help from the geo-rep devs to root-cause this sooner rather
>>>> than later.
>>>>
>>>> tests/bugs/glusterd/validating-server-quorum.t (
>>>> https://build.gluster.org/job/regression-test-with-multiplex/810/console)
>>>> - Fails for non-brick-mux cases too,
>>>> https://fstat.gluster.org/failure/580?state=2&start_date=2018-06-30&end_date=2018-07-31&branch=all
>>>> . Atin has a patch, https://review.gluster.org/20584, which resolves it,
>>>> but the patch is failing regression on a different, unrelated test.
>>>>
>>>> tests/bugs/replicate/bug-1586020-mark-dirty-for-entry-txn-on-quorum-failure.t
>>>> (Ref -
>>>> https://build.gluster.org/job/regression-test-with-multiplex/809/console)
>>>> - fails for the non-brick-mux case too -
>>>> https://build.gluster.org/job/regression-test-burn-in/4049/consoleText
>>>> - Need some eyes from AFR folks.
>>>>
>>> I am looking at this. It is not reproducible locally. Trying to reproduce
>>> it on soft serve.
>>>
>>
>> On the soft serve machine too, it does not fail at the point where the
>> regression run failed. But I found another problem in the script.
>> I will fix that and add some extra logs so that it is easier to debug
>> the next time it fails.
>>
>
> RCA for tests/bugs/replicate/bug-1586020-mark-dirty-for-entry-txn-on-quorum-failure.t
> failure:
> This test case completely fills 2 out of the 3 bricks and provisions one
> brick with some extra space, so that entry creation is expected to succeed
> on only one brick and fail on the other bricks.
> Since 2 of the bricks are full, the entry creation itself still succeeds
> on those bricks, but the creation of the gfid hard link inside
> ".glusterfs" fails. This is a bug in the "posix" code for entry
> transactions.
> If the gfid link creation fails, we just log an error message and
> continue. Since we depend on that gfid link, the entry should be deleted
> if the link creation fails.
> When the shd (self-heal daemon) tries to heal those files, it sees that
> the gfid link is not present for them, and the heal fails.
>
> I will send a fix for this, which deletes the entry if it fails to create
> the link inside .glusterfs.
>
Patches posted for bug-1586020-mark-dirty-for-entry-txn-on-quorum-failure.t
failure:
https://review.gluster.org/#/c/20630/
https://review.gluster.org/#/c/20631/
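
For illustration only, here is a minimal Python sketch of the rollback idea
from the RCA above: create the entry, then its gfid hard link, and delete the
entry again if the link cannot be created. This is not the code in the
patches; the function names, the injectable `link` parameter and the
temporary directory standing in for a brick are all assumptions made for the
example, and the `.glusterfs/<aa>/<bb>/<gfid>` handle layout is assumed as
well.

```python
import errno
import os
import tempfile
import uuid


def gfid_path(brick, gfid):
    """Return the assumed handle path: <brick>/.glusterfs/aa/bb/<gfid>."""
    return os.path.join(brick, ".glusterfs", gfid[:2], gfid[2:4], gfid)


def create_entry(brick, name, gfid, link=os.link):
    """Create a file plus its gfid hard link; roll back the file if linking fails."""
    entry = os.path.join(brick, name)
    # O_EXCL ensures we only ever roll back a file that this call created.
    fd = os.open(entry, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    os.close(fd)
    handle = gfid_path(brick, gfid)
    try:
        os.makedirs(os.path.dirname(handle), exist_ok=True)
        link(entry, handle)
    except OSError:
        # Without the gfid link the file cannot be healed later, so remove
        # the entry instead of logging the error and continuing.
        os.unlink(entry)
        raise
    return entry


def failing_link(src, dst):
    """Hypothetical stand-in for a full brick: linking fails with ENOSPC."""
    raise OSError(errno.ENOSPC, os.strerror(errno.ENOSPC))


def demo():
    """Return (link created on success, entry removed on link failure)."""
    with tempfile.TemporaryDirectory() as brick:
        gfid = str(uuid.uuid4())
        create_entry(brick, "ok.txt", gfid)
        linked = os.path.exists(gfid_path(brick, gfid))
        try:
            create_entry(brick, "bad.txt", str(uuid.uuid4()), link=failing_link)
            rolled_back = False
        except OSError:
            rolled_back = not os.path.exists(os.path.join(brick, "bad.txt"))
        return linked, rolled_back


if __name__ == "__main__":
    print(demo())
```

Passing `failing_link` simulates the full-brick case from the test: the
caller still gets the error, but no entry without a gfid link is left behind
for the shd to trip over.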

>
> Regards,
> Karthik
>
>>
>>> Regards,
>>> Karthik
>>>