[Gluster-devel] [Gluster-Maintainers] Release 5: Master branch health report (Week of 30th July)

Xavi Hernandez xhernandez at redhat.com
Thu Aug 2 10:19:24 UTC 2018


On Thu, Aug 2, 2018 at 6:14 AM Atin Mukherjee <amukherj at redhat.com> wrote:

>
>
> On Tue, Jul 31, 2018 at 10:11 PM Atin Mukherjee <amukherj at redhat.com>
> wrote:
>
>> I just went through the nightly regression report of brick mux runs and
>> here's what I can summarize.
>>
>>
>> =========================================================================================================================================================================
>> Fails only with brick-mux
>>
>> =========================================================================================================================================================================
>> tests/bugs/core/bug-1432542-mpx-restart-crash.t - Times out even after
>> 400 secs. Refer
>> https://fstat.gluster.org/failure/209?state=2&start_date=2018-06-30&end_date=2018-07-31&branch=all,
>> specifically the latest report
>> https://build.gluster.org/job/regression-test-burn-in/4051/consoleText .
>> It wasn't timing out as frequently until 12 July, but since 27 July it
>> has timed out twice. I'm beginning to believe commit
>> 9400b6f2c8aa219a493961e0ab9770b7f12e80d2 added the delay, and 400 secs
>> is no longer sufficient (Mohit?)
>>
>> tests/bugs/glusterd/add-brick-and-validate-replicated-volume-options.t
>> (Ref -
>> https://build.gluster.org/job/regression-test-with-multiplex/814/console)
>> - Test fails only in brick-mux mode; action item (AI) on Atin to look at it and get back.
>>
>> tests/bugs/replicate/bug-1433571-undo-pending-only-on-up-bricks.t (
>> https://build.gluster.org/job/regression-test-with-multiplex/813/console)
>> - Seems to have failed just twice in the last 30 days as per
>> https://fstat.gluster.org/failure/251?state=2&start_date=2018-06-30&end_date=2018-07-31&branch=all.
>> Need help from the AFR team.
>>
>> tests/bugs/quota/bug-1293601.t (
>> https://build.gluster.org/job/regression-test-with-multiplex/812/console)
>> - Hasn't failed since 26 July, though earlier it was failing regularly. Did
>> we fix this test through any patch (Mohit?)
>>
>> tests/bitrot/bug-1373520.t - (
>> https://build.gluster.org/job/regression-test-with-multiplex/811/console)
>> - Hasn't failed since 27 July, though earlier it was failing regularly. Did
>> we fix this test through any patch (Mohit?)
>>
>
> I see this has failed in the day before yesterday's regression run as well
> (and I could reproduce it locally with brick mux enabled). The test fails
> to heal a file within a particular time period.
>
> 15:55:19 not ok 25 Got "0" instead of "512", LINENUM:55
> 15:55:19 FAILED COMMAND: 512 path_size /d/backends/patchy5/FILE1
>
> Need EC dev's help here.
>

I'm not sure where the problem is exactly. I've seen that when the test
fails, self-heal is attempting to heal the file, but when the file is
accessed, an Input/Output error is returned, aborting the heal. I've
checked that a heal is attempted every time the file is accessed, but it
always fails. This error seems to come from the bit-rot stub xlator.

When in this situation, if I stop and start the volume, self-heal
immediately heals the files. It seems like stale state kept by the stub
xlator is preventing the file from being healed.
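The workaround above can be sketched with the standard gluster CLI. This is only a sketch: the volume name "patchy" is the regression suite's convention (substitute your own), and the snippet is guarded so it is harmless on a box without the gluster CLI installed:

```shell
# Sketch of the stop/start workaround described above (assumption: the
# affected volume is named "patchy", as in the regression tests).
if command -v gluster >/dev/null 2>&1; then
    gluster --mode=script volume stop patchy  # bouncing the volume drops the stale stub state
    gluster volume start patchy
    gluster volume heal patchy full           # kick off a full self-heal
    gluster volume heal patchy info           # pending entries should now drain
    status=ran
else
    echo "gluster CLI not available; commands shown for reference only"
    status=skipped
fi
```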

Adding bit-rot maintainers for help on this one.

Xavi



>
>> tests/bugs/glusterd/remove-brick-testcases.t - Failed once with a core;
>> not sure whether brick mux is the culprit here. Ref -
>> https://build.gluster.org/job/regression-test-with-multiplex/806/console
>> . Seems to be a glustershd crash. Need help from AFR folks.
>>
>>
>> =========================================================================================================================================================================
>> Fails for non-brick mux case too
>>
>> =========================================================================================================================================================================
>> tests/bugs/distribute/bug-1122443.t - Seems to be failing at my setup
>> very often, without brick mux as well. Refer
>> https://build.gluster.org/job/regression-test-burn-in/4050/consoleText .
>> There's an email in gluster-devel and a BZ 1610240 for the same.
>>
>> tests/bugs/bug-1368312.t - Seems to be a recent failure (
>> https://build.gluster.org/job/regression-test-with-multiplex/815/console)
>> - however, it has been seen for a non-brick-mux case too
>> - https://build.gluster.org/job/regression-test-burn-in/4039/consoleText
>> . Need some eyes from AFR folks.
>>
>> tests/00-geo-rep/georep-basic-dr-tarssh.t - this isn't specific to brick
>> mux, have seen this failing at multiple default regression runs. Refer
>> https://fstat.gluster.org/failure/392?state=2&start_date=2018-06-30&end_date=2018-07-31&branch=all
>> . We need help from geo-rep devs to root-cause this sooner rather than later.
>>
>> tests/00-geo-rep/georep-basic-dr-rsync.t - this isn't specific to brick
>> mux, have seen this failing at multiple default regression runs. Refer
>> https://fstat.gluster.org/failure/393?state=2&start_date=2018-06-30&end_date=2018-07-31&branch=all
>> . We need help from geo-rep devs to root-cause this sooner rather than later.
>>
>> tests/bugs/glusterd/validating-server-quorum.t (
>> https://build.gluster.org/job/regression-test-with-multiplex/810/console)
>> - Fails for non-brick-mux cases too,
>> https://fstat.gluster.org/failure/580?state=2&start_date=2018-06-30&end_date=2018-07-31&branch=all
>> .  Atin has a patch https://review.gluster.org/20584 which resolves it,
>> but the patch is failing regression for a different, unrelated test.
>>
>> tests/bugs/replicate/bug-1586020-mark-dirty-for-entry-txn-on-quorum-failure.t
>> (Ref -
>> https://build.gluster.org/job/regression-test-with-multiplex/809/console)
>> - fails for the non-brick-mux case too -
>> https://build.gluster.org/job/regression-test-burn-in/4049/consoleText -
>> Need some eyes from AFR folks.
>>
>
