[Gluster-infra] [Gluster-Maintainers] Flaky Regression tests again?

Mon Aug 17 08:36:00 UTC 2020

I think we should look for the root cause of these failures. If we mark the
tests as Bad, the tests might get left behind. If someone is ready to own
the tests and keep track of the ongoing efforts to root-cause them, it
makes sense to mark them as Bad.

One more thought I have is, let's have a deadline discussed and fixed in an
upcoming community meeting. In the meeting let's own the failures and fix
them by the deadline. (If everyone agrees!)

On Mon, Aug 17, 2020 at 12:58 PM Deepshikha Khandelwal <dkhandel at redhat.com>
wrote:

>
>
>
> On Sat, Aug 15, 2020 at 7:33 PM Amar Tumballi <amar at kadalu.io> wrote:
>
>> If I look at the recent regression runs (
>> https://build.gluster.org/job/centos7-regression/), there is more than
>> 50% failure in tests.
>>
>> At least 90% of the failures are not due to the patch itself. Considering
>> regression tests are very critical for our patches to get merged, and take
>> almost 6-7 hours nowadays to complete, how can we make sure we are
>> passing regression with 100% certainty?
>>
>> Again, out of this, there are only a few tests which keep failing. Should
>> we revisit the tests and see why they are failing? Or should we mark them
>> with a 'Good if it passes, but don't fail regression if the tests fail' condition?
>>
>> I think we should revisit these tests for the root cause.
>
>> Some tests I have listed here from recent failures:
>>
>> tests/bugs/core/multiplex-limit-issue-151.t
>> tests/bugs/distribute/bug-1122443.t +++
>> tests/bugs/distribute/bug-1117851.t
>> tests/bugs/glusterd/bug-857330/normal.t +
>> tests/basic/mount-nfs-auth.t +++++
>>
> It failed mainly on builder202. I disconnected the builder and will check
> what is going wrong. Though I don't have any foolproof analysis on this
> one, as it has always been flaky (failing quite randomly).
>
>>
>> tests/basic/changelog/changelog-snapshot.t
>> tests/basic/afr/split-brain-favorite-child-policy.t
>> tests/basic/distribute/rebal-all-nodes-migrate.t
>> tests/bugs/glusterd/quorum-value-check.t
>> tests/features/lock-migration/lkmigration-set-option.t
>> tests/bugs/nfs/bug-1116503.t
>> tests/basic/ec/ec-quorum-count-partial-failure.t
>>
>> Considering these are just 12 of the 750+ tests we run, should we even
>> consider marking them bad till they are fixed to be 100% consistent?
>>
> Makes sense.
>
>>
>> Any thoughts on how we should go ahead?
>>
>> Regards,
>> Amar
>>
>> (+) indicates a count, so the more + you see against a file, the more
>> times it failed.
>>
>> _______________________________________________
>> maintainers mailing list
>> maintainers at gluster.org
>> https://lists.gluster.org/mailman/listinfo/maintainers
>>
>

-- 
Thanks,
Sanju