[Gluster-devel] POC- Distributed regression testing framework

Pranith Kumar Karampuri pkarampu at redhat.com
Thu Oct 4 09:15:49 UTC 2018


On Thu, Oct 4, 2018 at 2:15 PM Xavi Hernandez <jahernan at redhat.com> wrote:

> On Thu, Oct 4, 2018 at 9:47 AM Amar Tumballi <atumball at redhat.com> wrote:
>
>>
>>
>> On Thu, Oct 4, 2018 at 12:54 PM Xavi Hernandez <jahernan at redhat.com>
>> wrote:
>>
>>> On Wed, Oct 3, 2018 at 11:57 AM Deepshikha Khandelwal <
>>> dkhandel at redhat.com> wrote:
>>>
>>>> Hello folks,
>>>>
>>>> Distributed-regression job[1] is now a part of Gluster's
>>>> nightly-master build pipeline. The following are the issues we have
>>>> resolved since we started working on this:
>>>>
>>>> 1) Gluster logs are now collected from the servers.
>>>> 2) Tests that were failing due to infra-related issues have been fixed.
>>>> 3) The time taken to run the regression suite is down to ~50-60 minutes.
>>>>
>>>> Getting the time down to 40 minutes needs your help!
>>>>
>>>> Currently, there is a test that is failing:
>>>>
>>>> tests/bugs/glusterd/optimized-basic-testcases-in-cluster.t
>>>>
>>>> This needs fixing first.
>>>>
>>>> There's a test that takes 14 minutes to complete -
>>>> `tests/bugs/index/bug-1559004-EMLINK-handling.t`. A single test taking
>>>> 14 minutes is not something we can distribute. Can we look at how we
>>>> can speed this up[2]? When this test fails, it is re-attempted,
>>>> further increasing the time. This happens in the regular
>>>> centos7-regression job as well.
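>>>>
>>>> To make the constraint concrete: however the tests are split across the
>>>> workers, the wall-clock time of a run can never drop below the longest
>>>> individual test. A minimal sketch of a greedy split (test names, durations
>>>> and the scheduling code below are illustrative, not the real scheduler):
>>>>
>>>>     # Greedy "longest test first" assignment of .t files to workers.
>>>>     # Durations are made up for illustration.
>>>>     import heapq
>>>>
>>>>     def split(tests, workers):
>>>>         heap = [(0.0, i, []) for i in range(workers)]  # (load, worker, tests)
>>>>         heapq.heapify(heap)
>>>>         for name, minutes in sorted(tests, key=lambda t: -t[1]):
>>>>             load, i, names = heapq.heappop(heap)
>>>>             names.append(name)
>>>>             heapq.heappush(heap, (load + minutes, i, names))
>>>>         return heap
>>>>
>>>>     tests = [("bug-1559004-EMLINK-handling.t", 14.0)]
>>>>     tests += [("test-%03d.t" % i, 1.0) for i in range(60)]
>>>>     print(max(load for load, _, _ in split(tests, 7)))  # -> 14.0
>>>>     # The worker that receives the 14-minute test can never finish sooner,
>>>>     # no matter how the remaining ~60 minutes of tests are balanced.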
>>>>
>>>
>>> I made a change [1] to reduce the amount of time this test needs. With
>>> this change the test completes in about 90 seconds. It would need some
>>> reviews from maintainers, though.
>>>
>>> Do you want me to send a patch with this change alone?
>>>
>>> Xavi
>>>
>>> [1]
>>> https://review.gluster.org/#/c/glusterfs/+/19254/22/tests/bugs/index/bug-1559004-EMLINK-handling.t
>>>
>>>
>>
>> Yes please! It would be useful! We can merge it sooner that way!
>>
>
> Patch: https://review.gluster.org/21341
>

Merged!


>
>
>>
>> -Amar
>>
>>
>>>
>>>> If you see any other issues, please file a bug[3].
>>>>
>>>> [1]: https://build.gluster.org/job/distributed-regression
>>>> [2]: https://build.gluster.org/job/distributed-regression/264/console
>>>> [3]:
>>>> https://bugzilla.redhat.com/enter_bug.cgi?product=glusterfs&component=project-infrastructure
>>>>
>>>> Thanks,
>>>> Deepshikha Khandelwal
>>>> On Tue, Jun 26, 2018 at 9:02 AM Nigel Babu <nigelb at redhat.com> wrote:
>>>> >
>>>> >
>>>> >
>>>> > On Mon, Jun 25, 2018 at 7:28 PM Amar Tumballi <atumball at redhat.com>
>>>> wrote:
>>>> >>
>>>> >>
>>>> >>
>>>> >>> There are currently a few known issues:
>>>> >>> * Not collecting the complete logs (/var/log/glusterfs) from the servers.
>>>> >>
>>>> >>
>>>> >> If I look at the activities involved with regression failures, this
>>>> can wait.
>>>> >
>>>> >
>>>> > Well, we can't debug the current failures without having the logs. So
>>>> this has to be fixed first.
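>>>> >
>>>> > For reference, pulling /var/log/glusterfs from each server into the job
>>>> > workspace can be as simple as an rsync loop; a rough sketch (host names
>>>> > and destination path are made up, not what the job actually does):
>>>> >
>>>> >     # Collect /var/log/glusterfs from every test server so it can be
>>>> >     # archived with the Jenkins job. Hosts and paths are illustrative.
>>>> >     import pathlib
>>>> >     import subprocess
>>>> >
>>>> >     SERVERS = ["server0.example.com", "server1.example.com"]  # hypothetical
>>>> >     DEST = pathlib.Path("archived-logs")
>>>> >
>>>> >     for host in SERVERS:
>>>> >         target = DEST / host
>>>> >         target.mkdir(parents=True, exist_ok=True)
>>>> >         # -a preserves ownership/timestamps, -z compresses in transit
>>>> >         subprocess.run(
>>>> >             ["rsync", "-az", "%s:/var/log/glusterfs/" % host, str(target)],
>>>> >             check=True,
>>>> >         )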
>>>> >
>>>> >>
>>>> >>
>>>> >>>
>>>> >>> * A few tests, such as the geo-rep tests, fail due to infra-related issues.
>>>> >>
>>>> >>
>>>> >> Please open bugs for these, so we can track them and take them to
>>>> closure.
>>>> >
>>>> >
>>>> > These are failing due to infra reasons, most likely subtle
>>>> differences in the setup of these nodes versus our normal nodes. We'll only
>>>> be able to debug them once we get the logs. I know the geo-rep ones are
>>>> easy to fix: the playbook for setting up geo-rep correctly just didn't make
>>>> it over to the playbook used for these images.
>>>> >
>>>> >>
>>>> >>
>>>> >>>
>>>> >>> * Takes ~80 minutes with 7 distributed servers (targeting 60
>>>> minutes)
>>>> >>
>>>> >>
>>>> >> The time can change as more tests are added; also, please plan to make
>>>> the number of servers configurable from 1 to n.
>>>> >
>>>> >
>>>> > While n is configurable, it will be fixed to a single-digit number for
>>>> now. We will need to place *some* limit somewhere or else we'll end up not
>>>> being able to control our cloud bills.
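>>>> >
>>>> > As a rough back-of-the-envelope figure (assuming the numbers above hold):
>>>> > ~80 minutes across 7 servers is about 560 server-minutes of tests, so a
>>>> > 60-minute run with the same amount of work would need roughly 9-10
>>>> > servers, or the slowest tests trimmed.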
>>>> >
>>>> >>
>>>> >>
>>>> >>>
>>>> >>> * We've only tested plain regressions. ASAN and Valgrind are
>>>> currently untested.
>>>> >>
>>>> >>
>>>> >> It would be great to have it running not 'per patch', but nightly, or
>>>> weekly to start with.
>>>> >
>>>> >
>>>> > This is currently not targeted until we phase out current regressions.
>>>> >
>>>> >>>
>>>> >>>
>>>> >>> Before bringing it into production, we'll run this job nightly and
>>>> >>> watch it for a month to debug the other failures.
>>>> >>>
>>>> >>
>>>> >> I would say, bring it to production sooner, say in 2 weeks, and also
>>>> plan to keep the current regression as-is behind a special Gerrit command
>>>> like 'run regression in-one-machine' (or something similar) with voting
>>>> rights, so we can fall back to this method if something is broken in
>>>> parallel testing.
>>>> >>
>>>> >> I have seen that regardless of how much time we spend testing scripts,
>>>> something will be broken the day we move to production. So let that happen
>>>> earlier rather than later, so it helps the next release branch out. We
>>>> don't want to be stuck on branching due to infra failures.
>>>> >
>>>> >
>>>> > Having two regression jobs that can vote is going to cause more
>>>> confusion than it's worth. There are a couple of intermittent memory issues
>>>> with the test script that we need to debug and fix before I'm comfortable
>>>> making this job a voting job. We've worked around these problems for now,
>>>> but they still pop up now and again. The fact that things break often is
>>>> not an excuse for skipping the prevention of avoidable failures. The
>>>> one-month timeline was chosen with all these factors in consideration. The
>>>> 2-week timeline is a no-go at this point.
>>>> >
>>>> > When we are ready to make the switch, we won't be switching 100% of
>>>> the job. We'll start with a sliding scale so that we can monitor failures
>>>> and machine creation adequately.
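>>>> >
>>>> > (For illustration, the sliding scale could be as simple as routing a
>>>> > configurable percentage of changes to the new job and the rest to the
>>>> > classic centos7-regression job; the snippet below is a hypothetical
>>>> > sketch, not the actual Jenkins logic.)
>>>> >
>>>> >     # Route a stable percentage of Gerrit changes to the distributed job.
>>>> >     import zlib
>>>> >
>>>> >     ROLLOUT_PERCENT = 10  # raise gradually as confidence grows
>>>> >
>>>> >     def use_distributed_job(change_id: str) -> bool:
>>>> >         # Stable hash, so a given change always takes the same path.
>>>> >         return zlib.crc32(change_id.encode()) % 100 < ROLLOUT_PERCENT
>>>> >
>>>> >     print(use_distributed_job("I8f6a3c2e"))  # hypothetical Change-Id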
>>>> >
>>>> > --
>>>> > nigelb
>>
>>
>>
>> --
>> Amar Tumballi (amarts)
>>



-- 
Pranith