[Gluster-infra] [Gluster-devel] bug-1432542-mpx-restart-crash.t failing

Poornima Gurusiddaiah pgurusid at redhat.com
Tue Jul 10 03:38:02 UTC 2018


On Mon, Jul 9, 2018 at 8:10 PM, Nithya Balachandran <nbalacha at redhat.com>
wrote:

> We discussed reducing the number of volumes in the maintainers'
> meeting. Should we still go ahead and do that?
>

I'm not sure exactly what was discussed, but reducing the number of
volumes may defeat the purpose of the test: the bug it exercises is
reproducible only with a larger number of volumes. I think Jeff will be
able to say how many are enough. Could we instead move this to the
CentOS CI brick-mux regression job, if that runs on machines with more
RAM?
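
If we do keep it on lower-RAM machines, here is a minimal sketch of
guarding the test on available memory (purely illustrative; the 2GB
threshold and the skip-via-exit-0 behaviour are assumptions, not
current harness features):

    # Hypothetical guard at the top of bug-1432542-mpx-restart-crash.t
    REQUIRED_KB=$((2 * 1024 * 1024))   # assume ~2GB needed for 120 bricks
    AVAIL_KB=$(awk '/MemTotal/ {print $2}' /proc/meminfo)
    if [ "$AVAIL_KB" -lt "$REQUIRED_KB" ]; then
        echo "SKIP: insufficient RAM for multiplexed-brick test" >&2
        exit 0   # a real harness may want a dedicated skip status
    fi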

Regards,
Poornima



>
>
> On 9 July 2018 at 15:45, Xavi Hernandez <jahernan at redhat.com> wrote:
>
>> On Mon, Jul 9, 2018 at 11:14 AM Karthik Subrahmanya <ksubrahm at redhat.com>
>> wrote:
>>
>>> Hi Deepshikha,
>>>
>>> Are you looking into this failure? I can still see this happening for
>>> all the regression runs.
>>>
>>
>> I've executed the failing script on my laptop and all tests finish
>> relatively fast. What seems to take time is the final cleanup. I can see
>> 'semanage' taking some CPU during destruction of volumes. The test required
>> 350 seconds to finish successfully.
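>>
>> For anyone reproducing this locally, a single .t can be timed from a
>> built source tree (run as root, as for any regression test) with
>> something like:
>>
>>     # prove is part of Perl's TAP tools; -v verbose, -f show failures
>>     time prove -vf ./tests/bugs/core/bug-1432542-mpx-restart-crash.t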
>>
>> Not sure what caused the cleanup time to increase, but I've created a bug
>> [1] to track this and a patch [2] to give more time to this test. This
>> should allow all blocked regressions to complete successfully.
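>>
>> The bump in [2] is likely along these lines (the SCRIPT_TIMEOUT
>> convention is inferred from the "Timeout set is 300, default 200"
>> line in the log below, so treat this as a sketch):
>>
>>     # At the top of tests/bugs/core/bug-1432542-mpx-restart-crash.t,
>>     # a per-test timeout override read by the regression harness:
>>     SCRIPT_TIMEOUT=400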
>>
>> Xavi
>>
>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1599250
>> [2] https://review.gluster.org/20482
>>
>>
>>> Thanks & Regards,
>>> Karthik
>>>
>>> On Sun, Jul 8, 2018 at 7:18 AM Atin Mukherjee <amukherj at redhat.com>
>>> wrote:
>>>
>>>> https://build.gluster.org/job/regression-test-with-multiplex/794/display/redirect
>>>> has the same test failing. Is the reason for the failure different,
>>>> given this is on Jenkins?
>>>>
>>>> On Sat, 7 Jul 2018 at 19:12, Deepshikha Khandelwal <dkhandel at redhat.com>
>>>> wrote:
>>>>
>>>>> Hi folks,
>>>>>
>>>>> The issue [1] has been resolved. The softserve instances will now
>>>>> have 2GB RAM, the same as the Jenkins builders' sizing
>>>>> configuration.
>>>>>
>>>>> [1] https://github.com/gluster/softserve/issues/40
>>>>>
>>>>> Thanks,
>>>>> Deepshikha Khandelwal
>>>>>
>>>>> On Fri, Jul 6, 2018 at 6:14 PM, Karthik Subrahmanya <
>>>>> ksubrahm at redhat.com> wrote:
>>>>> >
>>>>> >
>>>>> > On Fri 6 Jul, 2018, 5:18 PM Deepshikha Khandelwal, <
>>>>> dkhandel at redhat.com>
>>>>> > wrote:
>>>>> >>
>>>>> >> Hi Poornima/Karthik,
>>>>> >>
>>>>> >> We've looked into the memory error that this softserve instance
>>>>> >> showed. These machine instances have 1GB RAM, which is not the
>>>>> >> case with the Jenkins builders; those have 2GB.
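>>>>> >>
>>>>> >> For reference, an instance's sizing can be confirmed with:
>>>>> >>
>>>>> >>     # Report memory in MB; MemTotal in /proc/meminfo is in kB
>>>>> >>     free -m
>>>>> >>     awk '/MemTotal/ {printf "%.1f GB\n", $2/1024/1024}' /proc/meminfo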
>>>>> >>
>>>>> >> We've created issue [1] and will resolve it soon.
>>>>> >
>>>>> > Great. Thanks for the update.
>>>>> >>
>>>>> >>
>>>>> >> Sorry for the inconvenience.
>>>>> >>
>>>>> >> [1] https://github.com/gluster/softserve/issues/40
>>>>> >>
>>>>> >> Thanks,
>>>>> >> Deepshikha Khandelwal
>>>>> >>
>>>>> >> On Fri, Jul 6, 2018 at 3:44 PM, Karthik Subrahmanya <
>>>>> ksubrahm at redhat.com>
>>>>> >> wrote:
>>>>> >> > Thanks Poornima for the analysis.
>>>>> >> > Can someone work on fixing this please?
>>>>> >> >
>>>>> >> > ~Karthik
>>>>> >> >
>>>>> >> > On Fri, Jul 6, 2018 at 3:17 PM Poornima Gurusiddaiah
>>>>> >> > <pgurusid at redhat.com>
>>>>> >> > wrote:
>>>>> >> >>
>>>>> >> >> The same test case is failing for my patch as well [1]. I
>>>>> >> >> requested a regression system and tried to reproduce it.
>>>>> >> >> From my analysis, the (multiplexed) brick process is consuming
>>>>> >> >> a lot of memory and is being OOM killed. The regression machine
>>>>> >> >> has 1GB of RAM and the process consumes more than that. 1GB for
>>>>> >> >> 120 bricks is not unreasonable, considering there are ~1000
>>>>> >> >> threads in that brick process (see the snippet below to confirm
>>>>> >> >> both).
>>>>> >> >> Ways to fix:
>>>>> >> >> - Increase the regression system RAM size OR
>>>>> >> >> - Decrease the number of volumes in the test case.
>>>>> >> >>
>>>>> >> >> What is strange is that the test sometimes passes for some
>>>>> >> >> patches, so there may also be a bug in the memory consumption
>>>>> >> >> itself.
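>>>>> >> >>
>>>>> >> >> For reference, the OOM kill and the brick process footprint
>>>>> >> >> can be confirmed with standard tools (glusterfsd being the
>>>>> >> >> brick process):
>>>>> >> >>
>>>>> >> >>     # Kernel log entries left by the OOM killer
>>>>> >> >>     dmesg | grep -iE 'out of memory|killed process'
>>>>> >> >>     # Resident memory (RSS, kB) and thread count (NLWP)
>>>>> >> >>     ps -C glusterfsd -o pid,rss,nlwp,cmd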
>>>>> >> >>
>>>>> >> >> Regards,
>>>>> >> >> Poornima
>>>>> >> >>
>>>>> >> >>
>>>>> >> >> On Fri, Jul 6, 2018 at 2:11 PM, Karthik Subrahmanya
>>>>> >> >> <ksubrahm at redhat.com>
>>>>> >> >> wrote:
>>>>> >> >>>
>>>>> >> >>> Hi,
>>>>> >> >>>
>>>>> >> >>> $subject is failing on CentOS regression for most patches
>>>>> >> >>> with a timeout error:
>>>>> >> >>>
>>>>> >> >>> 07:32:34
>>>>> >> >>> ================================================================================
>>>>> >> >>> 07:32:34 [07:33:05] Running tests in file ./tests/bugs/core/bug-1432542-mpx-restart-crash.t
>>>>> >> >>> 07:32:34 Timeout set is 300, default 200
>>>>> >> >>> 07:37:34 ./tests/bugs/core/bug-1432542-mpx-restart-crash.t timed out after 300 seconds
>>>>> >> >>> 07:37:34 ./tests/bugs/core/bug-1432542-mpx-restart-crash.t: bad status 124
>>>>> >> >>> 07:37:34
>>>>> >> >>> 07:37:34        *********************************
>>>>> >> >>> 07:37:34        *       REGRESSION FAILED       *
>>>>> >> >>> 07:37:34        * Retrying failed tests in case *
>>>>> >> >>> 07:37:34        * we got some spurious failures *
>>>>> >> >>> 07:37:34        *********************************
>>>>> >> >>> 07:37:34
>>>>> >> >>> 07:42:34 ./tests/bugs/core/bug-1432542-mpx-restart-crash.t timed out after 300 seconds
>>>>> >> >>> 07:42:34 End of test ./tests/bugs/core/bug-1432542-mpx-restart-crash.t
>>>>> >> >>> 07:42:34
>>>>> >> >>> ================================================================================
>>>>> >> >>>
>>>>> >> >>> Can anyone take a look?
>>>>> >> >>>
>>>>> >> >>> Thanks,
>>>>> >> >>> Karthik
>>>>> >> >>>
>>>>> >> >>>
>>>>> >> >>>
>>>>> >> >>
>>>>> >> >>
>>>>> >> >
>>>>>
>>>> --
>>>> - Atin (atinm)
>>>>
>>
>>
>>
>
>
>