[Gluster-devel] [Gluster-infra] bug-1432542-mpx-restart-crash.t failing

Mon Jul 9 10:15:08 UTC 2018

On Mon, Jul 9, 2018 at 11:14 AM Karthik Subrahmanya <ksubrahm at redhat.com>
wrote:

> Hi Deepshikha,
>
> Are you looking into this failure? I can still see this happening for all
> the regression runs.
>

I've executed the failing script on my laptop and all tests finish
relatively fast. What seems to take time is the final cleanup. I can see
'semanage' taking some CPU during destruction of volumes. The test required
350 seconds to finish successfully.

Not sure what caused the cleanup time to increase, but I've created a bug
[1] to track this and a patch [2] to give more time to this test. This
should allow all blocked regressions to complete successfully.
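
For illustration only, here is a minimal Python sketch of why a run that needs 350 seconds fails under the 300-second limit shown in the quoted log. It assumes the harness enforces the limit the way GNU coreutils `timeout` does, which reports exit status 124 when the limit is hit (matching the "bad status 124" line in the log). The helper names `run_with_timeout` and `fits_in_limit` are hypothetical and are not part of the Gluster test framework.

```python
import subprocess
import sys

# Exit status that GNU coreutils `timeout` reports when the limit is hit.
# Assumption: the regression harness behaves the same way.
TIMEOUT_EXIT_STATUS = 124


def run_with_timeout(cmd, limit_seconds):
    """Run cmd and return its exit status, or 124 if it exceeds limit_seconds."""
    try:
        completed = subprocess.run(cmd, timeout=limit_seconds)
        return completed.returncode
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising, so no process is left behind.
        return TIMEOUT_EXIT_STATUS


def fits_in_limit(observed_seconds, limit_seconds):
    """Return True if a run of observed_seconds completes within limit_seconds."""
    return observed_seconds <= limit_seconds


if __name__ == "__main__":
    # A stand-in for a slow test: sleeps longer than the (tiny) limit used here.
    slow = [sys.executable, "-c", "import time; time.sleep(5)"]
    print(run_with_timeout(slow, 0.5))
    # Numbers from this thread: 350 seconds observed, 300 seconds allowed.
    print(fits_in_limit(350, 300))
```

With the numbers above, any limit at or below 350 seconds would still time out, so the new limit needs some headroom beyond the observed run time.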

Xavi

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1599250
[2] https://review.gluster.org/20482

> Thanks & Regards,
> Karthik
>
> On Sun, Jul 8, 2018 at 7:18 AM Atin Mukherjee <amukherj at redhat.com> wrote:
>
>>
>> https://build.gluster.org/job/regression-test-with-multiplex/794/display/redirect
>> has the same test failing. Is the reason for the failure different, given
>> that this is on Jenkins?
>>
>> On Sat, 7 Jul 2018 at 19:12, Deepshikha Khandelwal <dkhandel at redhat.com>
>> wrote:
>>
>>> Hi folks,
>>>
>>> The issue [1] has been resolved. Softserve instances will now have
>>> 2GB of RAM, i.e. the same as the Jenkins builders' sizing
>>> configuration.
>>>
>>> [1] https://github.com/gluster/softserve/issues/40
>>>
>>> Thanks,
>>> Deepshikha Khandelwal
>>>
>>> On Fri, Jul 6, 2018 at 6:14 PM, Karthik Subrahmanya <ksubrahm at redhat.com>
>>> wrote:
>>> >
>>> >
>>> > On Fri 6 Jul, 2018, 5:18 PM Deepshikha Khandelwal, <dkhandel at redhat.com>
>>> > wrote:
>>> >>
>>> >> Hi Poornima/Karthik,
>>> >>
>>> >> We've looked into the memory error that this softserve instance has
>>> >> shown. These machine instances have 1GB of RAM, which is not the case
>>> >> for the Jenkins builders; they have 2GB of RAM.
>>> >>
>>> >> We've created an issue [1] and will resolve it soon.
>>> >
>>> > Great. Thanks for the update.
>>> >>
>>> >>
>>> >> Sorry for the inconvenience.
>>> >>
>>> >> [1] https://github.com/gluster/softserve/issues/40
>>> >>
>>> >> Thanks,
>>> >> Deepshikha Khandelwal
>>> >>
>>> >> On Fri, Jul 6, 2018 at 3:44 PM, Karthik Subrahmanya
>>> >> <ksubrahm at redhat.com> wrote:
>>> >> > Thanks Poornima for the analysis.
>>> >> > Can someone work on fixing this please?
>>> >> >
>>> >> > ~Karthik
>>> >> >
>>> >> > On Fri, Jul 6, 2018 at 3:17 PM Poornima Gurusiddaiah
>>> >> > <pgurusid at redhat.com>
>>> >> > wrote:
>>> >> >>
>>> >> >> The same test case is failing for my patch as well [1]. I requested
>>> >> >> a regression system and tried to reproduce it.
>>> >> >> From my analysis, the brick process (multiplexed) is consuming a lot
>>> >> >> of memory and is being OOM-killed. The regression machine has 1GB of
>>> >> >> RAM and the process is consuming more than 1GB. 1GB for 120 bricks is
>>> >> >> acceptable considering there are 1000 threads in that brick process.
>>> >> >> Ways to fix:
>>> >> >> - Increase the regression system RAM size OR
>>> >> >> - Decrease the number of volumes in the test case.
>>> >> >>
>>> >> >> But what is strange is that the test sometimes passes for some
>>> >> >> patches. There may be a bug in memory consumption.
>>> >> >>
>>> >> >> Regards,
>>> >> >> Poornima
>>> >> >>
>>> >> >>
>>> >> >> On Fri, Jul 6, 2018 at 2:11 PM, Karthik Subrahmanya
>>> >> >> <ksubrahm at redhat.com>
>>> >> >> wrote:
>>> >> >>>
>>> >> >>> Hi,
>>> >> >>>
>>> >> >>> $subject is failing on the CentOS regression for most patches
>>> >> >>> with a timeout error.
>>> >> >>>
>>> >> >>> 07:32:34 ================================================================================
>>> >> >>> 07:32:34 [07:33:05] Running tests in file ./tests/bugs/core/bug-1432542-mpx-restart-crash.t
>>> >> >>> 07:32:34 Timeout set is 300, default 200
>>> >> >>> 07:37:34 ./tests/bugs/core/bug-1432542-mpx-restart-crash.t timed out after 300 seconds
>>> >> >>> 07:37:34 ./tests/bugs/core/bug-1432542-mpx-restart-crash.t: bad status 124
>>> >> >>> 07:37:34
>>> >> >>> 07:37:34        *********************************
>>> >> >>> 07:37:34        *       REGRESSION FAILED       *
>>> >> >>> 07:37:34        * Retrying failed tests in case *
>>> >> >>> 07:37:34        * we got some spurious failures *
>>> >> >>> 07:37:34        *********************************
>>> >> >>> 07:37:34
>>> >> >>> 07:42:34 ./tests/bugs/core/bug-1432542-mpx-restart-crash.t timed out after 300 seconds
>>> >> >>> 07:42:34 End of test ./tests/bugs/core/bug-1432542-mpx-restart-crash.t
>>> >> >>> 07:42:34 ================================================================================
>>> >> >>>
>>> >> >>> Can anyone take a look?
>>> >> >>>
>>> >> >>> Thanks,
>>> >> >>> Karthik
>>> >> >>>
>>> >> >>>
>>> >> >>>
>>> >> >>> _______________________________________________
>>> >> >>> Gluster-devel mailing list
>>> >> >>> Gluster-devel at gluster.org
>>> >> >>> https://lists.gluster.org/mailman/listinfo/gluster-devel
>>> >> >>
>>> >> >>
>>> >> >
>>> >> > _______________________________________________
>>> >> > Gluster-infra mailing list
>>> >> > Gluster-infra at gluster.org
>>> >> > https://lists.gluster.org/mailman/listinfo/gluster-infra
>>>
>> --
>> - Atin (atinm)
>>