[Gluster-devel] [Gluster-infra] bug-1432542-mpx-restart-crash.t failing

Poornima Gurusiddaiah pgurusid at redhat.com
Tue Jul 10 04:05:15 UTC 2018


On Tue, Jul 10, 2018, 9:30 AM Amar Tumballi <atumball at redhat.com> wrote:

>
>
> On Mon, Jul 9, 2018 at 8:10 PM, Nithya Balachandran <nbalacha at redhat.com>
> wrote:
>
>> We discussed reducing the number of volumes in the maintainers'
>> meeting. Should we still go ahead and do that?
>>
>>
>>
> It would still be a good exercise, IMO, reducing it from the current 120
> to 50-60 volumes.
>
AFAIK, the test case only creates 20 volumes with 6 bricks each, hence 120
bricks served from one brick process. This results in 1000+ threads and
~14G VIRT / 4-5G RES for that process.
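
For anyone who wants to double-check those numbers, a minimal sketch
(assuming a single multiplexed glusterfsd process; the pgrep pattern and
ps fields are the standard Linux procps ones):

  # Find the (oldest) glusterfsd process, i.e. the multiplexed brick.
  BRICK_PID=$(pgrep -o glusterfsd)

  # NLWP is the thread count; VSZ/RSS are virtual/resident memory in KiB.
  ps -o pid,nlwp,vsz,rss,comm -p "$BRICK_PID"

  # The same figures straight from /proc, in case your ps flags differ.
  grep -E 'Threads|VmSize|VmRSS' "/proc/$BRICK_PID/status"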

Regards,
Poornima


>
>> On 9 July 2018 at 15:45, Xavi Hernandez <jahernan at redhat.com> wrote:
>>
>>> On Mon, Jul 9, 2018 at 11:14 AM Karthik Subrahmanya <ksubrahm at redhat.com>
>>> wrote:
>>>
>>>> Hi Deepshikha,
>>>>
>>>> Are you looking into this failure? I can still see this happening for
>>>> all the regression runs.
>>>>
>>>
>>> I've executed the failing script on my laptop and all tests finish
>>> relatively fast; what takes time is the final cleanup. I can see
>>> 'semanage' consuming some CPU during volume destruction. The test needed
>>> 350 seconds to finish successfully.
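>>>
>>> In case anyone wants to reproduce the timing, this is roughly what I
>>> did (a sketch; prove is the standard Perl TAP runner and pidstat comes
>>> from sysstat, so adjust to whatever your machine has):
>>>
>>>   # Time a single regression test from the glusterfs source tree.
>>>   time prove -vf ./tests/bugs/core/bug-1432542-mpx-restart-crash.t
>>>
>>>   # In another terminal, watch semanage's CPU usage during cleanup
>>>   # (reports once per second for any process named semanage).
>>>   pidstat -C semanage 1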
>>>
>>> Not sure what caused the cleanup time to increase, but I've created a
>>> bug [1] to track this and a patch [2] to give more time to this test. This
>>> should allow all blocked regressions to complete successfully.
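>>>
>>> For reference, the patch is essentially a one-line change in the test
>>> script; a sketch, assuming the SCRIPT_TIMEOUT convention that
>>> run-tests.sh reads from each .t file (the exact new value is in [2]):
>>>
>>>   # In tests/bugs/core/bug-1432542-mpx-restart-crash.t; the value
>>>   # below is illustrative, the harness default would otherwise apply.
>>>   SCRIPT_TIMEOUT=400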
>>>
>>> Xavi
>>>
>>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1599250
>>> [2] https://review.gluster.org/20482
>>>
>>>
>>>> Thanks & Regards,
>>>> Karthik
>>>>
>>>> On Sun, Jul 8, 2018 at 7:18 AM Atin Mukherjee <amukherj at redhat.com>
>>>> wrote:
>>>>
>>>>>
>>>>> https://build.gluster.org/job/regression-test-with-multiplex/794/display/redirect
>>>>> has the same test failing. Is the reason for the failure different,
>>>>> given that this run is on Jenkins?
>>>>>
>>>>> On Sat, 7 Jul 2018 at 19:12, Deepshikha Khandelwal <
>>>>> dkhandel at redhat.com> wrote:
>>>>>
>>>>>> Hi folks,
>>>>>>
>>>>>> The issue [1] has been resolved. The softserve instances will now
>>>>>> have 2GB RAM, the same as the Jenkins builders' sizing
>>>>>> configuration.
>>>>>>
>>>>>> [1] https://github.com/gluster/softserve/issues/40
>>>>>>
>>>>>> Thanks,
>>>>>> Deepshikha Khandelwal
>>>>>>
>>>>>> On Fri, Jul 6, 2018 at 6:14 PM, Karthik Subrahmanya <
>>>>>> ksubrahm at redhat.com> wrote:
>>>>>> >
>>>>>> >
>>>>>> > On Fri 6 Jul, 2018, 5:18 PM Deepshikha Khandelwal, <
>>>>>> dkhandel at redhat.com>
>>>>>> > wrote:
>>>>>> >>
>>>>>> >> Hi Poornima/Karthik,
>>>>>> >>
>>>>>> >> We've looked into the memory error that this softserve instance
>>>>>> >> showed. These machine instances have 1GB RAM, which is not the
>>>>>> >> case with the Jenkins builders; those have 2GB.
>>>>>> >>
>>>>>> >> We've created issue [1] and will fix it soon.
>>>>>> >
>>>>>> > Great. Thanks for the update.
>>>>>> >>
>>>>>> >>
>>>>>> >> Sorry for the inconvenience.
>>>>>> >>
>>>>>> >> [1] https://github.com/gluster/softserve/issues/40
>>>>>> >>
>>>>>> >> Thanks,
>>>>>> >> Deepshikha Khandelwal
>>>>>> >>
>>>>>> >> On Fri, Jul 6, 2018 at 3:44 PM, Karthik Subrahmanya <
>>>>>> ksubrahm at redhat.com>
>>>>>> >> wrote:
>>>>>> >> > Thanks Poornima for the analysis.
>>>>>> >> > Can someone work on fixing this please?
>>>>>> >> >
>>>>>> >> > ~Karthik
>>>>>> >> >
>>>>>> >> > On Fri, Jul 6, 2018 at 3:17 PM Poornima Gurusiddaiah
>>>>>> >> > <pgurusid at redhat.com>
>>>>>> >> > wrote:
>>>>>> >> >>
>>>>>> >> >> The same test case is failing for my patch as well [1]. I
>>>>>> >> >> requested a regression system and tried to reproduce it.
>>>>>> >> >> From my analysis, the (multiplexed) brick process is consuming
>>>>>> >> >> a lot of memory and is being OOM killed. The regression machine
>>>>>> >> >> has 1GB of RAM and the process consumes more than that. 1GB for
>>>>>> >> >> 120 bricks is acceptable considering there are 1000+ threads in
>>>>>> >> >> that brick process.
>>>>>> >> >> Ways to fix:
>>>>>> >> >> - Increase the regression system RAM size, OR
>>>>>> >> >> - Decrease the number of volumes in the test case (see the
>>>>>> >> >>   sketch below).
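>>>>>> >> >>
>>>>>> >> >> For the second option the change would be small; a sketch,
>>>>>> >> >> assuming the test builds its volumes in a loop (the variable
>>>>>> >> >> names and the exact create command are illustrative, not
>>>>>> >> >> copied from the .t file):
>>>>>> >> >>
>>>>>> >> >>   # Hypothetical volume-creation loop; halving NUM_VOLS halves
>>>>>> >> >>   # the bricks (6 per volume) and the brick-process threads.
>>>>>> >> >>   NUM_VOLS=10   # was 20
>>>>>> >> >>   for i in $(seq 1 $NUM_VOLS); do
>>>>>> >> >>       TEST $CLI volume create vol$i replica 2 \
>>>>>> >> >>           $H0:$B0/brick${i}_{1..6} force
>>>>>> >> >>       TEST $CLI volume start vol$i
>>>>>> >> >>   done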
>>>>>> >> >>
>>>>>> >> >> What is strange, though, is that the test sometimes passes for
>>>>>> >> >> some patches. There could be a bug somewhere in the memory
>>>>>> >> >> consumption.
>>>>>> >> >>
>>>>>> >> >> Regards,
>>>>>> >> >> Poornima
>>>>>> >> >>
>>>>>> >> >>
>>>>>> >> >> On Fri, Jul 6, 2018 at 2:11 PM, Karthik Subrahmanya
>>>>>> >> >> <ksubrahm at redhat.com>
>>>>>> >> >> wrote:
>>>>>> >> >>>
>>>>>> >> >>> Hi,
>>>>>> >> >>>
>>>>>> >> >>> $subject is failing on the CentOS regression for most patches
>>>>>> >> >>> with a timeout error.
>>>>>> >> >>>
>>>>>> >> >>> 07:32:34 ================================================================================
>>>>>> >> >>> 07:32:34 [07:33:05] Running tests in file ./tests/bugs/core/bug-1432542-mpx-restart-crash.t
>>>>>> >> >>> 07:32:34 Timeout set is 300, default 200
>>>>>> >> >>> 07:37:34 ./tests/bugs/core/bug-1432542-mpx-restart-crash.t timed out after 300 seconds
>>>>>> >> >>> 07:37:34 ./tests/bugs/core/bug-1432542-mpx-restart-crash.t: bad status 124
>>>>>> >> >>> 07:37:34
>>>>>> >> >>> 07:37:34        *********************************
>>>>>> >> >>> 07:37:34        *       REGRESSION FAILED       *
>>>>>> >> >>> 07:37:34        * Retrying failed tests in case *
>>>>>> >> >>> 07:37:34        * we got some spurious failures *
>>>>>> >> >>> 07:37:34        *********************************
>>>>>> >> >>> 07:37:34
>>>>>> >> >>> 07:42:34 ./tests/bugs/core/bug-1432542-mpx-restart-crash.t timed out after 300 seconds
>>>>>> >> >>> 07:42:34 End of test ./tests/bugs/core/bug-1432542-mpx-restart-crash.t
>>>>>> >> >>> 07:42:34
>>>>>> >> >>> 07:42:34 ================================================================================
>>>>>> >> >>>
>>>>>> >> >>> Can anyone take a look?
>>>>>> >> >>>
>>>>>> >> >>> Thanks,
>>>>>> >> >>> Karthik
>>>>>> >> >>>
>>>>>> >> >>>
>>>>>> >> >>>
>>>>>> >> >>
>>>>>> >> >>
>>>>>> >> >
>>>>>>
>>>>> --
>>>>> - Atin (atinm)
>>>>>
>>>
>>>
>>>
>>
>>
>>
>
>
>
> --
> Amar Tumballi (amarts)