[Gluster-devel] too many failures on mpx-restart-crash.t on master branch

Poornima Gurusiddaiah pgurusid at redhat.com
Thu Dec 20 09:17:21 UTC 2018


Yeah, but Pranith mentioned that the issue is seen even without the iobuf
patch, so the test may still fail even after the thread count is fixed. Hence
reducing the volume count, as suggested, may be the better option.
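
For anyone who wants to check the thread count of a brick process on a
regression machine, here is a minimal sketch. It assumes Linux procfs, where
each thread of a process shows up as one entry under /proc/<pid>/task. The
helper name thread_count is only illustrative and is not part of GlusterFS.

```python
import os


def thread_count(pid):
    """Number of threads in process `pid`, read from Linux procfs."""
    # Each thread of a process appears as one entry under /proc/<pid>/task.
    return len(os.listdir(f"/proc/{pid}/task"))


if __name__ == "__main__":
    # Demonstrated on this interpreter; a brick PID would be passed instead.
    print(thread_count(os.getpid()))
```

Running it against the brick PID before and after the volume creates should
show how the thread count grows with the number of volumes.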

Regards,
Poornima

On Thu, Dec 20, 2018 at 2:41 PM Amar Tumballi <atumball at redhat.com> wrote:

> Considering that the effort to reduce the number of threads is in progress,
> should we mark it as a known issue until the other thread-reduction patch is
> merged?
>
> -Amar
>
> On Thu, Dec 20, 2018 at 2:38 PM Poornima Gurusiddaiah <pgurusid at redhat.com>
> wrote:
>
>> So, this failure is related to the iobuf patch [1]. Thanks to Pranith for
>> identifying this. The patch increases memory consumption in the brick
>> mux use case and causes an OOM kill. But the problem is not with the
>> patch itself. The only proper fix is to fix issue [2]. That said, we
>> cannot wait until that issue is fixed, so the possible workarounds
>> are:
>> - Reduce the volume creation count in the test case mpx-restart-crash.t
>> (temporarily, until [2] is fixed)
>> - Increase the resources (RAM to 4G?) on the regression system
>> - Revert the patch until [2] is completely fixed
>>
>> Root Cause:
>> Without the iobuf patch [1], we had a pre-allocated pool with a minimum
>> size of 12.5MB (which can grow); in many cases this entire pool may not be
>> fully used. Hence we moved to a per-thread mem pool for iobuf as well.
>> With this we expected the memory consumption of the processes to go down,
>> and it did go down. After creating 20 volumes on the system, the free -m
>> output:
>> With this patch:
>>               total        used        free      shared  buff/cache   available
>> Mem:           3789        2198         290         249        1300         968
>> Swap:          3071           0        3071
>>
>> Without this patch:
>>               total        used        free      shared  buff/cache   available
>> Mem:           3789        2280         115         488        1393         647
>> Swap:          3071           0        3071
>> This output can vary based on system state, workload etc. This is not
>> indicative of the exact amount of memory reduction, but of the fact that
>> the memory usage is reduced.
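
To make the comparison above concrete, here is a small sketch in Python that
pulls the "used" column out of the two free -m outputs and prints the
difference. The figures are copied from the outputs above; the helper name
mem_used_mib is only illustrative.

```python
def mem_used_mib(free_output):
    """Return the 'used' column (MiB) of the Mem: row from `free -m` text."""
    for line in free_output.splitlines():
        fields = line.split()
        if fields and fields[0] == "Mem:":
            return int(fields[2])  # fields: Mem:, total, used, free, ...
    raise ValueError("no Mem: row found")


# Outputs quoted above, after creating 20 volumes.
WITH_PATCH = """\
              total        used        free      shared  buff/cache   available
Mem:           3789        2198         290         249        1300         968
Swap:          3071           0        3071
"""

WITHOUT_PATCH = """\
              total        used        free      shared  buff/cache   available
Mem:           3789        2280         115         488        1393         647
Swap:          3071           0        3071
"""

delta = mem_used_mib(WITH_PATCH) - mem_used_mib(WITHOUT_PATCH)
print(f"used with patch:    {mem_used_mib(WITH_PATCH)} MiB")
print(f"used without patch: {mem_used_mib(WITHOUT_PATCH)} MiB")
print(f"difference:         {delta} MiB")
```

The same helper can be pointed at fresh free -m output from a regression
machine to repeat the comparison.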
>>
>> But with brick mux the scenario is different. Since patch [1] uses a
>> per-thread mem pool for iobuf, the memory consumed by iobufs
>> increases as the number of threads increases. In the current brick mux
>> implementation, for 20 volumes (as in the mpx-restart-crash test), the
>> number of threads is 1439. And the allocated iobufs (or any other
>> per-thread mem pool memory) are not freed until 30s (the garbage
>> collection time) after the free is issued (e.g. iobuf_put). As a result,
>> the memory consumption of the process appears to increase for brick mux.
>> Reducing the number of threads to <100 [2] will solve this issue. To test
>> this theory, if we add a 30s delay between each volume create in
>> mpx-restart-crash, the mem consumption is:
>>
>> With this patch, after adding a 30s delay between volume creates:
>>               total        used        free      shared  buff/cache   available
>> Mem:           3789        1344         947         488        1497        1606
>> Swap:          3071           0        3071
>>
>> With this patch:
>>               total        used        free      shared  buff/cache   available
>> Mem:           3789        1710         840         235        1238        1494
>> Swap:          3071           0        3071
>>
>> Without this patch:
>>               total        used        free      shared  buff/cache   available
>> Mem:           3789        1413         969         355        1406        1668
>> Swap:          3071           0        3071
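
As a rough illustration of why spacing out the volume creates helps, here is
a toy model in Python. It is not GlusterFS code. It assumes that each volume
create leaves one buffer cached per thread, and that a cached buffer is only
reclaimed 30s after it was freed. THREADS_PER_VOLUME (about 1439 / 20) and
CACHED_PER_THREAD_KIB are assumptions picked for the sketch, not measured
values.

```python
GC_DELAY = 30                # seconds, the garbage-collection time quoted above
THREADS_PER_VOLUME = 72      # assumption: roughly 1439 threads / 20 volumes
CACHED_PER_THREAD_KIB = 128  # assumption: one cached buffer per thread


def peak_cached_kib(create_times):
    """Peak KiB parked in per-thread pools for the given create timestamps (s).

    Each create is assumed to free one buffer per thread at its timestamp;
    that buffer stays cached until timestamp + GC_DELAY.
    """
    size = THREADS_PER_VOLUME * CACHED_PER_THREAD_KIB
    events = []
    for t in create_times:
        events.append((t, size))              # buffers freed into the pools
        events.append((t + GC_DELAY, -size))  # pools swept GC_DELAY later
    # Sort by time; at equal times the sweep (negative delta) is applied first.
    events.sort()
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak


back_to_back = peak_cached_kib([i * 1 for i in range(20)])  # creates 1s apart
spaced = peak_cached_kib([i * 30 for i in range(20)])       # creates 30s apart
print(f"peak cached, back-to-back creates: {back_to_back} KiB")
print(f"peak cached, 30s between creates:  {spaced} KiB")
```

In this model the back-to-back run holds the cached buffers of all 20 creates
at once, while the spaced run never holds more than one create's worth. The
absolute numbers mean nothing; only the shape of the effect is the point.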
>>
>> Regards,
>> Poornima
>>
>> [1] https://review.gluster.org/#/c/glusterfs/+/20362/
>> [2] https://github.com/gluster/glusterfs/issues/475
>>
>> On Thu, Dec 20, 2018 at 10:28 AM Amar Tumballi <atumball at redhat.com>
>> wrote:
>>
>>> Since yesterday at least 10+ patches have failed regression on ./tests/bugs/core/bug-1432542-mpx-restart-crash.t
>>>
>>>
>>> Help to debug them soon would be appreciated.
>>>
>>>
>>> Regards,
>>>
>>> Amar
>>>
>>>
>>> --
>>> Amar Tumballi (amarts)
>>> _______________________________________________
>>> Gluster-devel mailing list
>>> Gluster-devel at gluster.org
>>> https://lists.gluster.org/mailman/listinfo/gluster-devel
>>
>>
>
> --
> Amar Tumballi (amarts)
>