[Gluster-devel] too many failures on mpx-restart-crash.t on master branch
Poornima Gurusiddaiah
pgurusid at redhat.com
Thu Dec 20 09:07:59 UTC 2018
So, this failure is related to the iobuf patch [1]. Thanks to Pranith for
identifying this. The patch increases memory consumption in the brick
mux use case and causes OOM kills. However, the problem is not with the
patch itself; the only proper fix is to fix issue [2]. That
said, we cannot wait until that issue is fixed, so the possible workarounds
are:
- Reduce the number of volumes created in the mpx-restart-crash.t test case
(temporarily, until [2] is fixed)
- Increase the resources (RAM to 4G?) on the regression systems
- Revert the patch until [2] is completely fixed
Root Cause:
Without the iobuf patch [1], we had a pre-allocated pool with a minimum size
of 12.5 MB (which can grow); in many cases this entire size was not
used. Hence we moved iobuf to the per-thread mem pool as well.
With this we expected the memory consumption of the processes to go down, and
it did go down. After creating 20 volumes on the system, the free -m output is:
With this patch:
              total        used        free      shared  buff/cache   available
Mem:           3789        2198         290         249        1300         968
Swap:          3071           0        3071

Without this patch:
              total        used        free      shared  buff/cache   available
Mem:           3789        2280         115         488        1393         647
Swap:          3071           0        3071
This output can vary based on system state, workload, etc. It is not
indicative of the exact amount of memory reduction, but it does show that
memory usage went down.
But with brick mux the scenario is different. Since patch [1] uses a
per-thread mem pool for iobuf, the memory consumption due to iobuf grows
as the number of threads grows. In the current brick mux implementation, for 20
volumes (as in the mpx-restart-crash test), the number of threads is 1439. And
an allocated iobuf (or any other per-thread mem pool memory) doesn't get
freed until 30s (the garbage-collection interval) after it is released
(e.g., via iobuf_put). As a result, the memory consumption of the process
appears to increase under brick mux. Reducing the number of threads to <100
[2] will solve this issue. To prove this theory, if we add a 30s delay between
each volume create in mpx-restart-crash, the memory consumption is:
With this patch, after adding a 30s delay between volume creates:
              total        used        free      shared  buff/cache   available
Mem:           3789        1344         947         488        1497        1606
Swap:          3071           0        3071

With this patch:
              total        used        free      shared  buff/cache   available
Mem:           3789        1710         840         235        1238        1494
Swap:          3071           0        3071

Without this patch:
              total        used        free      shared  buff/cache   available
Mem:           3789        1413         969         355        1406        1668
Swap:          3071           0        3071
Regards,
Poornima
[1] https://review.gluster.org/#/c/glusterfs/+/20362/
[2] https://github.com/gluster/glusterfs/issues/475
On Thu, Dec 20, 2018 at 10:28 AM Amar Tumballi <atumball at redhat.com> wrote:
> Since yesterday at least 10+ patches have failed regression on ./tests/bugs/core/bug-1432542-mpx-restart-crash.t
>
> Help to debug them soon would be appreciated.
>
> Regards,
>
> Amar
>
> --
> Amar Tumballi (amarts)