[Bugs] [Bug 1580352] Glusterd memory leaking in gf_gld_mt_linebuf

Mon May 21 12:24:09 UTC 2018

https://bugzilla.redhat.com/show_bug.cgi?id=1580352

--- Comment #3 from Kotresh HR <khiremat at redhat.com> ---
Description of problem:

Four node cluster.

In all four hosts, the glusterd process has a memory leak.

Looking at the ps output, the glusterd process on the QA nodes is about 1.6 GB
in size (the VSZ column below) and is taking nearly 14% of the memory:

- sosreport glusteredc1fs2uq.owfg.com

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root     20590  0.5  6.2 1681632 753284 ?      Ssl  Apr27  41:38 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO

- sosreport glusterldc1fs1up.owfg.com

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root     18199  1.8  7.5 1999648 1230012 ?     Ssl  May01  20:14 /usr/sbin/glusterfs -s localhost --volfile-id gluster/nfs -p /var/lib/glusterd/nfs/run/nfs.pid -l /var/log/glusterfs/nfs.log -S /var/run/gluster/9920bccf2a4c92d44d9f991404c5765d.socket

- sosreport glusterldc1fs2up.owfg.com

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root      9102  3.5 36.7 8990320 5975468 ?     Ssl  Feb14 3927:15 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO

==> On this gluster node, glusterd is taking 36% of memory and consuming nearly
6 GB.

Node glusteredc1fs1uq.owfg.com

glusterdump.1573.dump.1525458209

Looking at the highest memory size:

[mgmt/glusterd.management - usage-type gf_gld_mt_linebuf memusage]
size=909495296 --> 909 MB leaked here
num_allocs=888179
max_size=909495296
max_num_allocs=888179
total_allocs=888179

[mgmt/glusterd.management - usage-type gf_common_mt_mem_pool memusage]
size=170826728  --> 170 MB leaked here
num_allocs=1607174
max_size=170839680
max_num_allocs=1607398
total_allocs=80039329

The second dump shows the same structures still growing:

glusterdump.1573.dump.1525466693

[mgmt/glusterd.management - usage-type gf_gld_mt_linebuf memusage]
size=919127040  --> On this second iteration we have 919 MB
num_allocs=897585
max_size=919127040
max_num_allocs=897585
total_allocs=897585

[mgmt/glusterd.management - usage-type gf_common_mt_mem_pool memusage]
size=172089816
num_allocs=1619086
max_size=172099544
max_num_allocs=1619240
total_allocs=80707302

The same pattern appears on node glusteredc1fs2uq.owfg.com:

glusterdump.20590.dump.1525458352

[mgmt/glusterd.management - usage-type gf_gld_mt_linebuf memusage]
size=476495872
num_allocs=465328
max_size=476495872  --> 476 MB leaked here
max_num_allocs=465328
total_allocs=465328

[mgmt/glusterd.management - usage-type gf_common_mt_mem_pool memusage]
size=70239188  --> 70 MB here
num_allocs=627665
max_size=70284104
max_num_allocs=628212
total_allocs=86062168

glusterdump.20590.dump.1525466708

[mgmt/glusterd.management - usage-type gf_gld_mt_linebuf memusage]
size=485989376
num_allocs=474599
max_size=485989376  --> On the second iteration, the memory has increased to 485 MB
max_num_allocs=474599
total_allocs=474599

[mgmt/glusterd.management - usage-type gf_common_mt_mem_pool memusage]
size=71332824
num_allocs=637632
max_size=71335796
max_num_allocs=637669
total_allocs=87385904
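
The gf_gld_mt_linebuf figures above can be cross-checked with a little
arithmetic. The sketch below is not part of GlusterFS; the struct and function
names are made up for illustration. It assumes that total_allocs is a
cumulative counter and that the numeric suffix of each dump file name is an
epoch timestamp in seconds.

```c
#include <stdint.h>

/* One memusage record, with the fields as they appear in the statedump. */
struct memusage {
        uint64_t size;         /* bytes currently accounted to this type */
        uint64_t num_allocs;   /* allocations currently live */
        uint64_t total_allocs; /* ASSUMPTION: allocations ever made */
        int64_t  dump_time;    /* ASSUMPTION: dump file suffix, epoch seconds */
};

/* Average size of a live allocation, or 0 if there are none. */
static uint64_t bytes_per_alloc(const struct memusage *m)
{
        return m->num_allocs ? m->size / m->num_allocs : 0;
}

/* Non-zero if every allocation ever counted is still live. */
static int never_freed(const struct memusage *m)
{
        return m->num_allocs == m->total_allocs;
}

/* Bytes added between an earlier dump a and a later dump b. */
static uint64_t growth_bytes(const struct memusage *a, const struct memusage *b)
{
        return b->size - a->size;
}

/* Allocation rate between two dumps, or 0 if the interval is not positive. */
static double allocs_per_second(const struct memusage *a,
                                const struct memusage *b)
{
        int64_t dt = b->dump_time - a->dump_time;

        if (dt <= 0)
                return 0.0;
        return (double)(b->num_allocs - a->num_allocs) / (double)dt;
}
```

For all four gf_gld_mt_linebuf records, size divided by num_allocs is exactly
1024 bytes, and num_allocs equals total_allocs, which under the assumption
above means none of these buffers was ever freed. On node
glusteredc1fs1uq.owfg.com the two dumps differ by 9406 allocations (9631744
bytes); if the file suffixes are epoch seconds, that is 8484 seconds apart,
or roughly 1.1 allocations per second.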

The only place where I can find such an allocation is in the geo-replication code:

https://github.com/gluster/glusterfs/blob/master/xlators/mgmt/glusterd/src/glusterd-geo-rep.c

Specifically, in glusterd_urltransform():

...

        for (;;) {
                size_t len;
                line = GF_MALLOC (1024, gf_gld_mt_linebuf);
                if (!line) {
                        error = _gf_true;
                        goto out;
                }

...
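
The excerpt above shows only the allocation side of the loop. Whatever the
surrounding code does, each 1024-byte buffer obtained this way needs a matching
free on every path. The following is a minimal sketch of that ownership
pattern, not the GlusterFS implementation: it uses plain malloc/free instead of
GF_MALLOC/GF_FREE, and read_lines and free_lines are hypothetical names.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define LINE_BUF_SIZE 1024 /* same request size as in the excerpt */

/* Hypothetical helper: read every line of `in` into its own heap buffer.
 * Returns the number of lines stored in *linesp, or -1 on allocation
 * failure, in which case nothing is left allocated. */
static int read_lines(FILE *in, char ***linesp)
{
        char **lines = NULL;
        int n = 0;

        for (;;) {
                char *line = malloc(LINE_BUF_SIZE);
                if (!line)
                        goto fail;
                if (!fgets(line, LINE_BUF_SIZE, in)) {
                        free(line); /* EOF: this buffer was never used */
                        break;
                }
                line[strcspn(line, "\n")] = '\0'; /* drop trailing newline */

                char **tmp = realloc(lines, (size_t)(n + 1) * sizeof(*lines));
                if (!tmp) {
                        free(line);
                        goto fail;
                }
                lines = tmp;
                lines[n++] = line; /* ownership moves to the array */
        }
        *linesp = lines;
        return n;

fail:
        while (n > 0)
                free(lines[--n]);
        free(lines);
        *linesp = NULL;
        return -1;
}

/* Release everything read_lines handed out. */
static void free_lines(char **lines, int n)
{
        for (int i = 0; i < n; i++)
                free(lines[i]);
        free(lines);
}
```

A caller pairs every successful read_lines() with one free_lines(). If that
second call is skipped, every line read stays allocated, which would produce
the kind of steadily growing count of 1024-byte buffers seen in the dumps.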

I believe this is caused by geo-replication. Further assistance from
engineering is required to understand the source of this memory leak.
