[Bugs] [Bug 1580352] [GSS] Glusterd memory leaking in gf_gld_mt_linebuf
bugzilla at redhat.com
Mon May 21 10:53:29 UTC 2018
https://bugzilla.redhat.com/show_bug.cgi?id=1580352
Sanju <srakonde at redhat.com> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|bugs at gluster.org          |srakonde at redhat.com
--- Comment #2 from Sanju <srakonde at redhat.com> ---
Description of problem:
This customer has two pairs of gluster nodes running RHGS 3.2:
- Production gluster nodes:
glusterldc1fs1up.owfg.com
glusterldc1fs2up.owfg.com
- QA gluster nodes:
glusteredc1fs1uq.owfg.com
glusteredc1fs2uq.owfg.com
Gluster packages installed:
grep -i gluster installed-rpms
gluster-nagios-addons-0.2.8-1.el7rhgs.x86_64                   Fri Sep 22 08:15:35 2017
gluster-nagios-common-0.2.4-1.el7rhgs.noarch                   Wed Aug 23 14:16:28 2017
glusterfs-3.8.4-18.6.el7rhgs.x86_64                            Fri Sep 22 08:19:41 2017
glusterfs-api-3.8.4-18.6.el7rhgs.x86_64                        Fri Sep 22 08:19:41 2017
glusterfs-cli-3.8.4-18.6.el7rhgs.x86_64                        Fri Sep 22 08:19:41 2017
glusterfs-client-xlators-3.8.4-18.6.el7rhgs.x86_64             Fri Sep 22 08:19:41 2017
glusterfs-fuse-3.8.4-18.6.el7rhgs.x86_64                       Fri Sep 22 08:19:41 2017
glusterfs-geo-replication-3.8.4-18.6.el7rhgs.x86_64            Fri Sep 22 08:19:42 2017
glusterfs-libs-3.8.4-18.6.el7rhgs.x86_64                       Fri Sep 22 08:19:41 2017
glusterfs-server-3.8.4-18.6.el7rhgs.x86_64                     Fri Sep 22 08:19:41 2017
libvirt-daemon-driver-storage-gluster-3.2.0-14.el7_4.3.x86_64  Fri Sep 22 08:15:41 2017
python-gluster-3.8.4-18.6.el7rhgs.noarch                       Fri Sep 22 08:19:42 2017
samba-vfs-glusterfs-4.4.5-3.el7rhgs.x86_64                     Wed Aug 23 14:18:11 2017
vdsm-gluster-4.17.33-1.2.el7rhgs.noarch                        Fri Sep 22 08:15:44 2017
On all four hosts, the glusterd process has a memory leak. Looking at the ps
output, its resident set size is up to 1.6 GB on the QA nodes and nearly 6 GB
on the production nodes.
- sosreport glusteredc1fs1uq.owfg.com
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1573 0.8 13.8 2842496 1679636 ? Ssl Apr20 154:28 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO
==> Here, glusterd is consuming 1.6 GB of resident memory, nearly 14% of the
node's RAM.
- sosreport glusteredc1fs2uq.owfg.com
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 20590 0.5 6.2 1681632 753284 ? Ssl Apr27 41:38 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO
- sosreport glusterldc1fs1up.owfg.com
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 18199 1.8 7.5 1999648 1230012 ? Ssl May01 20:14 /usr/sbin/glusterfs -s localhost --volfile-id gluster/nfs -p /var/lib/glusterd/nfs/run/nfs.pid -l /var/log/glusterfs/nfs.log -S /var/run/gluster/9920bccf2a4c92d44d9f991404c5765d.socket
==> Note: this entry is the gluster NFS process rather than glusterd, so the
glusterd RSS for this node is not shown here.
- sosreport glusterldc1fs2up.owfg.com
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 9102 3.5 36.7 8990320 5975468 ? Ssl Feb14 3927:15 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO
==> On this production node, glusterd is consuming nearly 6 GB, about 36% of
the node's memory.
I've asked the customer for two statedumps of the glusterd process, taken two
hours apart. He has provided this information only from the QA nodes.
Node glusteredc1fs1uq.owfg.com
glusterdump.1573.dump.1525458209
Looking at the usage types with the highest memory consumption:
[mgmt/glusterd.management - usage-type gf_gld_mt_linebuf memusage]
size=909495296 --> 909 MB leaked here
num_allocs=888179
max_size=909495296
max_num_allocs=888179
total_allocs=888179
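Note that size / num_allocs = 909495296 / 888179 = 1024 bytes exactly, and
total_allocs equals num_allocs, i.e. every allocation of this type is a
1024-byte buffer and not a single one has ever been freed.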
[mgmt/glusterd.management - usage-type gf_common_mt_mem_pool memusage]
size=170826728 --> 170 MB leaked here
num_allocs=1607174
max_size=170839680
max_num_allocs=1607398
total_allocs=80039329
The second statedump shows the same structures still growing:
glusterdump.1573.dump.1525466693
[mgmt/glusterd.management - usage-type gf_gld_mt_linebuf memusage]
size=919127040 --> On this second iteration we have 919 MB
num_allocs=897585
max_size=919127040
max_num_allocs=897585
total_allocs=897585
[mgmt/glusterd.management - usage-type gf_common_mt_mem_pool memusage]
size=172089816
num_allocs=1619086
max_size=172099544
max_num_allocs=1619240
total_allocs=80707302
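The two dumps were taken 1525466693 - 1525458209 = 8484 seconds (~2.4 hours)
apart. In that window num_allocs grew by 897585 - 888179 = 9406 and size by
exactly 9406 * 1024 = 9631744 bytes (~9.6 MB), i.e. roughly one leaked
1024-byte line buffer per second, or about 4 MB per hour.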
The same pattern shows up on node glusteredc1fs2uq.owfg.com
glusterdump.20590.dump.1525458352
[mgmt/glusterd.management - usage-type gf_gld_mt_linebuf memusage]
size=476495872 --> 476 MB leaked here
num_allocs=465328
max_size=476495872
max_num_allocs=465328
total_allocs=465328
[mgmt/glusterd.management - usage-type gf_common_mt_mem_pool memusage]
size=70239188 --> 70 MB here
num_allocs=627665
max_size=70284104
max_num_allocs=628212
total_allocs=86062168
glusterdump.20590.dump.1525466708
[mgmt/glusterd.management - usage-type gf_gld_mt_linebuf memusage]
size=485989376 --> On the second iteration this has increased to 485 MB
num_allocs=474599
max_size=485989376
max_num_allocs=474599
total_allocs=474599
[mgmt/glusterd.management - usage-type gf_common_mt_mem_pool memusage]
size=71332824
num_allocs=637632
max_size=71335796
max_num_allocs=637669
total_allocs=87385904
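Again the leak is exactly 1024 bytes per allocation: num_allocs grew by
474599 - 465328 = 9271 over 1525466708 - 1525458352 = 8356 seconds, the same
rate of roughly one allocation per second as on the first node, which would be
consistent with a periodic operation (e.g. a recurring status query) hitting
the same code path on both nodes.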
The only place where I can find a 1024-byte allocation of this type is in the
geo-replication code:
https://github.com/gluster/glusterfs/blob/master/xlators/mgmt/glusterd/src/glusterd-geo-rep.c
Specifically, in glusterd_urltransform ():

        ...
        for (;;) {
                size_t len;
                line = GF_MALLOC (1024, gf_gld_mt_linebuf);
                if (!line) {
                        error = _gf_true;
                        goto out;
                }
        ...
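Each pass through this loop allocates a fresh 1024-byte line buffer for one
line of the spawned command's output, and the buffers that get kept are stored
in an array that has to be freed later by whichever caller ends up owning it.
As a rough standalone model of that pattern (an illustration only: plain libc
calls stand in for GF_MALLOC/GF_FREE and popen() stands in for glusterd's
command runner; this is not the actual glusterd code):

/* Standalone model of the allocation pattern above. If the caller never
 * runs the cleanup loop at the bottom, every call leaks n * 1024 bytes,
 * matching the per-allocation size seen in the statedumps. */
#include <stdio.h>
#include <stdlib.h>

static char **
collect_lines (const char *cmd, unsigned int *count)
{
        FILE *fp = popen (cmd, "r");
        unsigned int n = 0, cap = 32;
        char **arr = NULL;

        *count = 0;
        if (!fp)
                return NULL;
        arr = calloc (cap, sizeof (char *));
        if (!arr) {
                pclose (fp);
                return NULL;
        }

        for (;;) {
                /* mirrors: line = GF_MALLOC (1024, gf_gld_mt_linebuf); */
                char *line = malloc (1024);
                if (!line)
                        break;
                if (!fgets (line, 1024, fp)) {
                        free (line);            /* EOF: drop the unused buffer */
                        break;
                }
                if (n == cap) {
                        char **tmp = realloc (arr, 2 * cap * sizeof (char *));
                        if (!tmp) {
                                free (line);
                                break;
                        }
                        arr = tmp;
                        cap *= 2;
                }
                arr[n++] = line;                /* ownership passes to the caller */
        }

        pclose (fp);
        *count = n;
        return arr;
}

int
main (void)
{
        unsigned int i, n;
        char **lines = collect_lines ("echo line1; echo line2", &n);

        for (i = 0; i < n; i++) {
                fputs (lines[i], stdout);
                free (lines[i]);        /* without this cleanup, n * 1024 bytes leak per call */
        }
        free (lines);
        return 0;
}

If the real code path that triggers this (whatever polls it roughly once per
second) never releases those line buffers, that alone would account for the
gf_gld_mt_linebuf growth above.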
I believe this is caused by geo-replication. Further assistance from
engineering is required to understand the source of this memory leak.