[Bugs] [Bug 1329335] New: GlusterFS - Memory Leak - High Memory Utilization
bugzilla at redhat.com
Thu Apr 21 16:06:30 UTC 2016
https://bugzilla.redhat.com/show_bug.cgi?id=1329335
Bug ID: 1329335
Summary: GlusterFS - Memory Leak - High Memory Utilization
Product: GlusterFS
Version: 3.7.11
Component: glusterd
Severity: urgent
Assignee: bugs at gluster.org
Reporter: uganit at gmail.com
CC: bugs at gluster.org
Created attachment 1149509
--> https://bugzilla.redhat.com/attachment.cgi?id=1149509&action=edit
GlusterFS dump file
We are using GlusterFS 3.7.11 (upgraded from 3.7.6 last week) on RHEL 7.x in
AWS EC2.
We keep seeing memory utilization go up, roughly once every 2 days. The memory
usage of the server daemon (glusterd) on the NFS server keeps increasing: in
about 30+ hours glusterd alone reaches 70% of the available memory. Since we
have alarms on this threshold we get notified, and so far the only way to stop
the growth is to restart glusterd.
This happens even when there is not much load on GlusterFS.
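For reference, a minimal sketch of how the growth can be tracked over time
(the 5-minute interval and the log path are arbitrary choices):

# Append a timestamped RSS sample (in KB) for glusterd every 5 minutes
while true; do
    echo "$(date '+%F %T') $(ps -o rss= -C glusterd)" >> /tmp/glusterd-rss.log
    sleep 300
done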
GlusterFS is configured on two server nodes with two mount locations:
$ df -i
Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/xvdf 125829120 120186 125708934 1% /nfs_app1
/dev/xvdg 125829120 142937 125686183 1% /nfs_app2
As part of debugging, we tried the following:
1. From the client side, on the mount point, we read and wrote around 1000
files (each 4 MB in size). There was no marked spike in memory utilization
during this test.
2. We were using GlusterFS 3.7.6 and moved to 3.7.11, but the problem
persists.
3. We created a statedump of the volume in question; the dump file is attached
(see the sketch of the statedump commands after this list). Some memory
allocation types, such as gf_common_mt_asprintf, have huge total_allocs,
specifically the three listed below.
[global.glusterfs - usage-type gf_common_mt_asprintf memusage]
size=260
num_allocs=12
max_size=2464
max_num_allocs=294
total_allocs=927964
[global.glusterfs - usage-type gf_common_mt_char memusage]
size=6388
num_allocs=164
max_size=30134
max_num_allocs=645
total_allocs=1424017
[protocol/server.xyz-server - usage-type gf_common_mt_strdup memusage]
size=26055
num_allocs=2795
max_size=27198
max_num_allocs=2828
total_allocs=135503
4. We also noticed that in the mempool section nr_files is a negative number.
We are not sure whether this is related to the problem.
[mempool]
[storage/posix.xyz-posix]
base_path=/nfs_xyz/abc
base_path_length=25
max_read=44215866
max_write=104925485
nr_files=-418
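As referenced in item 3 above, a rough sketch of the standard commands for
generating such statedumps (assuming the default dump directory
/var/run/gluster and the anonymized volume name xyz from the output above):

# Dump the brick/server processes of the volume; files are written
# under /var/run/gluster by default
$ gluster volume statedump xyz

# Dump the glusterd management daemon itself by sending it SIGUSR1
$ kill -USR1 $(pidof glusterd)

Since glusterd is the process whose memory keeps growing here, the SIGUSR1
dump of glusterd is the one most relevant to this report.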
This is happening in production and, as expected, it is causing a lot of
problems.
Has anybody seen this before? Any insights into what we can try would be
greatly appreciated.