[Bugs] [Bug 1352854] New: GlusterFS - Memory Leak - High Memory Utilization

bugzilla at redhat.com bugzilla at redhat.com
Tue Jul 5 09:29:34 UTC 2016


https://bugzilla.redhat.com/show_bug.cgi?id=1352854

            Bug ID: 1352854
           Summary: GlusterFS - Memory Leak - High Memory Utilization
           Product: GlusterFS
           Version: mainline
         Component: glusterd
          Keywords: Triaged
          Severity: medium
          Priority: urgent
          Assignee: bugs at gluster.org
          Reporter: moagrawa at redhat.com
                CC: amukherj at redhat.com, bugs at gluster.org,
                    kaushal at redhat.com, moagrawa at redhat.com,
                    rkavunga at redhat.com, uganit at gmail.com
        Depends On: 1329335



+++ This bug was initially created as a clone of Bug #1329335 +++

We are using GlusterFS 3.7.11 (upgraded from 3.7.6 last week) on RHEL 7.x in
AWS EC2.


We continue to see memory utilization going up roughly once every 2 days. The
memory utilization of the server daemon (glusterd) on the NFS server keeps
increasing. In about 30+ hours the memory utilization of the glusterd service
alone reaches 70% of the available memory. Since we have alarms for this
threshold, we get notified, and the only way to stop it so far has been to
restart glusterd.

This happens even when there is not a lot of load on GlusterFS.

GlusterFS is configured on two server nodes with two mount locations.

$ df -i
Filesystem        Inodes  IUsed     IFree IUse% Mounted on

/dev/xvdf      125829120 120186 125708934    1% /nfs_app1
/dev/xvdg      125829120 142937 125686183    1% /nfs_app2

As part of debugging, we tried the following:

1.    From the client side, in the mount location, we tried to read and write
around 1000 files (each of 4 MB size). There was no marked spike in memory
utilization during this time.
2.    We were using GlusterFS 3.7.6 and moved to 3.7.11, but despite that the
problem persists.
3.    We created a dump of the volume in question. The dump file is attached.
Some memory allocations, such as gf_common_mt_asprintf memusage, have huge
total_allocs; specifically, the three listed below (a rough way to pull these
out of a statedump is sketched after this list).


[global.glusterfs - usage-type gf_common_mt_asprintf memusage]
size=260
num_allocs=12
max_size=2464
max_num_allocs=294
total_allocs=927964

[global.glusterfs - usage-type gf_common_mt_char memusage]
size=6388
num_allocs=164
max_size=30134
max_num_allocs=645
total_allocs=1424017

[protocol/server.xyz-server - usage-type gf_common_mt_strdup memusage]
size=26055
num_allocs=2795
max_size=27198
max_num_allocs=2828
total_allocs=135503

4.    We also noticed that the mempool section reports nr_files as a negative
number. We are not sure whether this is also related to the problem.

[mempool]
[storage/posix.xyz-posix]
base_path=/nfs_xyz/abc
base_path_length=25
max_read=44215866
max_write=104925485
nr_files=-418
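
(For reference, a rough way to pair each memusage section header with its
total_allocs count, assuming the statedump layout shown above and that each
memusage section contains exactly one total_allocs line; the dump path is a
placeholder:)

# print "<section header> total_allocs=N" pairs, highest counts last
grep -E 'usage-type .*memusage]|^total_allocs=' /path/to/brick-statedump.dump \
    | paste - - | sort -t= -k2 -n | tail -3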


This is happening in production and, as expected, is causing a lot of
problems. Has anybody seen this before? Any insights into what we can try
would be greatly appreciated.

--- Additional comment from Kaushal on 2016-04-25 03:37:03 EDT ---

Hi Nagendra,

Could you provide statedumps of the GlusterD process? The dumps you've
provided are of the brick processes (i.e. glusterfsd).

You can get a statedump of the GlusterD process by sending it a SIGUSR1
signal:
`kill -SIGUSR1 <pid of glusterd>`
The statedump files will be created in /var/run/gluster.

It would be nice if you could provide statedumps from two different times, so
that we can compare what changed.
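
(A minimal sketch of that procedure, assuming glusterd is running and pidof is
available; the one-hour wait is just an example interval:)

# take the first statedump
kill -SIGUSR1 $(pidof glusterd)
ls -lt /var/run/gluster/       # note the newest dump file
# wait, then take a second dump for comparison
sleep 3600
kill -SIGUSR1 $(pidof glusterd)
ls -lt /var/run/gluster/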

--- Additional comment from Nagendran on 2016-04-25 08:46 EDT ---



--- Additional comment from Nagendran on 2016-04-25 08:46 EDT ---



--- Additional comment from Nagendran on 2016-04-25 08:48:29 EDT ---

Hi Kaushal,

    Thanks for your response.

    As you suggested, we have taken statedumps at different times. When we
took the first dump, the memory utilization of glusterd was 5.6%; after one
hour it had increased to 7.6%. Both dumps have been attached for your
reference.

Note: The time between the dumps is 1 hour.

     If you need any more information on this, please let us know.

Thanks,
Nagendran N

--- Additional comment from Atin Mukherjee on 2016-04-25 11:24:51 EDT ---

Which commands have been run? Could you also attach the cmd_history.log file
from all the nodes?

--- Additional comment from Nagendran on 2016-04-26 05:48 EDT ---

Command History for the first node.

--- Additional comment from Nagendran on 2016-04-26 05:48 EDT ---



--- Additional comment from Nagendran on 2016-04-26 05:51:01 EDT ---

Hi Kaushal,
I have attached the command history for both nodes, as requested.

Please let us know if you have any thoughts on what's going on.
Unfortunately, this is creating a serious problem (memory alarms in production
are being triggered almost every other day) and we end up restarting glusterd
while under load.

Appreciate any pointers on the problem.

--- Additional comment from Mohammed Rafi KC on 2016-04-26 08:51:45 EDT ---

From your cmd_history, I see that you are running a lot of profile commands.
Are you observing the memory leak when you run profile commands continuously?
I'm asking because that will help us figure out the memory leak path more
easily.

--- Additional comment from Nagendran on 2016-04-26 09:42:38 EDT ---

We are using a custom NewRelic plugin to get some metrics about the GlusterFS
peer for monitoring - basically a cron job that runs this command:

gluster volume profile ${1} info | grep -n "Brick" | sed -n 2p | cut -d":" -f 1

We have temporarily stopped the agent now and will check if this helps. Is this
what you are suspecting as the root cause?

The reason why we added the agent was to monitor the following:
I/O operation per second.
Directories activities per second
Files activities per second
Files Information activities per second
Files Latency
Directory Latency

Are there any other ways of getting these metrics without running volume
profile?

--- Additional comment from Kaushal on 2016-04-27 07:53:10 EDT ---

As Rafi has mentioned already, I too think it's the volume profile polling
causing issues.

From the statedumps, I see that memory allocations for dict_t, data_t,
data_pair_t, gf_common_mt_memdup, gf_common_mt_asprintf and gf_common_mt_strdup
have increased quite a lot. These are the memory types generally associated
with the GlusterFS dictionary data type and operations on it (including
dict_serialize and unserialize).

Information in GlusterFS is passed between processes (brick to glusterd,
glusterd to glusterd, and glusterd to cli) using dictionaries as containers.
But certain operations, like 'volume profile', generate a large amount of
data, which makes the dictionaries huge. Collecting information from multiple
sources in volume profile also involves a lot of data duplication, which uses
a lot of memory.

While in most cases the memory allocated to dictionaries should be freed when
the dictionary is destroyed, it appears that there is quite a significant leak
in the volume profile path. We'll try to identify the leak as soon as we can.

In the meantime, I do hope that stopping the agent has helped. GlusterFS
doesn't have any other way, apart from volume profile, to gather volume stats
on the server side. A client-side tool, glusterfsiostat[1], was implemented a
couple of years back as a GSoC project. You could try it out.

If that doesn't work out and you really need to monitor the stats, I suggest
that you reduce the polling frequency. From the logs, I see that the interval
is 1 minute right now, which could be increased to, say, 5 minutes. Also, the
polling is currently being done from both servers, which effectively makes the
polling period 30 seconds. You can just use one of the servers to get the
stats, as volume profile gathers stats for a volume from the whole cluster.
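
(Purely as an illustration of that suggestion: a hypothetical crontab entry
run on a single server, assuming the plugin's profile call is wrapped in a
script; the script path and volume name are placeholders:)

# collect profile-based metrics every 5 minutes, from this server only
*/5 * * * * /usr/local/bin/gluster_profile_metrics.sh <volname> >> /var/log/gluster_metrics.log 2>&1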

--- Additional comment from Nagendran on 2016-04-27 10:02:20 EDT ---

Hi Kaushal,
Stopping the agent helped. We have not seen the leak since then. So the theory
is right.

Thanks much for your pointer. It helped us to zero in on the plugin.

Like you said, if we decrease the polling frequency, wouldn't we still have
this problem (just a little later)? We can try that in a lower environment,
but I suspect it may not help much.

The client side is also a problem because there are many clients and they
change as well. 

The best option seems to be finding a way to clear the allocated memory right
after the information is read. Is there a way to do that? If so, we can add it
to the plugin so that every time we read, we reset the allocation.

Will that work?

--- Additional comment from Vijay Bellur on 2016-07-05 05:12:56 EDT ---

REVIEW: http://review.gluster.org/14862 (Memory leak in gluster volume profile
command) posted (#1) for review on master by MOHIT AGRAWAL
(moagrawa at redhat.com)


Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1329335
[Bug 1329335] GlusterFS - Memory Leak - High Memory Utilization
-- 
You are receiving this mail because:
You are on the CC list for the bug.
You are the assignee for the bug.

