[Bugs] [Bug 1718734] Memory leak in glusterfsd process

bugzilla at redhat.com bugzilla at redhat.com
Mon Jun 10 06:24:50 UTC 2019


--- Comment #6 from Abhishek <abhishpaliwal at gmail.com> ---
Attached are some statedumps taken on the gluster fs server: an initial dump,
then one from an hour or so later, then one from another three hours or so
after that. I believe we have looked at the statedumps before and not seen any
evidence there of what is going wrong, but please double-check this.
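For reference, the dumps were generated in the usual way, roughly as below (a sketch; the brick pid 1702 is the one shown by 'gluster volume status gv0' further down, and the dump files land under the default statedump directory, /var/run/gluster):

```shell
# Ask glusterd to dump state for every brick process of the volume.
gluster volume statedump gv0

# Equivalent low-level route: the brick process dumps its state
# when it receives SIGUSR1 (1702 is the brick pid from volume status).
kill -USR1 1702
```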

I have a system running gluster 5.4 set up in replicate mode: an active and a
passive server, and one client that has mounted the gluster volume and simply
writes and deletes a file on it every 15 minutes. That is all that is going on.

What we see is that the memory usage of the glusterfsd process is increasing
slowly in a linear fashion. I am running a python script every minute to log
the memory usage, and then plot the result on a graph. I attach the graph
showing glusterfsd private, shared and total memory usage over time (some 78
days of running). I also attach two screenshots from 'top' taken at various
stages.
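The logging script is just ps_mem.py run in a loop (see the jobs output below); a rough stand-in that samples only the resident set of a single pid from /proc once per interval would look like this (the script name memlog.sh and its arguments are made up for illustration):

```shell
#!/bin/sh
# Log the resident set size (VmRSS) of one process once per interval,
# as "epoch-seconds,rss-kB" lines suitable for plotting.
# Usage: ./memlog.sh <pid> [interval-seconds]   (interval defaults to 60)
pid=$1
interval=${2:-60}
while kill -0 "$pid" 2>/dev/null; do
    # The second field of the VmRSS line in /proc/<pid>/status is kB.
    rss=$(awk '/^VmRSS:/ { print $2 }' "/proc/$pid/status")
    echo "$(date +%s),$rss"
    sleep "$interval"
done
```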

This is the graph during the one-file-every-15-minutes write test:

Please see the attachment for image.png

And by the way, if we 'hammer' the gluster volume with file writes/deletes at
a much faster rate, i.e. many files written/deleted every second or even every
minute, the glusterfsd memory usage increases only for a very short period,
then levels off and stays level indefinitely at around 35 MB total. So
something is clearly different in the 'slow' file access case, where the total
is at nearly 200 MB and still increasing.

If we run Valgrind, we see that the memory allocations are freed when the
process exits, but the problem we have is that this will be on a system where
gluster is up and running all the time. So it appears that memory is
dynamically allocated on each write/read on the gluster volume but never freed
at runtime. The worry is that at some point glusterfsd will completely use up
all the memory on the system - it might take a long time, but that is not
acceptable.
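For completeness, the Valgrind run was along these lines (a sketch, not the exact command: the brick arguments are the ones visible in the ps output below, -N keeps glusterfsd in the foreground so Valgrind can print its leak report when the process is stopped, and the log path is arbitrary):

```shell
# Run the brick process in the foreground under Valgrind so that the
# full leak report is written out when the process exits.
valgrind --leak-check=full --show-leak-kinds=all \
    --log-file=/tmp/glusterfsd-valgrind.log \
    /usr/sbin/glusterfsd -N -s board0 \
    --volfile-id gv0.board0.export-sdb1-brick
```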

My steps are here:

root at board0:/tmp# gluster --version
glusterfs 5.4

root at board0:/tmp# gluster volume status gv0
Status of volume: gv0
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick board0:/export/sdb1/brick             49152     0          Y       1702
Brick board1:/export/sdb1/brick             49152     0          Y       1652
Self-heal Daemon on localhost               N/A       N/A        Y       1725
Self-heal Daemon on board1                  N/A       N/A        Y       1675

Task Status of Volume gv0
------------------------------------------------------------------------------
There are no active volume tasks

root at board0:/tmp# jobs
[1]+ Running ./ps_mem.py -w 61 > /tmp/ps_mem.log & (wd: ~)

root at board0:/tmp# ps -ef | grep gluster
root 1608 1 0 May08 ? 00:00:04 /usr/sbin/glusterd -p /var/run/glusterd.pid
root 1702 1 0 May08 ? 00:00:14 /usr/sbin/glusterfsd -s board0 --volfile-id
gv0.board0.export-sdb1-brick -p
/var/run/gluster/vols/gv0/board0-export-sdb1-brick.pid -S
/var/run/gluster/6c09da8ec6e017c8.socket --brick-name /export/sdb1/brick -l
/var/log/glusterfs/bricks/export-sdb1-brick.log --xlator-option
*-posix.glusterd-uuid=336dc4a8-1371-4366-b2f9-003c35e12ca1 --process-name brick
--brick-port 49152 --xlator-option gv0-server.listen-port=49152
root 1725 1 0 May08 ? 00:00:03 /usr/sbin/glusterfs -s localhost --volfile-id
gluster/glustershd -p /var/run/gluster/glustershd/glustershd.pid -l
/var/log/glusterfs/glustershd.log -S /var/run/gluster/7ae70daf2745f7d4.socket
--xlator-option replicate.node-uuid=336dc4a8-1371-4366-b2f9-003c35e12ca1
--process-name glustershd
root 3115 1241 0 03:00 ttyS0 00:00:00 grep gluster 

This is the cmd used to create the gluster volume:

gluster volume create gv0 replica 2 board0:/export/sdb1/brick board1:/export/sdb1/brick

And on the client I do:

mount -t glusterfs board0:gv0 /mnt

and then just run the one-file-every-15-minutes test.
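The client-side test itself is nothing more than a loop like the following (a sketch; the mount point, file name, file size and 15-minute sleep are the only parameters):

```shell
#!/bin/sh
# Write and then delete one small file on the gluster mount every 15 minutes.
mnt=${1:-/mnt}
while :; do
    dd if=/dev/urandom of="$mnt/testfile" bs=1K count=64 2>/dev/null
    sync
    rm -f "$mnt/testfile"
    sleep 900   # 15 minutes
done
```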


To extract the data, after some time I run:

grep glusterfsd ps_mem.log | awk '{ print $1 "," $4 "," $7 }' >

Then plot the points in Excel.
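If Excel is not at hand, the leak rate can also be estimated straight from the logged data with a least-squares fit in awk (a sketch; it assumes a CSV named mem.csv whose first column is a timestamp and whose second column is the memory figure in kB):

```shell
# Least-squares slope of column 2 against column 1 of a CSV:
#   slope = (n*sum(xy) - sum(x)*sum(y)) / (n*sum(x^2) - sum(x)^2)
# The result is the growth rate in kB per time unit of column 1.
awk -F, '{ n++; sx += $1; sy += $2; sxy += $1*$2; sxx += $1*$1 }
         END { print (n*sxy - sx*sy) / (n*sxx - sx*sx) }' mem.csv
```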
