[Gluster-devel] 3.5.0beta3 memory leak problem
Yuan Ding
qq327662250 at gmail.com
Fri Feb 21 04:01:41 UTC 2014
Hi Vijay,
I ran the following test:
Started the gluster volume, killed glusterfsd, and restarted glusterfsd with
the following command:
valgrind --log-file=/root/dingyuan/logs/valgrind.log /usr/sbin/glusterfsd
-s server241 --volfile-id vol1.server241.fsmnt-fs1 -p
/var/lib/glusterd/vols/vol1/run/server241-fsmnt-fs1.pid -S
/var/run/4f8241255dc7142a794af68d66dcedeb.socket --brick-name /fsmnt/fs1 -l
/var/log/glusterfs/bricks/fsmnt-fs1.log --xlator-option
*-posix.glusterd-uuid=41da2eae-c2c8-41a0-8873-5286699a8b95 --brick-port
49153 --xlator-option vol1-server.listen-port=49153 -N
The command-line options are the same as the defaults except for the region
marked in red (the color is lost in this plain-text archive).
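As an aside, a fuller memcheck invocation may make the eventual report more
useful (a sketch, not part of the original command; the extra flags are my
suggestion, and the glusterfsd arguments are elided):

```shell
# sketch: run glusterfsd under memcheck with full leak checking enabled.
# --track-origins attributes uninitialised-value errors to their source,
# at the cost of slowing execution further (suggested flags, not from the
# original mail; "..." stands for the glusterfsd arguments shown above)
valgrind --leak-check=full --show-reachable=yes --track-origins=yes \
    --log-file=/root/dingyuan/logs/valgrind.log \
    /usr/sbin/glusterfsd ... -N
```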
Then I mounted an NFS client and ran the LTP test.
After a few minutes, valgrind seems to run into an infinite loop. top shows
the following (glusterfsd runs inside the 'memcheck-amd64-' process):
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
21255 root 20 0 309m 106m 4328 R 100.1 1.4 1121:42
memcheck-amd64-
The process cannot be killed with SIGTERM. SIGKILL kills it, but then no
valgrind report is generated...
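One way to get a leak report without waiting for a clean exit (a sketch;
assumes valgrind 3.7 or newer with its embedded gdbserver, and the glusterfsd
arguments as above) is to query the still-running process with vgdb:

```shell
# start memcheck with the gdbserver enabled so it can be queried later
# ("..." stands for the glusterfsd arguments from the command above)
valgrind --leak-check=full --vgdb=yes \
    --log-file=/root/dingyuan/logs/valgrind.log \
    /usr/sbin/glusterfsd ... -N

# from another shell, while the test (or the apparent hang) is ongoing,
# ask memcheck for a full leak check; the report is appended to --log-file
vgdb --pid=<pid-of-memcheck-amd64-> leak_check full
```

This sidesteps the need for the process to exit cleanly before memcheck
writes its summary.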
Is there something wrong with my test procedure? Or is there another way to
collect more information?
Thanks!
On Wed, Feb 19, 2014 at 2:20 PM, Vijay Bellur <vbellur at redhat.com> wrote:
> On 02/18/2014 03:18 PM, Yuan Ding wrote:
>
>> I tested the gluster NFS server with one NFS client, running LTP's fs test
>> cases on that client. There seem to be two memory leak problems.
>> (My NFS server and two glusterfsd config files are attached.)
>> The two problems are described below:
>>
>> 1. The glusterfs process running as the NFS server exhausts system memory
>> (1GB) in several minutes. After disabling drc, this problem no longer occurs.
>>
>> 2. After disabling drc, the test ran for one day with no problem. But I
>> found glusterfsd using more than 50% of system memory (ps output shown
>> below). Stopping the test does not release the memory.
>>
>> [root at server155 ~]# ps aux | grep glusterfsd
>> root 7443 3.7 52.8 1731340 539108 ? Ssl Feb17 70:01
>> /usr/sbin/glusterfsd -s server155 --volfile-id vol1.server155.fsmnt-fs1
>> -p /var/lib/glusterd/vols/vol1/run/server155-fsmnt-fs1.pid -S
>> /var/run/5b7fe23f0aec78ffa0e6968dece0a8b0.socket --brick-name /fsmnt/fs1
>> -l /var/log/glusterfs/bricks/fsmnt-fs1.log --xlator-option
>> *-posix.glusterd-uuid=d4f3d342-dd41-4dc7-b0fc-d3ce9998d21f --brick-port
>> 49152 --xlator-option vol1-server.listen-port=49152
>>
>> I used kill -SIGUSR1 7443 to collect a statedump (attached as
>> fsmnt-fs1.7443.dump.1392711830).
>>
>> Any help is appreciated!
>>
>
> Thanks for the report; there seem to be a lot of dict_t allocations, as
> seen in the statedump. Would it be possible to run the tests after starting
> glusterfsd under valgrind and share the report here?
>
> -Vijay
>