[Gluster-users] Fuse memleaks, all versions → OOM-killer

Yannick Perret yannick.perret at liris.cnrs.fr
Tue Aug 2 17:15:31 UTC 2016


In order to prevent excessive swap usage I disabled swap on this machine 
(swapoff -a).
Memory usage kept growing.
After that I started another program that consumes memory (in order to 
speed things up) and the OOM-killer was triggered.
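
(For the record, any memory hog works here; as one possibility, and only as
an example since the exact program is not important, the 'stress' tool, if
available, can apply this kind of pressure:

  # allocate ~2GB of anonymous memory for 2 minutes to push the system toward OOM
  stress --vm 1 --vm-bytes 2G --timeout 120s
)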

Here is the syslog:
[1246854.291996] Out of memory: Kill process 931 (glusterfs) score 742 
or sacrifice child
[1246854.292102] Killed process 931 (glusterfs) total-vm:3527624kB, 
anon-rss:3100328kB, file-rss:0kB

Last VSZ/RSS was: 3527624 / 3097096
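
(For reference, figures like these can be read with ps, assuming a single
glusterfs client process on the machine; the same one-liner can stand in
for the "get VSZ/RSS" steps in the loop quoted further down:

  # print VSZ and RSS (in kB) of the glusterfs client process
  ps -o vsz=,rss= -p $(pidof glusterfs)
)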


Here is the rest of the OOM-killer data:
[1246854.291847] active_anon:600785 inactive_anon:377188 isolated_anon:0
  active_file:97 inactive_file:137 isolated_file:0
  unevictable:0 dirty:0 writeback:1 unstable:0
  free:21740 slab_reclaimable:3309 slab_unreclaimable:3728
  mapped:255 shmem:4267 pagetables:3286 bounce:0
  free_cma:0
[1246854.291851] Node 0 DMA free:15876kB min:264kB low:328kB high:396kB 
active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB 
managed:15908kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB 
slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB 
pagetables:0kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB 
pages_scanned:0 all_unreclaimable? yes
[1246854.291858] lowmem_reserve[]: 0 2980 3948 3948
[1246854.291861] Node 0 DMA32 free:54616kB min:50828kB low:63532kB 
high:76240kB active_anon:1940432kB inactive_anon:1020924kB 
active_file:248kB inactive_file:260kB unevictable:0kB isolated(anon):0kB 
isolated(file):0kB present:3129280kB managed:3054836kB mlocked:0kB 
dirty:0kB writeback:0kB mapped:760kB shmem:14616kB 
slab_reclaimable:9660kB slab_unreclaimable:8244kB kernel_stack:1456kB 
pagetables:10056kB unstable:0kB bounce:0kB free_cma:0kB 
writeback_tmp:0kB pages_scanned:803 all_unreclaimable? yes
[1246854.291865] lowmem_reserve[]: 0 0 967 967
[1246854.291867] Node 0 Normal free:16468kB min:16488kB low:20608kB 
high:24732kB active_anon:462708kB inactive_anon:487828kB 
active_file:140kB inactive_file:288kB unevictable:0kB isolated(anon):0kB 
isolated(file):0kB present:1048576kB managed:990356kB mlocked:0kB 
dirty:0kB writeback:4kB mapped:260kB shmem:2452kB 
slab_reclaimable:3576kB slab_unreclaimable:6668kB kernel_stack:560kB 
pagetables:3088kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB 
pages_scanned:975 all_unreclaimable? yes
[1246854.291872] lowmem_reserve[]: 0 0 0 0
[1246854.291874] Node 0 DMA: 1*4kB (U) 0*8kB 0*16kB 2*32kB (U) 3*64kB 
(U) 0*128kB 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (R) 3*4096kB (EM) 
= 15876kB
[1246854.291882] Node 0 DMA32: 1218*4kB (UEM) 848*8kB (UE) 621*16kB (UE) 
314*32kB (UEM) 189*64kB (UEM) 49*128kB (UEM) 2*256kB (E) 0*512kB 
0*1024kB 0*2048kB 1*4096kB (R) = 54616kB
[1246854.291891] Node 0 Normal: 3117*4kB (UE) 0*8kB 0*16kB 3*32kB (R) 
1*64kB (R) 2*128kB (R) 0*256kB 1*512kB (R) 1*1024kB (R) 1*2048kB (R) 
0*4096kB = 16468kB
[1246854.291900] Node 0 hugepages_total=0 hugepages_free=0 
hugepages_surp=0 hugepages_size=2048kB
[1246854.291902] 4533 total pagecache pages
[1246854.291903] 0 pages in swap cache
[1246854.291905] Swap cache stats: add 343501, delete 343501, find 
7730690/7732743
[1246854.291906] Free swap  = 0kB
[1246854.291907] Total swap = 0kB
[1246854.291908] 1048462 pages RAM
[1246854.291909] 0 pages HighMem/MovableOnly
[1246854.291909] 14555 pages reserved
[1246854.291910] 0 pages hwpoisoned

Regards,
--
Y.



On 02/08/2016 at 17:00, Yannick Perret wrote:
> So here are the dumps, gzip'ed.
>
> What I did:
> 1. mounting the volume, removing all its content, unmounting it
> 2. mounting the volume
> 3. performing a cp -Rp /usr/* /root/MNT
> 4. performing a rm -rf /root/MNT/*
> 5. taking a dump (glusterdump.p1.dump)
> 6. re-doing 3, 4 and 5 (glusterdump.p2.dump)
>
> VSZ/RSS are respectively:
> - 381896 / 35688 just after mount
> - 644040 / 309240 after 1st cp -Rp
> - 644040 / 310128 after 1st rm -rf
> - 709576 / 310128 after 1st kill -USR1
> - 840648 / 421964 after 2nd cp -Rp
> - 840648 / 422224 after 2nd rm -rf
>
> I created a small script that performs these actions in an infinite loop:
> while /bin/true
> do
>   cp -Rp /usr/* /root/MNT/
>   # get VSZ/RSS of glusterfs process here
>   rm -rf /root/MNT/*
>   # get VSZ/RSS of glusterfs process here
> done
>
> At this time here are the values so far (VSZ then RSS, in kB):
> 971720 533988
> 1037256 645500
> 1037256 645840
> 1168328 757348
> 1168328 757620
> 1299400 869128
> 1299400 869328
> 1364936 980712
> 1364936 980944
> 1496008 1092384
> 1496008 1092404
> 1627080 1203796
> 1627080 1203996
> 1692616 1315572
> 1692616 1315504
> 1823688 1426812
> 1823688 1427340
> 1954760 1538716
> 1954760 1538772
> 2085832 1647676
> 2085832 1647708
> 2151368 1750392
> 2151368 1750708
> 2282440 1853864
> 2282440 1853764
> 2413512 1952668
> 2413512 1952704
> 2479048 2056500
> 2479048 2056712
>
> So at this point the glusterfs process uses not far from 2GB of resident 
> memory, while only performing exactly the same actions: 'cp -Rp /usr/* 
> /root/MNT' + 'rm -rf /root/MNT/*'.
>
> Swap usage is starting to increase a little, and I haven't seen any 
> memory drop so far.
> I can understand that the kernel may not release the removed files 
> (after rm -rf) immediately, but the first 'rm' occurred at ~12:00 today 
> and it is now ~17:00 here, so I can't understand why so much memory is 
> still in use.
> I would expect the memory to grow during 'cp -Rp', then shrink after 
> 'rm', but it stays the same. Even if it stays the same, I would expect 
> it not to grow further while cp-ing again.
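>
> (A quick way to test whether this is just the kernel holding on to cached 
> dentries/inodes, which is only an assumption and not something verified 
> here, is to ask it to drop those caches and watch whether the glusterfs 
> RSS goes down afterwards:
>
>   # drop cached dentries and inodes so the FUSE client receives 'forget's
>   sync
>   echo 2 > /proc/sys/vm/drop_caches
> )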
>
> I'm leaving the cp/rm loop running to see what happens. Feel free to ask 
> for other data if it can help.
>
> Please note that I'll be on holidays from the end of this week for 3 
> weeks, so I will mostly not be able to perform tests during this time 
> (the network connection is too poor where I'm going).
>
> Regards,
> --
> Y.
>
> On 02/08/2016 at 05:11, Pranith Kumar Karampuri wrote:
>>
>>
>> On Mon, Aug 1, 2016 at 3:40 PM, Yannick Perret 
>> <yannick.perret at liris.cnrs.fr> wrote:
>>
>>     On 29/07/2016 at 18:39, Pranith Kumar Karampuri wrote:
>>>
>>>
>>>     On Fri, Jul 29, 2016 at 2:26 PM, Yannick Perret
>>>     <yannick.perret at liris.cnrs.fr> wrote:
>>>
>>>         Ok, last try:
>>>         after investigating more versions I found that FUSE client
>>>         leaks memory on all of them.
>>>         I tested:
>>>         - 3.6.7 client on debian 7 32bit and on debian 8 64bit (with
>>>         3.6.7 servers on debian 8 64bit)
>>>         - 3.6.9 client on debian 7 32bit and on debian 8 64bit (with
>>>         3.6.7 servers on debian 8 64bit)
>>>         - 3.7.13 client on debian 8 64bit (with 3.8.1 servers on
>>>         debian 8 64bit)
>>>         - 3.8.1 client on debian 8 64bit (with 3.8.1 servers on
>>>         debian 8 64bit)
>>>         In all cases they were compiled from sources, apart from 3.8.1
>>>         where .deb packages were used (due to a configure runtime error).
>>>         For 3.7 it was compiled with --disable-tiering. I also tried
>>>         to compile with --disable-fusermount (no change).
>>>
>>>         In all of these cases the memory (resident & virtual) of the
>>>         glusterfs process on the client grows with each activity and
>>>         never reaches a maximum (and never decreases).
>>>         "Activity" for these tests is cp -Rp and ls -lR.
>>>         The client I let grow the longest reached ~4GB of RAM. On
>>>         smaller machines it ends with the OOM killer killing the
>>>         glusterfs process, or with glusterfs dying due to an
>>>         allocation error.
>>>
>>>         In 3.6 memory seems to grow continuously, whereas in 3.8.1 it
>>>         grows by "steps" (430400 kB → 629144 (~1min) → 762324
>>>         (~1min) → 827860…).
>>>
>>>         All tests were performed on a single test volume used only by
>>>         my test client. The volume is a basic x2 replica. The only
>>>         parameters I changed on this volume (without any effect) are
>>>         diagnostics.client-log-level set to ERROR and
>>>         network.inode-lru-limit set to 1024.
>>>
>>>
>>>     Could you attach statedumps of your runs?
>>>     The following link has the steps to capture
>>>     them (https://gluster.readthedocs.io/en/latest/Troubleshooting/statedump/
>>>     ). We basically need to see which memory types are
>>>     increasing. If you could help find the issue, we can send the
>>>     fixes for your workload. There is a 3.8.2 release in around 10
>>>     days I think. We can probably target this issue for that?
>>     Here are statedumps.
>>     Steps:
>>     1. mount -t glusterfs ldap1.my.domain:SHARE /root/MNT/ (here VSZ
>>     and RSS are 381896 35828)
>>     2. take a dump with kill -USR1 <pid-of-glusterfs-process> (file
>>     glusterdump.n1.dump.1470042769)
>>     3. perform a 'ls -lR /root/MNT | wc -l' (btw result of wc -l is
>>     518396 :)) and a 'cp -Rp /usr/* /root/MNT/boo' (VSZ/RSS are
>>     1301536/711992 at the end of these operations)
>>     4. take a dump with kill -USR1 <pid-of-glusterfs-process> (file
>>     glusterdump.n2.dump.1470043929)
>>     5. do 'cp -Rp * /root/MNT/toto/', i.e. into another directory
>>     (VSZ/RSS are 1432608/909968 at the end of this operation)
>>     6. take a dump with kill -USR1 <pid-of-glusterfs-process> (file
>>     glusterdump.n3.dump.)
>>
>>
>> Hey,
>>       Thanks a lot for providing this information. Looking at these 
>> steps, I don't see any problem with the increase in memory. Both the 
>> ls -lR and cp -Rp commands you did in step 3 add new inodes in memory, 
>> which increases memory usage. As long as the kernel thinks these 
>> inodes need to be in memory, gluster keeps them in memory. Once the 
>> kernel no longer considers an inode necessary, it sends an 
>> 'inode-forget'. At that point the memory starts to decrease. So it 
>> kind of depends on the memory pressure the kernel is under. But you 
>> said it leads to OOM-kills on smaller machines, which means there 
>> could be some leaks. Could you modify the steps as follows to confirm 
>> whether there are leaks? Please do this test on those smaller machines 
>> that hit the OOM-killer.
>>
>> Steps:
>> 1. mount -t glusterfs ldap1.my.domain:SHARE /root/MNT/ (here VSZ and 
>> RSS are 381896 35828)
>> 2. perform a 'ls -lR /root/MNT | wc -l' (btw result of wc -l is 
>> 518396 :)) and a 'cp -Rp /usr/* /root/MNT/boo' (VSZ/RSS are 
>> 1301536/711992 at the end of these operations)
>> 3. do 'cp -Rp * /root/MNT/toto/', i.e. into another directory (VSZ/RSS 
>> are 1432608/909968 at the end of this operation)
>> 4. Delete all the files and directories you created in steps 2, 3 above
>> 5. Take statedump with kill -USR1 <pid-of-glusterfs-process>
>> 6. Repeat steps from 2-5
>>
>> Attach these two statedumps. I think the statedumps will be even more 
>> effective if the mount does not have any data when you start the 
>> experiment.
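>>
>> (A rough scripted sketch of the procedure above, not the exact commands
>> used: the source tree for step 3, the mkdir calls and the single
>> glusterfs client process are assumptions of the sketch, and statedumps
>> go to the default location, typically /var/run/gluster:
>>
>> mount -t glusterfs ldap1.my.domain:SHARE /root/MNT/
>> for run in 1 2; do
>>     ls -lR /root/MNT | wc -l                 # step 2: stat everything
>>     mkdir -p /root/MNT/boo /root/MNT/toto
>>     cp -Rp /usr/* /root/MNT/boo/             # step 2: first copy
>>     cp -Rp /usr/* /root/MNT/toto/            # step 3: copy into another directory
>>     rm -rf /root/MNT/boo /root/MNT/toto      # step 4: delete everything created above
>>     kill -USR1 $(pidof glusterfs)            # step 5: take a statedump
>> done
>> )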
>>
>> HTH
>>
>>
>>     Dump files are gzip'ed because they are very large.
>>     Dump files are here (too big for email):
>>     http://wikisend.com/download/623430/glusterdump.n1.dump.1470042769.gz
>>     http://wikisend.com/download/771220/glusterdump.n2.dump.1470043929.gz
>>     http://wikisend.com/download/428752/glusterdump.n3.dump.1470045181.gz
>>     (I'm keeping the files in case someone wants them in another format)
>>
>>     Client and servers are installed from .deb files
>>     (glusterfs-client_3.8.1-1_amd64.deb and
>>     glusterfs-common_3.8.1-1_amd64.deb on client side).
>>     They are all Debian 8 64bit. The servers are test machines that serve
>>     only one volume to this sole client. The volume is a simple x2
>>     replica. For testing I only changed the network.inode-lru-limit
>>     value to 1024. The mount point /root/MNT is only used for these tests.
>>
>>     --
>>     Y.
>>
>>
>>
>>
>>
>> -- 
>> Pranith
>
>
>
>

