[Gluster-users] Fuse memleaks, all versions → OOM-killer

Yannick Perret yannick.perret at liris.cnrs.fr
Mon Aug 29 10:32:59 UTC 2016


Hello,

Back after holidays. I didn't see any new replies after this last mail; I 
hope I didn't miss any messages (too many mails to parse…).

BTW it seems that my problem is very similar to this open bug: 
https://bugzilla.redhat.com/show_bug.cgi?id=1369364
-> memory usage keeps increasing for (here) read ops until all memory and 
swap are exhausted, when using the FUSE client.
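
For reference, the way I watch the client's memory in these tests is simply 
to poll ps (a small sketch; it assumes a single glusterfs client process on 
the machine):

  # poll VSZ/RSS of the FUSE client every 30 s
  while /bin/true
  do
      date
      ps -o vsz=,rss= -C glusterfs
      sleep 30
  done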

Regards,
--
Y.

On 02/08/2016 at 19:15, Yannick Perret wrote:
> In order to prevent too much swap usage I removed swap on this machine 
> (swapoff -a).
> Memory usage kept growing.
> After that I started another program that consumes memory (in order to 
> accelerate things) and the OOM-killer was triggered.
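>
> (Roughly what I ran here, just as a sketch for reference:
>   swapoff -a                      # disable all swap
>   free -m                         # confirm swap total is now 0
>   ps -o vsz=,rss= -C glusterfs    # watch the client's memory
> )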
>
> Here is the syslog:
> [1246854.291996] Out of memory: Kill process 931 (glusterfs) score 742 
> or sacrifice child
> [1246854.292102] Killed process 931 (glusterfs) total-vm:3527624kB, 
> anon-rss:3100328kB, file-rss:0kB
>
> Last VSZ/RSS was: 3527624 / 3097096
>
>
> Here is the rest of the OOM-killer data:
> [1246854.291847] active_anon:600785 inactive_anon:377188 isolated_anon:0
>  active_file:97 inactive_file:137 isolated_file:0
>  unevictable:0 dirty:0 writeback:1 unstable:0
>  free:21740 slab_reclaimable:3309 slab_unreclaimable:3728
>  mapped:255 shmem:4267 pagetables:3286 bounce:0
>  free_cma:0
> [1246854.291851] Node 0 DMA free:15876kB min:264kB low:328kB 
> high:396kB active_anon:0kB inactive_anon:0kB active_file:0kB 
> inactive_file:0kB unevictable:0kB isolated(anon):0kB 
> isolated(file):0kB present:15992kB managed:15908kB mlocked:0kB 
> dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB 
> slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB 
> bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 
> all_unreclaimable? yes
> [1246854.291858] lowmem_reserve[]: 0 2980 3948 3948
> [1246854.291861] Node 0 DMA32 free:54616kB min:50828kB low:63532kB 
> high:76240kB active_anon:1940432kB inactive_anon:1020924kB 
> active_file:248kB inactive_file:260kB unevictable:0kB 
> isolated(anon):0kB isolated(file):0kB present:3129280kB 
> managed:3054836kB mlocked:0kB dirty:0kB writeback:0kB mapped:760kB 
> shmem:14616kB slab_reclaimable:9660kB slab_unreclaimable:8244kB 
> kernel_stack:1456kB pagetables:10056kB unstable:0kB bounce:0kB 
> free_cma:0kB writeback_tmp:0kB pages_scanned:803 all_unreclaimable? yes
> [1246854.291865] lowmem_reserve[]: 0 0 967 967
> [1246854.291867] Node 0 Normal free:16468kB min:16488kB low:20608kB 
> high:24732kB active_anon:462708kB inactive_anon:487828kB 
> active_file:140kB inactive_file:288kB unevictable:0kB 
> isolated(anon):0kB isolated(file):0kB present:1048576kB 
> managed:990356kB mlocked:0kB dirty:0kB writeback:4kB mapped:260kB 
> shmem:2452kB slab_reclaimable:3576kB slab_unreclaimable:6668kB 
> kernel_stack:560kB pagetables:3088kB unstable:0kB bounce:0kB 
> free_cma:0kB writeback_tmp:0kB pages_scanned:975 all_unreclaimable? yes
> [1246854.291872] lowmem_reserve[]: 0 0 0 0
> [1246854.291874] Node 0 DMA: 1*4kB (U) 0*8kB 0*16kB 2*32kB (U) 3*64kB 
> (U) 0*128kB 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (R) 3*4096kB 
> (EM) = 15876kB
> [1246854.291882] Node 0 DMA32: 1218*4kB (UEM) 848*8kB (UE) 621*16kB 
> (UE) 314*32kB (UEM) 189*64kB (UEM) 49*128kB (UEM) 2*256kB (E) 0*512kB 
> 0*1024kB 0*2048kB 1*4096kB (R) = 54616kB
> [1246854.291891] Node 0 Normal: 3117*4kB (UE) 0*8kB 0*16kB 3*32kB (R) 
> 1*64kB (R) 2*128kB (R) 0*256kB 1*512kB (R) 1*1024kB (R) 1*2048kB (R) 
> 0*4096kB = 16468kB
> [1246854.291900] Node 0 hugepages_total=0 hugepages_free=0 
> hugepages_surp=0 hugepages_size=2048kB
> [1246854.291902] 4533 total pagecache pages
> [1246854.291903] 0 pages in swap cache
> [1246854.291905] Swap cache stats: add 343501, delete 343501, find 
> 7730690/7732743
> [1246854.291906] Free swap  = 0kB
> [1246854.291907] Total swap = 0kB
> [1246854.291908] 1048462 pages RAM
> [1246854.291909] 0 pages HighMem/MovableOnly
> [1246854.291909] 14555 pages reserved
> [1246854.291910] 0 pages hwpoisoned
>
> Regards,
> --
> Y.
>
>
>
> On 02/08/2016 at 17:00, Yannick Perret wrote:
>> So here are the dumps, gzip'ed.
>>
>> What I did:
>> 1. mounting the volume, removing all its content, umounting it
>> 2. mounting the volume
>> 3. performing a cp -Rp /usr/* /root/MNT
>> 4. performing a rm -rf /root/MNT/*
>> 5. taking a dump (glusterdump.p1.dump)
>> 6. re-doing 3, 4 and 5 (glusterdump.p2.dump)
>>
>> VSZ/RSS are respectively:
>> - 381896 / 35688 just after mount
>> - 644040 / 309240 after 1st cp -Rp
>> - 644040 / 310128 after 1st rm -rf
>> - 709576 / 310128 after 1st kill -USR1
>> - 840648 / 421964 after 2nd cp -Rp
>> - 840648 / 422224 after 2nd rm -rf
>>
>> I created a small script that performs these actions in an infinite loop:
>> while /bin/true
>> do
>>   cp -Rp /usr/* /root/MNT/
>>   ps -o vsz=,rss= -C glusterfs   # get VSZ/RSS of the glusterfs process
>>   rm -rf /root/MNT/*
>>   ps -o vsz=,rss= -C glusterfs   # get VSZ/RSS of the glusterfs process
>> done
>>
>> Here are the values recorded so far:
>> 971720 533988
>> 1037256 645500
>> 1037256 645840
>> 1168328 757348
>> 1168328 757620
>> 1299400 869128
>> 1299400 869328
>> 1364936 980712
>> 1364936 980944
>> 1496008 1092384
>> 1496008 1092404
>> 1627080 1203796
>> 1627080 1203996
>> 1692616 1315572
>> 1692616 1315504
>> 1823688 1426812
>> 1823688 1427340
>> 1954760 1538716
>> 1954760 1538772
>> 2085832 1647676
>> 2085832 1647708
>> 2151368 1750392
>> 2151368 1750708
>> 2282440 1853864
>> 2282440 1853764
>> 2413512 1952668
>> 2413512 1952704
>> 2479048 2056500
>> 2479048 2056712
>>
>> So at this time the glusterfs process takes close to 2 GB of resident 
>> memory, while only performing exactly the same actions 'cp -Rp /usr/* 
>> /root/MNT' + 'rm -rf /root/MNT/*' over and over.
>>
>> Swap usage is starting to increase a little, and I haven't seen memory 
>> drop at any point.
>> I can understand that the kernel may not release the removed files (after 
>> rm -rf) immediately, but the first 'rm' occurred at ~12:00 today and it 
>> is ~17:00 here, so I can't understand why so much memory is still used.
>> I would expect the memory to grow during 'cp -Rp', then shrink after 
>> 'rm', but it stays the same. Even if it stays the same I would expect 
>> it not to grow further while cp-ing again.
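>>
>> One thing I have not tried yet (just an idea): forcing the kernel to 
>> drop its dentry/inode caches, to see whether glusterfs then releases 
>> memory once the kernel sends its forgets. Something like:
>>   sync
>>   echo 2 > /proc/sys/vm/drop_caches   # drop dentries and inodes only
>>   ps -o vsz=,rss= -C glusterfs        # check whether RSS went down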
>>
>> I'm leaving the cp/rm loop running to see what will happen. Feel free to 
>> ask for other data if it may help.
>>
>> Please note that I'll be on holidays at the end of this week for 3 
>> weeks, so I will mostly not be able to perform tests during this time 
>> (the network connection is too bad where I'm going).
>>
>> Regards,
>> --
>> Y.
>>
>> On 02/08/2016 at 05:11, Pranith Kumar Karampuri wrote:
>>>
>>>
>>> On Mon, Aug 1, 2016 at 3:40 PM, Yannick Perret 
>>> <yannick.perret at liris.cnrs.fr <mailto:yannick.perret at liris.cnrs.fr>> 
>>> wrote:
>>>
>>>     On 29/07/2016 at 18:39, Pranith Kumar Karampuri wrote:
>>>>
>>>>
>>>>     On Fri, Jul 29, 2016 at 2:26 PM, Yannick Perret
>>>>     <yannick.perret at liris.cnrs.fr
>>>>     <mailto:yannick.perret at liris.cnrs.fr>> wrote:
>>>>
>>>>         Ok, last try:
>>>>         after investigating more versions I found that FUSE client
>>>>         leaks memory on all of them.
>>>>         I tested:
>>>>         - 3.6.7 client on debian 7 32bit and on debian 8 64bit
>>>>         (with 3.6.7 servers on debian 8 64bit)
>>>>         - 3.6.9 client on debian 7 32bit and on debian 8 64bit
>>>>         (with 3.6.7 servers on debian 8 64bit)
>>>>         - 3.7.13 client on debian 8 64bit (with 3.8.1 servers on
>>>>         debian 8 64bit)
>>>>         - 3.8.1 client on debian 8 64bit (with 3.8.1 servers on
>>>>         debian 8 64bit)
>>>>         In all cases they were compiled from sources, apart from
>>>>         3.8.1 where .deb packages were used (due to a configure
>>>>         runtime error). For 3.7 it was compiled with
>>>>         --disable-tiering. I also tried to compile with
>>>>         --disable-fusermount (no change).
>>>>
>>>>         In all of these cases the memory (resident & virtual) of the
>>>>         glusterfs process on the client grows with each activity and
>>>>         never reaches a maximum (and never decreases).
>>>>         "Activity" for these tests is cp -Rp and ls -lR.
>>>>         The client I let run the longest reached ~4 GB of RAM. On
>>>>         smaller machines it ends with the OOM killer killing the
>>>>         glusterfs process, or with glusterfs dying due to an
>>>>         allocation error.
>>>>
>>>>         In 3.6 memory seems to grow continuously, whereas in 3.8.1 it
>>>>         grows in "steps" (430400 kB → 629144 (~1 min) → 762324
>>>>         (~1 min) → 827860…).
>>>>
>>>>         All tests were performed on a single test volume used only by
>>>>         my test client. The volume is a basic x2 replica. The only
>>>>         parameters I changed on this volume (without any effect)
>>>>         are diagnostics.client-log-level set to ERROR and
>>>>         network.inode-lru-limit set to 1024.
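>>>>
>>>>         (For reference, both options were set with the standard CLI,
>>>>         something like the following, assuming the volume is named
>>>>         SHARE as in the mount command below:
>>>>           gluster volume set SHARE diagnostics.client-log-level ERROR
>>>>           gluster volume set SHARE network.inode-lru-limit 1024
>>>>         )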
>>>>
>>>>
>>>>     Could you attach statedumps of your runs?
>>>>     The following link has the steps to capture
>>>>     them (https://gluster.readthedocs.io/en/latest/Troubleshooting/statedump/
>>>>     ). We basically need to see which memory types are
>>>>     increasing. If you can help find the issue, we can send the
>>>>     fixes for your workload. There is a 3.8.2 release in around 10
>>>>     days, I think. We can probably target this issue for that?
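>>>>
>>>>     (In short, on the client side: send SIGUSR1 to the fuse process
>>>>     and the dump should appear under /var/run/gluster, assuming the
>>>>     default statedump path for your build, e.g.:
>>>>       kill -USR1 $(pidof glusterfs)
>>>>       ls /var/run/gluster/glusterdump.*
>>>>     )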
>>>     Here are statedumps.
>>>     Steps:
>>>     1. mount -t glusterfs ldap1.my.domain:SHARE /root/MNT/ (here VSZ
>>>     and RSS are 381896 35828)
>>>     2. take a dump with kill -USR1 <pid-of-glusterfs-process> (file
>>>     glusterdump.n1.dump.1470042769)
>>>     3. perform a 'ls -lR /root/MNT | wc -l' (btw the result of wc -l is
>>>     518396 :)) and a 'cp -Rp /usr/* /root/MNT/boo' (VSZ/RSS are
>>>     1301536/711992 at the end of these operations)
>>>     4. take a dump with kill -USR1 <pid-of-glusterfs-process> (file
>>>     glusterdump.n2.dump.1470043929)
>>>     5. do 'cp -Rp * /root/MNT/toto/', i.e. into another directory
>>>     (VSZ/RSS are 1432608/909968 at the end of this operation)
>>>     6. take a dump with kill -USR1 <pid-of-glusterfs-process> (file
>>>     glusterdump.n3.dump.)
>>>
>>>
>>> Hey,
>>>       Thanks a lot for providing this information. Looking at these 
>>> steps, I don't see any problem with the increase in memory. Both the ls 
>>> -lR and cp -Rp commands you ran in step 3 add new inodes in 
>>> memory, which increases memory usage. What happens is: as long as the 
>>> kernel thinks these inodes need to be in memory, gluster keeps them 
>>> in memory. Once the kernel decides an inode is no longer necessary, it 
>>> sends 'inode-forgets'. At that point the memory starts reducing. So 
>>> it kind of depends on the memory pressure the kernel is under. But you 
>>> said it led to OOM-killers on smaller machines, which means there 
>>> could be some leaks. Could you modify the steps as follows to confirm 
>>> whether there are leaks? Please do this test on those smaller 
>>> machines that hit the OOM-killer.
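>>>
>>> (A quick way to see this in a statedump is to look at the fuse inode 
>>> table counters; field names may differ slightly between versions, so 
>>> this is just a sketch:
>>>   grep -E 'itable.*(active_size|lru_size)' glusterdump.*
>>> If those counts stay bounded while RSS keeps growing, the growth is 
>>> probably not just cached inodes.)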
>>>
>>> Steps:
>>> 1. mount -t glusterfs ldap1.my.domain:SHARE /root/MNT/ (here VSZ and 
>>> RSS are 381896 35828)
>>> 2. perform a 'ls -lR /root/MNT | wc -l' (btw the result of wc -l is 
>>> 518396 :)) and a 'cp -Rp /usr/* /root/MNT/boo' (VSZ/RSS are 
>>> 1301536/711992 at the end of these operations)
>>> 3. do 'cp -Rp * /root/MNT/toto/', i.e. into another directory (VSZ/RSS 
>>> are 1432608/909968 at the end of this operation)
>>> 4. Delete all the files and directories you created in steps 2, 3 above
>>> 5. Take statedump with kill -USR1 <pid-of-glusterfs-process>
>>> 6. Repeat steps from 2-5
>>>
>>> Attach these two statedumps. I think the statedumps will be even 
>>> more effective if the mount does not have any data when you start 
>>> the experiment.
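>>>
>>> (A rough, untested sketch of steps 2-6, assuming /usr as the source for 
>>> both copies and a single glusterfs client process:
>>>   for i in 1 2; do
>>>     mkdir -p /root/MNT/boo /root/MNT/toto
>>>     ls -lR /root/MNT | wc -l
>>>     cp -Rp /usr/* /root/MNT/boo
>>>     cp -Rp /usr/* /root/MNT/toto
>>>     rm -rf /root/MNT/boo /root/MNT/toto
>>>     kill -USR1 "$(pidof glusterfs)"   # statedump after each pass
>>>     sleep 5
>>>   done
>>> )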
>>>
>>> HTH
>>>
>>>
>>>     Dump files are gzip'ed because they are very large.
>>>     Dump files are here (too big for email):
>>>     http://wikisend.com/download/623430/glusterdump.n1.dump.1470042769.gz
>>>     http://wikisend.com/download/771220/glusterdump.n2.dump.1470043929.gz
>>>     http://wikisend.com/download/428752/glusterdump.n3.dump.1470045181.gz
>>>     (I'll keep the files in case someone wants them in another format)
>>>
>>>     Client and servers are installed from .deb files
>>>     (glusterfs-client_3.8.1-1_amd64.deb and
>>>     glusterfs-common_3.8.1-1_amd64.deb on client side).
>>>     They are all Debian 8 64bit. The servers are test machines that
>>>     serve only one volume to this sole client. The volume is a simple
>>>     x2 replica. For testing I just changed the network.inode-lru-limit
>>>     value to 1024. The mount point /root/MNT is only used for these tests.
>>>
>>>     --
>>>     Y.
>>>
>>>
>>>
>>>
>>>
>>> -- 
>>> Pranith
>>
>>
>>
>>
>
>
>
>

