[Gluster-users] Fuse memleaks, all versions → OOM-killer
Yannick Perret
yannick.perret at liris.cnrs.fr
Tue Aug 2 17:15:31 UTC 2016
In order to prevent too much swap usage I removed swap on this machine
(swapoff -a).
Memory usage was still growing.
After that I started another program that allocates memory (in order to
accelerate things) and the OOM-killer was triggered.
Here is the syslog:
[1246854.291996] Out of memory: Kill process 931 (glusterfs) score 742
or sacrifice child
[1246854.292102] Killed process 931 (glusterfs) total-vm:3527624kB,
anon-rss:3100328kB, file-rss:0kB
The last VSZ/RSS values observed were: 3527624 / 3097096 (in kB).
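(For reference, such VSZ/RSS values can be sampled with ps; a minimal
sketch, assuming a single glusterfs client process on the machine:
  ps -o vsz=,rss= -p "$(pidof glusterfs)"   # VSZ and RSS of the FUSE client, in kB
)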
Here is the rest of the OOM-killer data:
[1246854.291847] active_anon:600785 inactive_anon:377188 isolated_anon:0
active_file:97 inactive_file:137 isolated_file:0
unevictable:0 dirty:0 writeback:1 unstable:0
free:21740 slab_reclaimable:3309 slab_unreclaimable:3728
mapped:255 shmem:4267 pagetables:3286 bounce:0
free_cma:0
[1246854.291851] Node 0 DMA free:15876kB min:264kB low:328kB high:396kB
active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB
unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB
managed:15908kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB
slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB
pagetables:0kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB
pages_scanned:0 all_unreclaimable? yes
[1246854.291858] lowmem_reserve[]: 0 2980 3948 3948
[1246854.291861] Node 0 DMA32 free:54616kB min:50828kB low:63532kB
high:76240kB active_anon:1940432kB inactive_anon:1020924kB
active_file:248kB inactive_file:260kB unevictable:0kB isolated(anon):0kB
isolated(file):0kB present:3129280kB managed:3054836kB mlocked:0kB
dirty:0kB writeback:0kB mapped:760kB shmem:14616kB
slab_reclaimable:9660kB slab_unreclaimable:8244kB kernel_stack:1456kB
pagetables:10056kB unstable:0kB bounce:0kB free_cma:0kB
writeback_tmp:0kB pages_scanned:803 all_unreclaimable? yes
[1246854.291865] lowmem_reserve[]: 0 0 967 967
[1246854.291867] Node 0 Normal free:16468kB min:16488kB low:20608kB
high:24732kB active_anon:462708kB inactive_anon:487828kB
active_file:140kB inactive_file:288kB unevictable:0kB isolated(anon):0kB
isolated(file):0kB present:1048576kB managed:990356kB mlocked:0kB
dirty:0kB writeback:4kB mapped:260kB shmem:2452kB
slab_reclaimable:3576kB slab_unreclaimable:6668kB kernel_stack:560kB
pagetables:3088kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB
pages_scanned:975 all_unreclaimable? yes
[1246854.291872] lowmem_reserve[]: 0 0 0 0
[1246854.291874] Node 0 DMA: 1*4kB (U) 0*8kB 0*16kB 2*32kB (U) 3*64kB
(U) 0*128kB 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (R) 3*4096kB (EM)
= 15876kB
[1246854.291882] Node 0 DMA32: 1218*4kB (UEM) 848*8kB (UE) 621*16kB (UE)
314*32kB (UEM) 189*64kB (UEM) 49*128kB (UEM) 2*256kB (E) 0*512kB
0*1024kB 0*2048kB 1*4096kB (R) = 54616kB
[1246854.291891] Node 0 Normal: 3117*4kB (UE) 0*8kB 0*16kB 3*32kB (R)
1*64kB (R) 2*128kB (R) 0*256kB 1*512kB (R) 1*1024kB (R) 1*2048kB (R)
0*4096kB = 16468kB
[1246854.291900] Node 0 hugepages_total=0 hugepages_free=0
hugepages_surp=0 hugepages_size=2048kB
[1246854.291902] 4533 total pagecache pages
[1246854.291903] 0 pages in swap cache
[1246854.291905] Swap cache stats: add 343501, delete 343501, find
7730690/7732743
[1246854.291906] Free swap = 0kB
[1246854.291907] Total swap = 0kB
[1246854.291908] 1048462 pages RAM
[1246854.291909] 0 pages HighMem/MovableOnly
[1246854.291909] 14555 pages reserved
[1246854.291910] 0 pages hwpoisoned
Regards,
--
Y.
On 02/08/2016 at 17:00, Yannick Perret wrote:
> So here are the dumps, gzip'ed.
>
> What I did:
> 1. mounting the volume, removing all its content, umounting it
> 2. mounting the volume
> 3. performing a cp -Rp /usr/* /root/MNT
> 4. performing a rm -rf /root/MNT/*
> 5. taking a dump (glusterdump.p1.dump), as sketched after this list
> 6. re-doing 3, 4 and 5 (glusterdump.p2.dump)
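> (A minimal sketch of step 5 on the client, assuming a single glusterfs
> client process and the default statedump directory /var/run/gluster:
>   kill -USR1 "$(pidof glusterfs)"          # ask the FUSE client to write a statedump
>   ls -lt /var/run/gluster/glusterdump.*    # the dump file should appear here
> )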
>
> VSZ/RSS are respectively:
> - 381896 / 35688 just after mount
> - 644040 / 309240 after 1st cp -Rp
> - 644040 / 310128 after 1st rm -rf
> - 709576 / 310128 after 1st kill -USR1
> - 840648 / 421964 after 2nd cp -Rp
> - 840648 / 422224 after 2nd rm -rf
>
> I created a small script that performs these actions in an infinite loop:
> while /bin/true
> do
>     cp -Rp /usr/* /root/MNT/
>     ps -o vsz=,rss= -p "$(pidof glusterfs)"   # get VSZ/RSS of the glusterfs client process
>     rm -rf /root/MNT/*
>     ps -o vsz=,rss= -p "$(pidof glusterfs)"   # get VSZ/RSS again
> done
>
> At this time, here are the values collected so far (VSZ / RSS, in kB):
> 971720 533988
> 1037256 645500
> 1037256 645840
> 1168328 757348
> 1168328 757620
> 1299400 869128
> 1299400 869328
> 1364936 980712
> 1364936 980944
> 1496008 1092384
> 1496008 1092404
> 1627080 1203796
> 1627080 1203996
> 1692616 1315572
> 1692616 1315504
> 1823688 1426812
> 1823688 1427340
> 1954760 1538716
> 1954760 1538772
> 2085832 1647676
> 2085832 1647708
> 2151368 1750392
> 2151368 1750708
> 2282440 1853864
> 2282440 1853764
> 2413512 1952668
> 2413512 1952704
> 2479048 2056500
> 2479048 2056712
>
> So at this time the glusterfs process uses not far from 2 GB of resident
> memory, while only performing exactly the same actions: 'cp -Rp /usr/*
> /root/MNT' + 'rm -rf /root/MNT/*'.
>
> Swap usage is starting to increase a little, and I have not seen any
> memory being released so far.
> I can understand that the kernel may not release the removed files (after
> rm -rf) immediately, but the first 'rm' occurred at ~12:00 today and it
> is ~17:00 here, so I can't understand why so much memory is still in use.
> I would expect the memory to grow during 'cp -Rp', then shrink after
> 'rm', but it stays the same. And even if it stays the same I would expect
> it not to grow further while cp-ing again.
>
> I am leaving the cp/rm loop running to see what will happen. Feel free to
> ask for other data if it may help.
>
> Please note that I'll be on holidays from the end of this week for 3
> weeks, so I will mostly not be able to perform tests during that time
> (the network connection is too poor where I am going).
>
> Regards,
> --
> Y.
>
> On 02/08/2016 at 05:11, Pranith Kumar Karampuri wrote:
>>
>>
>> On Mon, Aug 1, 2016 at 3:40 PM, Yannick Perret
>> <yannick.perret at liris.cnrs.fr> wrote:
>>
>> On 29/07/2016 at 18:39, Pranith Kumar Karampuri wrote:
>>>
>>>
>>> On Fri, Jul 29, 2016 at 2:26 PM, Yannick Perret
>>> <yannick.perret at liris.cnrs.fr> wrote:
>>>
>>> Ok, last try:
>>> after investigating more versions I found that the FUSE client
>>> leaks memory on all of them.
>>> I tested:
>>> - 3.6.7 client on debian 7 32bit and on debian 8 64bit (with
>>> 3.6.7 servers on debian 8 64bit)
>>> - 3.6.9 client on debian 7 32bit and on debian 8 64bit (with
>>> 3.6.7 servers on debian 8 64bit)
>>> - 3.7.13 client on debian 8 64bit (with 3.8.1 servers on
>>> debian 8 64bit)
>>> - 3.8.1 client on debian 8 64bit (with 3.8.1 servers on
>>> debian 8 64bit)
>>> In all cases they were compiled from sources, apart from 3.8.1 where
>>> .deb packages were used (due to a configure runtime error).
>>> For 3.7 it was compiled with --disable-tiering. I also tried
>>> to compile with --disable-fusermount (no change).
>>>
>>> In all of these cases the memory (resident & virtual) of the
>>> glusterfs process on the client grows with each activity, never
>>> reaches a maximum and never decreases.
>>> "Activity" for these tests is cp -Rp and ls -lR.
>>> The client I let grow the longest reached ~4 GB of RAM. On
>>> smaller machines it ends with the OOM-killer killing the glusterfs
>>> process, or with glusterfs dying due to an allocation error.
>>>
>>> In 3.6 memory seems to grow continuously, whereas in 3.8.1 it
>>> grows by "steps" (430400 kB → 629144 (~1 min) → 762324
>>> (~1 min) → 827860…).
>>>
>>> All tests were performed on a single test volume used only by my
>>> test client. The volume is a basic x2 replica. The only
>>> parameters I changed on this volume (without any effect) are
>>> diagnostics.client-log-level set to ERROR and
>>> network.inode-lru-limit set to 1024.
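>>> (These were presumably set with the usual volume-set commands; e.g. for a
>>> volume named SHARE:
>>>   gluster volume set SHARE diagnostics.client-log-level ERROR
>>>   gluster volume set SHARE network.inode-lru-limit 1024
>>> )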
>>>
>>>
>>> Could you attach statedumps of your runs?
>>> The following link has the steps to capture them:
>>> https://gluster.readthedocs.io/en/latest/Troubleshooting/statedump/
>>> We basically need to see which memory types are increasing. If you
>>> could help find the issue, we can send the fixes for your workload.
>>> There is a 3.8.2 release in around 10 days, I think. We can probably
>>> target this issue for that?
>> Here are statedumps.
>> Steps:
>> 1. mount -t glusterfs ldap1.my.domain:SHARE /root/MNT/ (here VSZ
>> and RSS are 381896 35828)
>> 2. take a dump with kill -USR1 <pid-of-glusterfs-process> (file
>> glusterdump.n1.dump.1470042769)
>> 3. perform a 'ls -lR /root/MNT | wc -l' (btw result of wc -l is
>> 518396 :)) and a 'cp -Rp /usr/* /root/MNT/boo' (VSZ/RSS are
>> 1301536/711992 at end of these operations)
>> 4. take a dump with kill -USR1 <pid-of-glusterfs-process> (file
>> glusterdump.n2.dump.1470043929)
>> 5. do 'cp -Rp * /root/MNT/toto/', i.e. into another directory
>> (VSZ/RSS are 1432608/909968 at the end of this operation)
>> 6. take a dump with kill -USR1 <pid-of-glusterfs-process> (file
>> glusterdump.n3.dump.1470045181)
>>
>>
>> Hey,
>> Thanks a lot for providing this information. Looking at these
>> steps, I don't see any problem with the increase in memory. Both the ls
>> -lR and cp -Rp commands you did in step 3 add new inodes in
>> memory, which increases the memory usage. As long as the
>> kernel thinks these inodes need to stay in memory, gluster keeps them in
>> memory. Once the kernel no longer considers an inode necessary, it sends
>> an 'inode-forget'. At that point the memory starts to decrease. So it
>> depends on the memory pressure the kernel is under. But you said it
>> led to OOM-kills on smaller machines, which means there could be
>> some leaks. Could you modify the steps as follows to confirm whether
>> there are leaks? Please do this test on the smaller machines that
>> led to the OOM-kills.
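>> (A quick sanity check, assuming root access and a single glusterfs client
>> process: drop the kernel's dentry/inode caches and see whether the client's
>> RSS shrinks once the resulting inode-forgets are processed:
>>   sync
>>   echo 2 > /proc/sys/vm/drop_caches          # drop dentries and inodes
>>   ps -o vsz=,rss= -p "$(pidof glusterfs)"    # check whether RSS went down
>> )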
>>
>> Steps:
>> 1. mount -t glusterfs ldap1.my.domain:SHARE /root/MNT/ (here VSZ and
>> RSS are 381896 35828)
>> 2. perform a 'ls -lR /root/MNT | wc -l' (btw result of wc -l is
>> 518396 :)) and a 'cp -Rp /usr/* /root/MNT/boo' (VSZ/RSS are
>> 1301536/711992 at end of these operations)
>> 3. do 'cp -Rp * /root/MNT/toto/', i.e. into another directory (VSZ/RSS
>> are 1432608/909968 at the end of this operation)
>> 4. Delete all the files and directories you created in steps 2, 3 above
>> 5. Take statedump with kill -USR1 <pid-of-glusterfs-process>
>> 6. Repeat steps from 2-5
>>
>> Attach these two statedumps. I think the statedumps will be even more
>> effective if the mount does not have any data on it when you start the
>> experiment.
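>> (Once you have the two statedumps, a rough sketch for spotting leaks is to
>> compare the per-translator memory-accounting sections between them;
>> 'first.dump' and 'second.dump' are placeholder names here:
>>   grep -E 'memusage|num_allocs=' first.dump  > first.allocs
>>   grep -E 'memusage|num_allocs=' second.dump > second.allocs
>>   diff first.allocs second.allocs   # types whose num_allocs keep growing are leak candidates
>> )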
>>
>> HTH
>>
>>
>> Dump files are gzip'ed because they are very large.
>> Dump files are here (too big for email):
>> http://wikisend.com/download/623430/glusterdump.n1.dump.1470042769.gz
>> http://wikisend.com/download/771220/glusterdump.n2.dump.1470043929.gz
>> http://wikisend.com/download/428752/glusterdump.n3.dump.1470045181.gz
>> (I keep the files in case someone wants them in another format)
>>
>> Client and servers are installed from .deb files
>> (glusterfs-client_3.8.1-1_amd64.deb and
>> glusterfs-common_3.8.1-1_amd64.deb on client side).
>> They are all Debian 8 64bit. The servers are test machines that serve
>> only one volume to this sole client. The volume is a simple x2
>> replica. For these tests I only changed the network.inode-lru-limit value
>> to 1024. The mount point /root/MNT is only used for these tests.
>>
>> --
>> Y.
>>
>>
>>
>>
>>
>> --
>> Pranith
>
>
>
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users