[Gluster-users] [Gluster-devel] GlusterFS FUSE client leaks summary — part I

Oleksandr Natalenko oleksandr at natalenko.name
Tue Feb 2 12:53:53 UTC 2016


02.02.2016 10:07, Xavier Hernandez wrote:
> Could it be memory used by Valgrind itself to track glusterfs' memory 
> usage ?
> 
> Could you repeat the test without Valgrind and see if the memory usage
> after dropping caches returns to low values ?

Yup. Here are the results:

===
pf at server:~ » ps aux | grep volume
root     19412 14.4 10.0 5416964 4971692 ?     Ssl  10:15  36:32 
/usr/sbin/glusterfs --volfile-server=server.example.com 
--volfile-id=volume /mnt/volume

pf at server:~ » echo 2 | sudo tee /proc/sys/vm/drop_caches
2

pf at server:~ » ps aux | grep volume
root     19412 13.6  3.5 2336772 1740804 ?     Ssl  10:15  36:53 
/usr/sbin/glusterfs --volfile-server=server.example.com 
--volfile-id=volume /mnt/volume
===

Dropped from 4.9G to 1.7G. But a fresh mount consumes only 25M 
(megabytes):

===
root     23347  0.7  0.0 698376 25124 ?        Ssl  14:49   0:00 
/usr/sbin/glusterfs --volfile-server=server.example.com 
--volfile-id=volume /mnt/volume
===

Why?

> 
>> Examining statedump shows only the following snippet
>> with high "size" value:
>> 
>> ===
>> [mount/fuse.fuse - usage-type gf_fuse_mt_iov_base memusage]
>> size=4234592647
>> num_allocs=1
>> max_size=4294935223
>> max_num_allocs=3
>> total_allocs=4186991
>> ===
>> 
>> Another leak?
>> 
>> Grepping "gf_fuse_mt_iov_base" on GlusterFS source tree shows the 
>> following:
>> 
>> ===
>> $ grep -Rn gf_fuse_mt_iov_base
>> xlators/mount/fuse/src/fuse-mem-types.h:20:        
>> gf_fuse_mt_iov_base,
>> xlators/mount/fuse/src/fuse-bridge.c:4887:
>> gf_fuse_mt_iov_base);
>> ===
>> 
>> fuse-bridge.c snippet:
>> 
>> ===
>>                  /* Add extra 128 byte to the first iov so that it can
>>                   * accommodate "ordinary" non-write requests. It's not
>>                   * guaranteed to be big enough, as SETXATTR and namespace
>>                   * operations with very long names may grow behind it,
>>                   * but it's good enough in most cases (and we can handle
>>                   * rest via realloc).
>>                   */
>>                  iov_in[0].iov_base = GF_CALLOC (1, msg0_size,
>>                                                  gf_fuse_mt_iov_base);
>> 
>> Perhaps a free for iov_base is missing somewhere?
> 
> This is not a real memory leak. It's only bad accounting of memory.
> Note that num_allocs is 1. If you look at libglusterfs/src/mem-pool.c,
> you will see this:
> 
> 
> /* TBD: it would be nice to adjust the memory accounting info here,
>  * but calling gf_mem_set_acct_info here is wrong because it bumps
>  * up counts as though this is a new allocation - which it's not.
>  * The consequence of doing nothing here is only that the sizes will be
>  * wrong, but at least the counts won't be.
> uint32_t           type = 0;
> xlator_t          *xl = NULL;
> type = header->type;
> xl = (xlator_t *) header->xlator;
> gf_mem_set_acct_info (xl, &new_ptr, size, type, NULL);
> */
> 
> This means that memory reallocs are not correctly accounted, so the
> tracked size is incorrect (note that fuse_thread_proc() calls
> GF_REALLOC() in some cases).
> 
> There are two problems here:
> 
> 1. The memory is allocated with a given size S1, then reallocated with
> a size S2 (S2 > S1), but not accounted, so the memory accounting
> system still thinks that the allocated size is S1. When memory is
> freed, S2 is subtracted from the total size used. With enough
> allocs/reallocs/frees, this value becomes negative.
> 
> 2. statedump shows the 64-bit 'size' field representing the total
> memory used by a given type as an unsigned 32-bit value, losing some
> information.
> 
> Xavi
> 
>> 
>> [1] https://gist.github.com/f0cf98e8bff0c13ea38f
>> [2] https://gist.github.com/87baa0a778ba54f0f7f7
>> [3] https://gist.github.com/7013b493d19c8c5fffae
>> [4] https://gist.github.com/cc38155b57e68d7e86d5
>> [5] https://gist.github.com/6a24000c77760a97976a
>> [6] https://gist.github.com/74bd7a9f734c2fd21c33
>> 
>> On Monday, 1 February 2016 14:24:22 EET Soumya Koduri wrote:
>>> On 02/01/2016 01:39 PM, Oleksandr Natalenko wrote:
>>>> Wait. It seems to be my bad.
>>>> 
>>>> Before unmounting I do drop_caches (2), and glusterfs process CPU usage
>>>> goes to 100% for a while. I hadn't waited for it to drop to 0%, and
>>>> instead performed the unmount. It seems glusterfs is purging inodes, and
>>>> that's why it uses 100% of CPU. I've re-tested it, waiting for CPU usage
>>>> to become normal, and got no leaks.
>>>> 
>>>> Will verify this once again and report more.
>>>> 
>>>> BTW, if that works, how could I limit the inode cache for the FUSE
>>>> client? I do not want it to go beyond 1G, for example, even if I have
>>>> 48G of RAM on my server.
>>> 
>>> It's hard-coded for now. For FUSE, the lru limit (of the inodes which
>>> are not active) is 32*1024.
>>> One of the ways to address this (which we were discussing earlier) is to
>>> have an option to configure the inode cache limit. If that sounds good,
>>> we can then check whether it has to be global/volume-level,
>>> client/server/both.
>>> 
>>> Thanks,
>>> Soumya
>>> 
>>>> 01.02.2016 09:54, Soumya Koduri wrote:
>>>>> On 01/31/2016 03:05 PM, Oleksandr Natalenko wrote:
>>>>>> Unfortunately, this patch doesn't help.
>>>>>> 
>>>>>> RAM usage on "find" finish is ~9G.
>>>>>> 
>>>>>> Here is statedump before drop_caches: https://gist.github.com/
>>>>>> fc1647de0982ab447e20
>>>>> 
>>>>> [mount/fuse.fuse - usage-type gf_common_mt_inode_ctx memusage]
>>>>> size=706766688
>>>>> num_allocs=2454051
>>>>> 
>>>>>> And after drop_caches: 
>>>>>> https://gist.github.com/5eab63bc13f78787ed19
>>>>> 
>>>>> [mount/fuse.fuse - usage-type gf_common_mt_inode_ctx memusage]
>>>>> size=550996416
>>>>> num_allocs=1913182
>>>>> 
>>>>> There isn't a significant drop in inode contexts. One of the
>>>>> reasons could be dentries holding a refcount on the inodes,
>>>>> which results in inodes not getting purged even after
>>>>> fuse_forget.
>>>>> 
>>>>> 
>>>>> pool-name=fuse:dentry_t
>>>>> hot-count=32761
>>>>> 
>>>>> If '32761' is the current active dentry count, it still doesn't seem
>>>>> to match up to the inode count.
>>>>> 
>>>>> Thanks,
>>>>> Soumya
>>>>> 
>>>>>> And here is Valgrind output:
>>>>>> https://gist.github.com/2490aeac448320d98596
>>>>>> 
>>>>>> On Saturday, 30 January 2016 22:56:37 EET Xavier Hernandez wrote:
>>>>>>> There's another inode leak caused by an incorrect counting of
>>>>>>> lookups on directory reads.
>>>>>>> 
>>>>>>> Here's a patch that solves the problem for 3.7:
>>>>>>> 
>>>>>>> http://review.gluster.org/13324
>>>>>>> 
>>>>>>> Hopefully with this patch the memory leaks should disappear.
>>>>>>> 
>>>>>>> Xavi
>>>>>>> 
>>>>>>> On 29.01.2016 19:09, Oleksandr Natalenko wrote:
>>>>>>>> Here is an intermediate summary of the current memory leaks in FUSE
>>>>>>>> client investigation.
>>>>>>>> 
>>>>>>>> I use the GlusterFS v3.7.6 release with the following patches:
>>>>>>>> 
>>>>>>>> ===
>>>>>>>> Kaleb S KEITHLEY (1):
>>>>>>>>       fuse: use-after-free fix in fuse-bridge, revisited
>>>>>>>> 
>>>>>>>> Pranith Kumar K (1):
>>>>>>>>       mount/fuse: Fix use-after-free crash
>>>>>>>> 
>>>>>>>> Soumya Koduri (3):
>>>>>>>>       gfapi: Fix inode nlookup counts
>>>>>>>>       inode: Retire the inodes from the lru list in inode_table_destroy
>>>>>>>>       upcall: free the xdr* allocations
>>>>>>>> ===
>>>>>>>> 
>>>>>>>> With those patches we got API leaks fixed (I hope, brief tests show
>>>>>>>> that) and got rid of the "kernel notifier loop terminated" message.
>>>>>>>> Nevertheless, the FUSE client still leaks.
>>>>>>>> 
>>>>>>>> I have several test volumes with several million small files (100K…2M
>>>>>>>> on average). I do 2 types of FUSE client testing:
>>>>>>>> 
>>>>>>>> 1) find /mnt/volume -type d
>>>>>>>> 2) rsync -av -H /mnt/source_volume/* /mnt/target_volume/
>>>>>>>> 
>>>>>>>> And the most up-to-date results are shown below:
>>>>>>>> 
>>>>>>>> === find /mnt/volume -type d ===
>>>>>>>> Memory consumption: ~4G
>>>>>>>> Statedump: https://gist.github.com/10cde83c63f1b4f1dd7a
>>>>>>>> Valgrind: https://gist.github.com/097afb01ebb2c5e9e78d
>>>>>>>> I guess it is fuse-bridge/fuse-resolve related.
>>>>>>>> 
>>>>>>>> === rsync -av -H /mnt/source_volume/* /mnt/target_volume/ ===
>>>>>>>> Memory consumption: ~3.3…4G
>>>>>>>> Statedump (target volume): https://gist.github.com/31e43110eaa4da663435
>>>>>>>> Valgrind (target volume): https://gist.github.com/f8e0151a6878cacc9b1a
>>>>>>>> I guess it is DHT-related.
>>>>>>>> 
>>>>>>>> Give me more patches to test :).
>>>>>>>> 
>>>>>>>> _______________________________________________
>>>>>>>> Gluster-devel mailing list
>>>>>>>> Gluster-devel at gluster.org
>>>>>>>> http://www.gluster.org/mailman/listinfo/gluster-devel
>>>>>> 
>> 
>> 

