[Gluster-users] [Gluster-devel] GlusterFS FUSE client leaks summary — part I
Oleksandr Natalenko
oleksandr at natalenko.name
Tue Feb 2 12:53:53 UTC 2016
02.02.2016 10:07, Xavier Hernandez wrote:
> Could it be memory used by Valgrind itself to track glusterfs' memory
> usage?
>
> Could you repeat the test without Valgrind and see if the memory usage
> after dropping caches returns to low values?
Yup. Here are the results:
===
pf at server:~ » ps aux | grep volume
root 19412 14.4 10.0 5416964 4971692 ? Ssl 10:15 36:32
/usr/sbin/glusterfs --volfile-server=server.example.com
--volfile-id=volume /mnt/volume
pf at server:~ » echo 2 | sudo tee /proc/sys/vm/drop_caches
2
pf at server:~ » ps aux | grep volume
root 19412 13.6 3.5 2336772 1740804 ? Ssl 10:15 36:53
/usr/sbin/glusterfs --volfile-server=server.example.com
--volfile-id=volume /mnt/volume
===
Dropped from 4.9G to 1.7G. But a fresh mount consumes only 25M
(megabytes):
===
root 23347 0.7 0.0 698376 25124 ? Ssl 14:49 0:00
/usr/sbin/glusterfs --volfile-server=server.example.com
--volfile-id=volume /mnt/volume
===
Why?
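For reference, the RSS numbers above can also be sampled directly from
/proc/<pid>/status instead of grepping ps output. A minimal standalone
helper (not part of GlusterFS, just a convenience sketch):

===
/* rss.c: print VmRSS of a given PID, e.g. the glusterfs client.
 * Build: cc -o rss rss.c
 * Usage: ./rss 19412
 */
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    char path[64], line[256];
    FILE *f;

    if (argc != 2) {
        fprintf(stderr, "usage: %s <pid>\n", argv[0]);
        return 1;
    }

    snprintf(path, sizeof(path), "/proc/%s/status", argv[1]);
    f = fopen(path, "r");
    if (!f) {
        perror("fopen");
        return 1;
    }

    /* VmRSS is the same value ps reports in its RSS column (in kB) */
    while (fgets(line, sizeof(line), f)) {
        if (strncmp(line, "VmRSS:", 6) == 0) {
            fputs(line, stdout);
            break;
        }
    }

    fclose(f);
    return 0;
}
===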
>
>> Examining the statedump shows only the following snippet with a high
>> "size" value:
>>
>> ===
>> [mount/fuse.fuse - usage-type gf_fuse_mt_iov_base memusage]
>> size=4234592647
>> num_allocs=1
>> max_size=4294935223
>> max_num_allocs=3
>> total_allocs=4186991
>> ===
>>
>> Another leak?
>>
>> Grepping for "gf_fuse_mt_iov_base" in the GlusterFS source tree shows
>> the following:
>>
>> ===
>> $ grep -Rn gf_fuse_mt_iov_base
>> xlators/mount/fuse/src/fuse-mem-types.h:20:
>> gf_fuse_mt_iov_base,
>> xlators/mount/fuse/src/fuse-bridge.c:4887:
>> gf_fuse_mt_iov_base);
>> ===
>>
>> fuse-bridge.c snippet:
>>
>> ===
>>         /* Add extra 128 byte to the first iov so that it can
>>          * accommodate "ordinary" non-write requests. It's not
>>          * guaranteed to be big enough, as SETXATTR and namespace
>>          * operations with very long names may grow behind it,
>>          * but it's good enough in most cases (and we can handle
>>          * rest via realloc).
>>          */
>>         iov_in[0].iov_base = GF_CALLOC (1, msg0_size,
>>                                         gf_fuse_mt_iov_base);
>> ===
>>
>> Probably, some freeing is missing for iov_base?
>
> This is not a real memory leak. It's only a bad accounting of memory.
> Note that num_allocs is 1. If you look at libglusterfs/src/mem-pool.c,
> you will see this:
>
>
> /* TBD: it would be nice to adjust the memory accounting info here,
> * but calling gf_mem_set_acct_info here is wrong because it bumps
> * up counts as though this is a new allocation - which it's not.
> * The consequence of doing nothing here is only that the sizes will be
> * wrong, but at least the counts won't be.
> uint32_t type = 0;
> xlator_t *xl = NULL;
> type = header->type;
> xl = (xlator_t *) header->xlator;
> gf_mem_set_acct_info (xl, &new_ptr, size, type, NULL);
> */
>
> This means that memory reallocs are not correctly accounted, so the
> tracked size is incorrect (note that fuse_thread_proc() calls
> GF_REALLOC() in some cases).
>
> There are two problems here:
>
> 1. The memory is allocated with a given size S1, then reallocated with
> a size S2 (S2 > S1), but not accounted, so the memory accounting
> system still thinks that the allocated size is S1. When memory is
> freed, S2 is subtracted from the total size used. With enough
> allocs/reallocs/frees, this value becomes negative.
>
> 2. statedump shows the 64-bit 'size' field representing the total
> memory used by a given type as an unsigned 32-bit value, losing some
> information.
>
> Xavi
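To double-check my understanding of points 1 and 2, here is a tiny
standalone sketch (not GlusterFS code) of an allocation that is grown
with realloc without being re-accounted and then freed, with the
resulting counter printed as an unsigned 32-bit value:

===
#include <stdio.h>
#include <stdlib.h>
#include <inttypes.h>

static int64_t tracked_size;   /* what the accounting believes is in use */

int main(void)
{
    size_t s1 = 4096, s2 = 128 * 1024;

    void *p = calloc(1, s1);
    tracked_size += s1;        /* the initial allocation is accounted */

    p = realloc(p, s2);        /* grown to s2, but accounting not updated */

    free(p);
    tracked_size -= s2;        /* per point 1, s2 is subtracted on free */

    /* tracked_size is now negative; shown as unsigned 32-bit (point 2)
     * it wraps to a value close to 2^32, much like size=4234592647 in
     * the statedump above. */
    printf("tracked: %lld (as uint32: %" PRIu32 ")\n",
           (long long) tracked_size, (uint32_t) tracked_size);
    return 0;
}
===

If that is right, the huge number is just the 32-bit view of a negative
64-bit counter, not memory that is actually held.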
>
>>
>> On Monday, 1 February 2016 at 14:24:22 EET Soumya Koduri wrote:
>>> On 02/01/2016 01:39 PM, Oleksandr Natalenko wrote:
>>>> Wait. It seems to be my bad.
>>>>
>>>> Before unmounting I do drop_caches (2), and the glusterfs process'
>>>> CPU usage goes to 100% for a while. I hadn't waited for it to drop
>>>> to 0% and performed the unmount instead. It seems glusterfs is
>>>> purging inodes, and that's why it uses 100% of CPU. I've re-tested,
>>>> waiting for CPU usage to become normal, and got no leaks.
>>>>
>>>> Will verify this once again and report more.
>>>>
>>>> BTW, if that works, how could I limit the inode cache for the FUSE
>>>> client? I do not want it to go beyond 1G, for example, even if I
>>>> have 48G of RAM on my server.
>>>
>>> It's hard-coded for now. For fuse, the lru limit (of the inodes which
>>> are not active) is (32*1024).
>>> One of the ways to address this (which we were discussing earlier) is
>>> to have an option to configure the inode cache limit. If that sounds
>>> good, we can then check whether it has to be global/volume-level,
>>> client/server/both.
>>>
>>> Thanks,
>>> Soumya
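Just so we are talking about the same mechanism, this is how I picture
an lru cap working, as a generic sketch: inactive entries go on an lru
list and the oldest ones are purged once the list grows past the limit
(the 32*1024 figure above is the fuse default you mention; this is not
the actual GlusterFS inode table code):

===
#include <stdio.h>
#include <stdlib.h>

struct entry {
    int           id;
    struct entry *next;
};

static struct entry *lru_head;        /* most recently used */
static size_t        lru_count;
static const size_t  lru_limit = 4;   /* fuse currently uses 32*1024 */

static void lru_add(int id)
{
    struct entry *e = malloc(sizeof(*e));
    e->id = id;
    e->next = lru_head;
    lru_head = e;
    lru_count++;

    /* purge from the tail while over the limit */
    while (lru_count > lru_limit) {
        struct entry **pp = &lru_head;
        while ((*pp)->next)
            pp = &(*pp)->next;        /* find the oldest entry */
        printf("purging %d\n", (*pp)->id);
        free(*pp);
        *pp = NULL;
        lru_count--;
    }
}

int main(void)
{
    for (int i = 0; i < 8; i++)
        lru_add(i);

    /* cleanup */
    while (lru_head) {
        struct entry *next = lru_head->next;
        free(lru_head);
        lru_head = next;
    }
    return 0;
}
===

A configurable limit of that kind would be exactly what I'm after.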
>>>
>>>> 01.02.2016 09:54, Soumya Koduri wrote:
>>>>> On 01/31/2016 03:05 PM, Oleksandr Natalenko wrote:
>>>>>> Unfortunately, this patch doesn't help.
>>>>>>
>>>>>> RAM usage on "find" finish is ~9G.
>>>>>>
>>>>>> Here is statedump before drop_caches:
>>>>>> https://gist.github.com/fc1647de0982ab447e20
>>>>>
>>>>> [mount/fuse.fuse - usage-type gf_common_mt_inode_ctx memusage]
>>>>> size=706766688
>>>>> num_allocs=2454051
>>>>>
>>>>>> And after drop_caches:
>>>>>> https://gist.github.com/5eab63bc13f78787ed19
>>>>>
>>>>> [mount/fuse.fuse - usage-type gf_common_mt_inode_ctx memusage]
>>>>> size=550996416
>>>>> num_allocs=1913182
>>>>>
>>>>> There isn't a significant drop in inode contexts. One of the
>>>>> reasons could be dentries holding a refcount on the inodes, which
>>>>> would result in inodes not getting purged even after fuse_forget.
>>>>>
>>>>>
>>>>> pool-name=fuse:dentry_t
>>>>> hot-count=32761
>>>>>
>>>>> If '32761' is the current active dentry count, it still doesn't
>>>>> seem to match up with the inode count.
>>>>>
>>>>> Thanks,
>>>>> Soumya
>>>>>
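The dentry explanation makes sense to me. A trivial sketch of why a
cached dentry keeps an inode alive past fuse_forget (purely
illustrative, not the GlusterFS inode/dentry code):

===
#include <stdio.h>
#include <stdlib.h>

struct inode {
    int refcount;
};

static struct inode *inode_ref(struct inode *in)
{
    in->refcount++;
    return in;
}

static void inode_unref(struct inode *in)
{
    if (--in->refcount == 0) {
        printf("inode purged\n");
        free(in);
    } else {
        printf("inode still cached, refcount=%d\n", in->refcount);
    }
}

int main(void)
{
    struct inode *in = calloc(1, sizeof(*in));

    inode_ref(in);   /* ref held for the kernel lookup            */
    inode_ref(in);   /* ref held by a cached dentry               */

    inode_unref(in); /* fuse_forget drops the lookup ref, but the
                      * dentry ref keeps the inode in memory       */

    inode_unref(in); /* only when the dentry is pruned does the
                      * inode actually go away                     */
    return 0;
}
===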
>>>>>> And here is Valgrind output:
>>>>>> https://gist.github.com/2490aeac448320d98596
>>>>>>
>>>>>> On Saturday, 30 January 2016 at 22:56:37 EET Xavier Hernandez wrote:
>>>>>>> There's another inode leak caused by an incorrect counting of
>>>>>>> lookups on directory reads.
>>>>>>>
>>>>>>> Here's a patch that solves the problem for 3.7:
>>>>>>>
>>>>>>> http://review.gluster.org/13324
>>>>>>>
>>>>>>> Hopefully with this patch the memory leaks should disappear.
>>>>>>>
>>>>>>> Xavi
>>>>>>>
>>>>>>> On 29.01.2016 19:09, Oleksandr Natalenko wrote:
>>>>>>>> Here is an intermediate summary of the current memory leak
>>>>>>>> investigation for the FUSE client.
>>>>>>>>
>>>>>>>> I use the GlusterFS v3.7.6 release with the following patches:
>>>>>>>>
>>>>>>>> ===
>>>>>>>> Kaleb S KEITHLEY (1):
>>>>>>>>       fuse: use-after-free fix in fuse-bridge, revisited
>>>>>>>>
>>>>>>>> Pranith Kumar K (1):
>>>>>>>>       mount/fuse: Fix use-after-free crash
>>>>>>>>
>>>>>>>> Soumya Koduri (3):
>>>>>>>>       gfapi: Fix inode nlookup counts
>>>>>>>>       inode: Retire the inodes from the lru list in inode_table_destroy
>>>>>>>>       upcall: free the xdr* allocations
>>>>>>>> ===
>>>>>>>>
>>>>>>>> With those patches we got the API leaks fixed (I hope; brief tests
>>>>>>>> show that) and got rid of the "kernel notifier loop terminated"
>>>>>>>> message. Nevertheless, the FUSE client still leaks.
>>>>>>>>
>>>>>>>> I have several test volumes with several million small files
>>>>>>>> (100K…2M on average). I do 2 types of FUSE client testing:
>>>>>>>>
>>>>>>>> 1) find /mnt/volume -type d
>>>>>>>> 2) rsync -av -H /mnt/source_volume/* /mnt/target_volume/
>>>>>>>>
>>>>>>>> The most up-to-date results are shown below:
>>>>>>>>
>>>>>>>> === find /mnt/volume -type d ===
>>>>>>>>
>>>>>>>> Memory consumption: ~4G
>>>>>>>> Statedump: https://gist.github.com/10cde83c63f1b4f1dd7a
>>>>>>>> Valgrind: https://gist.github.com/097afb01ebb2c5e9e78d
>>>>>>>>
>>>>>>>> I guess this is fuse-bridge/fuse-resolve related.
>>>>>>>>
>>>>>>>> === rsync -av -H /mnt/source_volume/* /mnt/target_volume/ ===
>>>>>>>>
>>>>>>>> Memory consumption: ~3.3...4G
>>>>>>>> Statedump (target volume): https://gist.github.com/31e43110eaa4da663435
>>>>>>>> Valgrind (target volume): https://gist.github.com/f8e0151a6878cacc9b1a
>>>>>>>>
>>>>>>>> I guess this is DHT-related.
>>>>>>>>
>>>>>>>> Give me more patches to test :).