[Gluster-users] Memory leak in GlusterFS FUSE client
Soumya Koduri
skoduri at redhat.com
Wed Jan 6 06:40:05 UTC 2016
On 01/06/2016 03:53 AM, Oleksandr Natalenko wrote:
> OK, I've repeated the same traversal test with the patched GlusterFS API, and
> here is the new Valgrind log:
>
> https://gist.github.com/17ecb16a11c9aed957f5
>
A FUSE mount doesn't use the gfapi helper. Does the GlusterFS API
application above call glfs_fini() on exit? glfs_fini() is responsible for
freeing the memory consumed by gfAPI applications.
If possible, could you repeat the test with nfs-ganesha? It definitely calls
glfs_fini() and purges inodes once its inode cache limit is exceeded.
Thanks,
Soumya
> Still leaks.
>
> On Tuesday, 5 January 2016 22:52:25 EET Soumya Koduri wrote:
>> On 01/05/2016 05:56 PM, Oleksandr Natalenko wrote:
>>> Unfortunately, neither patch made any difference for me.
>>>
>>> I patched 3.7.6 with both patches, recompiled and installed the patched
>>> GlusterFS package on the client side, and mounted a volume with ~2M files.
>>> Then I performed the usual tree traversal with a simple "find".
>>>
>>> The memory RES value went from ~130M at the moment of mounting to ~1.5G
>>> after traversing the volume for ~40 mins. The Valgrind log still shows lots
>>> of leaks. Here it is:
>>>
>>> https://gist.github.com/56906ca6e657c4ffa4a1
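As a rough way to read those figures, the Python sketch below converts the quoted RES growth into an average number of bytes per traversed file. It takes the approximate numbers from the mail at face value (~130M at mount, ~1.5G after the traversal, ~2M files) and assumes "M"/"G" mean MiB/GiB; the function name is illustrative only.

```python
# Rough per-file growth estimate for the FUSE traversal test.
# Inputs are the approximate figures quoted in the mail; the result is
# only as precise as they are (assumption: M/G mean MiB/GiB).

MIB = 1024 * 1024


def growth_per_file(res_before_mib: float, res_after_mib: float, files: int) -> float:
    """Return the average RES growth in bytes per traversed file."""
    if files <= 0:
        raise ValueError("files must be positive")
    return (res_after_mib - res_before_mib) * MIB / files


if __name__ == "__main__":
    per_file = growth_per_file(130, 1.5 * 1024, 2_000_000)
    print(f"~{per_file:.0f} bytes of RES growth per file")
```

With those inputs it prints roughly 737 bytes per file, averaged over the whole traversal.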
>>
>> It looks like you used a FUSE mount. The patches I pasted below apply
>> only to gfapi/nfs-ganesha applications.
>>
>> Also, to resolve the nfs-ganesha issue I mentioned below (which occurs
>> when the Entries_HWMark option is changed), I have posted the following fix:
>> https://review.gerrithub.io/#/c/258687
>>
>> Thanks,
>> Soumya
>>
>>> Ideas?
>>>
>>> 05.01.2016 12:31, Soumya Koduri wrote:
>>>> I tried to debug the inode*-related leaks and saw some improvement
>>>> after applying the patches below and running the same test (but with a
>>>> smaller load). Could you please apply those patches and confirm the
>>>> same?
>>>>
>>>> a) http://review.gluster.org/13125
>>>>
>>>> This fixes the leaks of inodes and their ctx during unexport and at
>>>> program exit. Please check the valgrind output after applying the
>>>> patch. It should not list any inode-related memory as lost.
>>>>
>>>> b) http://review.gluster.org/13096
>>>>
>>>> The reason the change in Entries_HWMark (in your earlier mail) didn't
>>>> have much effect is that the inode_nlookup count doesn't drop to zero
>>>> for the handles/inodes that ganesha closes. Hence those inodes get
>>>> added to the inode LRU list instead of the purge list, and they are
>>>> forcefully purged only when the number of gfapi inode table entries
>>>> reaches its limit (which is 137012).
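The mechanism described above can be illustrated with a toy Python model. This is not GlusterFS's inode table code: the class and method names are hypothetical, and the model keeps only the rule from the mail, namely that an inode released with a non-zero nlookup count is parked on an LRU list that is trimmed only once the table exceeds its limit (137012 in the gfapi case), while an inode whose nlookup has reached zero is purged immediately.

```python
from collections import OrderedDict


class ToyInodeTable:
    """Hypothetical, heavily simplified model of LRU vs purge handling."""

    def __init__(self, lru_limit: int):
        self.lru_limit = lru_limit
        self.lru: "OrderedDict[int, int]" = OrderedDict()  # inode id -> nlookup
        self.freed = 0

    def release(self, ino: int, nlookup: int) -> None:
        """Called when the last active reference to `ino` is dropped."""
        if nlookup == 0:
            # No lookup count outstanding: goes to the purge list, freed now.
            self.freed += 1
            return
        # Lookup count still held: park the inode on the LRU list.
        self.lru[ino] = nlookup
        self.lru.move_to_end(ino)
        # The LRU list is trimmed (oldest first) only above the table limit.
        while len(self.lru) > self.lru_limit:
            self.lru.popitem(last=False)
            self.freed += 1

    def resident(self) -> int:
        return len(self.lru)


if __name__ == "__main__":
    stuck = ToyInodeTable(lru_limit=137012)
    for ino in range(200_000):
        stuck.release(ino, nlookup=1)   # nlookup never drops to zero
    print("nlookup stuck at 1:", stuck.resident(), "inodes retained")

    fixed = ToyInodeTable(lru_limit=137012)
    for ino in range(200_000):
        fixed.release(ino, nlookup=0)   # nlookup correctly back to zero
    print("nlookup back to 0:", fixed.resident(), "inodes retained")
```

In this model, with nlookup stuck at 1 memory plateaus only at the table limit (137012 retained), while with nlookup correctly reaching 0 nothing is retained, which is the effect the nlookup patch is described as having.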
>>>>
>>>> This patch fixes those 'nlookup' counts. Please apply it, reduce
>>>> 'Entries_HWMark' to a much lower value, and check whether that reduces
>>>> the memory consumed by the ganesha process while it is active.
>>>>
>>>> CACHEINODE {
>>>>     Entries_HWMark = 500;
>>>> }
>>>>
>>>>
>>>> Note: I see an issue with nfs-ganesha during exit when the option
>>>> 'Entries_HWMark' is changed. It is not related to any of the above
>>>> patches (or to Gluster at all), and I am currently debugging it.
>>>>
>>>> Thanks,
>>>> Soumya
>>>>
>>>> On 12/25/2015 11:34 PM, Oleksandr Natalenko wrote:
>>>>> 1. test with Cache_Size = 256 and Entries_HWMark = 4096
>>>>>
>>>>> Before find . -type f:
>>>>>
>>>>> root 3120 0.6 11.0 879120 208408 ? Ssl 17:39 0:00 /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT
>>>>>
>>>>> After:
>>>>>
>>>>> root 3120 11.4 24.3 1170076 458168 ? Ssl 17:39 13:39 /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT
>>>>>
>>>>> ~250M leak.
>>>>>
>>>>> 2. test with default values (after ganesha restart)
>>>>>
>>>>> Before:
>>>>>
>>>>> root 24937 1.3 10.4 875016 197808 ? Ssl 19:39 0:00 /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT
>>>>>
>>>>> After:
>>>>>
>>>>> root 24937 3.5 18.9 1022544 356340 ? Ssl 19:39 0:40 /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT
>>>>>
>>>>> ~159M leak.
>>>>>
>>>>> No reasonable correlation detected. The second test finished much faster
>>>>> than the first (I guess the server-side GlusterFS cache or the server
>>>>> kernel page cache is the cause).
>>>>>
>>>>> There are ~1.8M files on this test volume.
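The before/after comparison can be reproduced from the `ps aux` lines themselves. The Python sketch below assumes the standard `ps aux` column order (USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND, with VSZ and RSS in KiB); the helper names are illustrative, and the command column is shortened to the binary path.

```python
# Compare RSS before/after the `find . -type f` runs quoted above.
# Assumes standard `ps aux` column order; RSS is the 6th field, in KiB.

def rss_kib(ps_line: str) -> int:
    fields = ps_line.split()
    if len(fields) < 6:
        raise ValueError(f"not a ps aux line: {ps_line!r}")
    return int(fields[5])


def rss_growth_mib(before: str, after: str) -> float:
    return (rss_kib(after) - rss_kib(before)) / 1024


TEST1_BEFORE = "root 3120 0.6 11.0 879120 208408 ? Ssl 17:39 0:00 /usr/bin/ganesha.nfsd"
TEST1_AFTER = "root 3120 11.4 24.3 1170076 458168 ? Ssl 17:39 13:39 /usr/bin/ganesha.nfsd"
TEST2_BEFORE = "root 24937 1.3 10.4 875016 197808 ? Ssl 19:39 0:00 /usr/bin/ganesha.nfsd"
TEST2_AFTER = "root 24937 3.5 18.9 1022544 356340 ? Ssl 19:39 0:40 /usr/bin/ganesha.nfsd"

if __name__ == "__main__":
    print(f"test 1: +{rss_growth_mib(TEST1_BEFORE, TEST1_AFTER):.1f} MiB")
    print(f"test 2: +{rss_growth_mib(TEST2_BEFORE, TEST2_AFTER):.1f} MiB")
```

The RSS deltas are 249760 KiB and 158532 KiB, i.e. about 243.9 MiB and 154.8 MiB; the ~250M and ~159M figures above correspond to dividing the KiB values by 1000.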
>>>>>
>>>>> On Friday, 25 December 2015 20:28:13 EET Soumya Koduri wrote:
>>>>>> On 12/24/2015 09:17 PM, Oleksandr Natalenko wrote:
>>>>>>> Another addition: this seems to be a GlusterFS API library memory leak,
>>>>>>> because NFS-Ganesha also consumes a huge amount of memory while running
>>>>>>> an ordinary "find . -type f" via NFSv4.2 on a remote client. Here is the
>>>>>>> memory usage:
>>>>>>>
>>>>>>> ===
>>>>>>> root 5416 34.2 78.5 2047176 1480552 ? Ssl 12:02 117:54 /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT
>>>>>>> ===
>>>>>>>
>>>>>>> 1.4G is too much for a simple stat() :(.
>>>>>>>
>>>>>>> Ideas?
>>>>>>
>>>>>> nfs-ganesha also has a cache layer that can scale to millions of entries,
>>>>>> depending on the number of files/directories being looked up. However,
>>>>>> there are parameters to tune it. So either try stat with a few entries, or
>>>>>> add the block below to the nfs-ganesha.conf file, set low limits and check
>>>>>> the difference. That may help us narrow down how much memory is actually
>>>>>> consumed by core nfs-ganesha versus gfAPI.
>>>>>>
>>>>>> CACHEINODE {
>>>>>>     Cache_Size(uint32, range 1 to UINT32_MAX, default 32633); # cache size
>>>>>>     Entries_HWMark(uint32, range 1 to UINT32_MAX, default 100000); # Max no. of entries in the cache.
>>>>>> }
>>>>>>
>>>>>> Thanks,
>>>>>> Soumya
>>>>>>
>>>>>>> 24.12.2015 16:32, Oleksandr Natalenko wrote:
>>>>>>>> This is still an issue in 3.7.6. Any suggestions?
>>>>>>>>
>>>>>>>> 24.09.2015 10:14, Oleksandr Natalenko wrote:
>>>>>>>>> In our GlusterFS deployment we've encountered something like a memory
>>>>>>>>> leak in the GlusterFS FUSE client.
>>>>>>>>>
>>>>>>>>> We use a replicated (×2) GlusterFS volume to store mail (exim+dovecot,
>>>>>>>>> maildir format). Here are the inode stats for both bricks and the mountpoint:
>>>>>>>>>
>>>>>>>>> ===
>>>>>>>>> Brick 1 (Server 1):
>>>>>>>>>
>>>>>>>>> Filesystem                          Inodes     IUsed     IFree      IUse% Mounted on
>>>>>>>>> /dev/mapper/vg_vd1_misc-lv08_mail   578768144  10954918  567813226  2%    /bricks/r6sdLV08_vd1_mail
>>>>>>>>>
>>>>>>>>> Brick 2 (Server 2):
>>>>>>>>>
>>>>>>>>> Filesystem                          Inodes     IUsed     IFree      IUse% Mounted on
>>>>>>>>> /dev/mapper/vg_vd0_misc-lv07_mail   578767984  10954913  567813071  2%    /bricks/r6sdLV07_vd0_mail
>>>>>>>>>
>>>>>>>>> Mountpoint (Server 3):
>>>>>>>>>
>>>>>>>>> Filesystem                          Inodes     IUsed     IFree      IUse% Mounted on
>>>>>>>>> glusterfs.xxx:mail                  578767760  10954915  567812845  2%    /var/spool/mail/virtual
>>>>>>>>> ===
>>>>>>>>>
>>>>>>>>> The glusterfs.xxx domain has two A records, one for Server 1 and one
>>>>>>>>> for Server 2.
>>>>>>>>>
>>>>>>>>> Here is the volume info:
>>>>>>>>>
>>>>>>>>> ===
>>>>>>>>> Volume Name: mail
>>>>>>>>> Type: Replicate
>>>>>>>>> Volume ID: f564e85c-7aa6-4170-9417-1f501aa98cd2
>>>>>>>>> Status: Started
>>>>>>>>> Number of Bricks: 1 x 2 = 2
>>>>>>>>> Transport-type: tcp
>>>>>>>>> Bricks:
>>>>>>>>> Brick1: server1.xxx:/bricks/r6sdLV08_vd1_mail/mail
>>>>>>>>> Brick2: server2.xxx:/bricks/r6sdLV07_vd0_mail/mail
>>>>>>>>> Options Reconfigured:
>>>>>>>>> nfs.rpc-auth-allow: 1.2.4.0/24,4.5.6.0/24
>>>>>>>>> features.cache-invalidation-timeout: 10
>>>>>>>>> performance.stat-prefetch: off
>>>>>>>>> performance.quick-read: on
>>>>>>>>> performance.read-ahead: off
>>>>>>>>> performance.flush-behind: on
>>>>>>>>> performance.write-behind: on
>>>>>>>>> performance.io-thread-count: 4
>>>>>>>>> performance.cache-max-file-size: 1048576
>>>>>>>>> performance.cache-size: 67108864
>>>>>>>>> performance.readdir-ahead: off
>>>>>>>>> ===
>>>>>>>>>
>>>>>>>>> Soon after mounting and starting exim/dovecot, the glusterfs client
>>>>>>>>> process begins to consume a huge amount of RAM:
>>>>>>>>>
>>>>>>>>> ===
>>>>>>>>> user@server3 ~$ ps aux | grep glusterfs | grep mail
>>>>>>>>> root 28895 14.4 15.0 15510324 14908868 ? Ssl Sep03 4310:05 /usr/sbin/glusterfs --fopen-keep-cache --direct-io-mode=disable --volfile-server=glusterfs.xxx --volfile-id=mail /var/spool/mail/virtual
>>>>>>>>> ===
>>>>>>>>>
>>>>>>>>> That is, ~15 GiB of RAM.
>>>>>>>>>
>>>>>>>>> We also tried using the mountpoint within a separate KVM VM with 2 or 3
>>>>>>>>> GiB of RAM, and soon after the mail daemons started, the OOM killer
>>>>>>>>> terminated the glusterfs client process.
>>>>>>>>>
>>>>>>>>> Mounting the same share via NFS works just fine. We also see much lower
>>>>>>>>> iowait and loadavg on the client side with NFS.
>>>>>>>>>
>>>>>>>>> We have also tried changing the IO thread count and cache size in order
>>>>>>>>> to limit memory usage, with no luck. As you can see, the total cache size
>>>>>>>>> is 4×64 = 256 MiB (compared to 15 GiB).
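The size of that gap can be spelled out in Python. The sketch adopts the poster's assumption that the total cache budget is performance.cache-size multiplied by performance.io-thread-count; that multiplication is the poster's reasoning, not something confirmed in this thread. The RSS value is the 6th `ps aux` column of the glusterfs client process, in KiB; the helper names are illustrative.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def assumed_cache_budget(cache_size_bytes: int, io_thread_count: int) -> int:
    # Poster's assumption (not verified here): total = cache-size x io-thread-count.
    return cache_size_bytes * io_thread_count


def rss_to_budget_ratio(rss_kib: int, budget_bytes: int) -> float:
    # ps reports RSS in KiB.
    return rss_kib * KIB / budget_bytes


if __name__ == "__main__":
    budget = assumed_cache_budget(67108864, 4)  # performance.cache-size, io-thread-count
    rss_kib = 14908868                          # RSS of the glusterfs client from ps aux
    print(f"assumed cache budget: {budget / MIB:.0f} MiB")
    print(f"client RSS:           {rss_kib * KIB / GIB:.1f} GiB")
    print(f"RSS / budget:         {rss_to_budget_ratio(rss_kib, budget):.0f}x")
```

It reports a 256 MiB budget against roughly 14.2 GiB of RSS, about 57 times the assumed cache allowance.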
>>>>>>>>>
>>>>>>>>> Enabling or disabling stat-prefetch, read-ahead and readdir-ahead didn't
>>>>>>>>> help either.
>>>>>>>>>
>>>>>>>>> Here are the volume memory stats:
>>>>>>>>>
>>>>>>>>> ===
>>>>>>>>> Memory status for volume : mail
>>>>>>>>> ----------------------------------------------
>>>>>>>>> Brick : server1.xxx:/bricks/r6sdLV08_vd1_mail/mail
>>>>>>>>> Mallinfo
>>>>>>>>> --------
>>>>>>>>> Arena : 36859904
>>>>>>>>> Ordblks : 10357
>>>>>>>>> Smblks : 519
>>>>>>>>> Hblks : 21
>>>>>>>>> Hblkhd : 30515200
>>>>>>>>> Usmblks : 0
>>>>>>>>> Fsmblks : 53440
>>>>>>>>> Uordblks : 18604144
>>>>>>>>> Fordblks : 18255760
>>>>>>>>> Keepcost : 114112
>>>>>>>>>
>>>>>>>>> Mempool Stats
>>>>>>>>> -------------
>>>>>>>>> Name                                    HotCount ColdCount PaddedSizeof AllocCount MaxAlloc Misses   Max-StdAlloc
>>>>>>>>> ----                                    -------- --------- ------------ ---------- -------- ------   ------------
>>>>>>>>> mail-server:fd_t                        0        1024      108          30773120   137      0        0
>>>>>>>>> mail-server:dentry_t                    16110    274       84           235676148  16384    1106499  1152
>>>>>>>>> mail-server:inode_t                     16363    21        156          237216876  16384    1876651  1169
>>>>>>>>> mail-trash:fd_t                         0        1024      108          0          0        0        0
>>>>>>>>> mail-trash:dentry_t                     0        32768     84           0          0        0        0
>>>>>>>>> mail-trash:inode_t                      4        32764     156          4          4        0        0
>>>>>>>>> mail-trash:trash_local_t                0        64        8628         0          0        0        0
>>>>>>>>> mail-changetimerecorder:gf_ctr_local_t  0        64        16540        0          0        0        0
>>>>>>>>> mail-changelog:rpcsvc_request_t         0        8         2828         0          0        0        0
>>>>>>>>> mail-changelog:changelog_local_t        0        64        116          0          0        0        0
>>>>>>>>> mail-bitrot-stub:br_stub_local_t        0        512       84           79204      4        0        0
>>>>>>>>> mail-locks:pl_local_t                   0        32        148          6812757    4        0        0
>>>>>>>>> mail-upcall:upcall_local_t              0        512       108          0          0        0        0
>>>>>>>>> mail-marker:marker_local_t              0        128       332          64980      3        0        0
>>>>>>>>> mail-quota:quota_local_t                0        64        476          0          0        0        0
>>>>>>>>> mail-server:rpcsvc_request_t            0        512       2828         45462533   34       0        0
>>>>>>>>> glusterfs:struct saved_frame            0        8         124          2          2        0        0
>>>>>>>>> glusterfs:struct rpc_req                0        8         588          2          2        0        0
>>>>>>>>> glusterfs:rpcsvc_request_t              1        7         2828         2          1        0        0
>>>>>>>>> glusterfs:log_buf_t                     5        251       140          3452       6        0        0
>>>>>>>>> glusterfs:data_t                        242      16141     52           480115498  664      0        0
>>>>>>>>> glusterfs:data_pair_t                   230      16153     68           179483528  275      0        0
>>>>>>>>> glusterfs:dict_t                        23       4073      140          303751675  627      0        0
>>>>>>>>> glusterfs:call_stub_t                   0        1024      3764         45290655   34       0        0
>>>>>>>>> glusterfs:call_stack_t                  1        1023      1708         43598469   34       0        0
>>>>>>>>> glusterfs:call_frame_t                  1        4095      172          336219655  184      0        0
>>>>>>>>> ----------------------------------------------
>>>>>>>>> Brick : server2.xxx:/bricks/r6sdLV07_vd0_mail/mail
>>>>>>>>> Mallinfo
>>>>>>>>> --------
>>>>>>>>> Arena : 38174720
>>>>>>>>> Ordblks : 9041
>>>>>>>>> Smblks : 507
>>>>>>>>> Hblks : 21
>>>>>>>>> Hblkhd : 30515200
>>>>>>>>> Usmblks : 0
>>>>>>>>> Fsmblks : 51712
>>>>>>>>> Uordblks : 19415008
>>>>>>>>> Fordblks : 18759712
>>>>>>>>> Keepcost : 114848
>>>>>>>>>
>>>>>>>>> Mempool Stats
>>>>>>>>> -------------
>>>>>>>>> Name                                    HotCount ColdCount PaddedSizeof AllocCount MaxAlloc Misses   Max-StdAlloc
>>>>>>>>> ----                                    -------- --------- ------------ ---------- -------- ------   ------------
>>>>>>>>> mail-server:fd_t                        0        1024      108          2373075    133      0        0
>>>>>>>>> mail-server:dentry_t                    14114    2270      84           3513654    16384    2300     267
>>>>>>>>> mail-server:inode_t                     16374    10        156          6766642    16384    194635   1279
>>>>>>>>> mail-trash:fd_t                         0        1024      108          0          0        0        0
>>>>>>>>> mail-trash:dentry_t                     0        32768     84           0          0        0        0
>>>>>>>>> mail-trash:inode_t                      4        32764     156          4          4        0        0
>>>>>>>>> mail-trash:trash_local_t                0        64        8628         0          0        0        0
>>>>>>>>> mail-changetimerecorder:gf_ctr_local_t  0        64        16540        0          0        0        0
>>>>>>>>> mail-changelog:rpcsvc_request_t         0        8         2828         0          0        0        0
>>>>>>>>> mail-changelog:changelog_local_t        0        64        116          0          0        0        0
>>>>>>>>> mail-bitrot-stub:br_stub_local_t        0        512       84           71354      4        0        0
>>>>>>>>> mail-locks:pl_local_t                   0        32        148          8135032    4        0        0
>>>>>>>>> mail-upcall:upcall_local_t              0        512       108          0          0        0        0
>>>>>>>>> mail-marker:marker_local_t              0        128       332          65005      3        0        0
>>>>>>>>> mail-quota:quota_local_t                0        64        476          0          0        0        0
>>>>>>>>> mail-server:rpcsvc_request_t            0        512       2828         12882393   30       0        0
>>>>>>>>> glusterfs:struct saved_frame            0        8         124          2          2        0        0
>>>>>>>>> glusterfs:struct rpc_req                0        8         588          2          2        0        0
>>>>>>>>> glusterfs:rpcsvc_request_t              1        7         2828         2          1        0        0
>>>>>>>>> glusterfs:log_buf_t                     5        251       140          3443       6        0        0
>>>>>>>>> glusterfs:data_t                        242      16141     52           138743429  290      0        0
>>>>>>>>> glusterfs:data_pair_t                   230      16153     68           126649864  270      0        0
>>>>>>>>> glusterfs:dict_t                        23       4073      140          20356289   63       0        0
>>>>>>>>> glusterfs:call_stub_t                   0        1024      3764         13678560   31       0        0
>>>>>>>>> glusterfs:call_stack_t                  1        1023      1708         11011561   30       0        0
>>>>>>>>> glusterfs:call_frame_t                  1        4095      172          125764190  193      0        0
>>>>>>>>> ----------------------------------------------
>>>>>>>>> ===
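The Mempool Stats tables are easier to scan programmatically. The Python sketch below parses rows in the column order shown (Name HotCount ColdCount PaddedSizeof AllocCount MaxAlloc Misses Max-StdAlloc), computes the memory each pool preallocates as (HotCount + ColdCount) × PaddedSizeof, and lists pools with a non-zero Misses count. Reading Misses as allocations that fell back to the heap because the pool was exhausted is an assumption, not something stated in the thread. Judging by the "Brick :" headings, these are brick-side statistics, so they do not directly describe the FUSE client's memory. The sample rows are copied from the Brick 1 table; class and function names are illustrative.

```python
from dataclasses import dataclass

# Four rows copied from the Brick 1 "Mempool Stats" table above.
SAMPLE = """
mail-server:dentry_t 16110 274 84 235676148 16384 1106499 1152
mail-server:inode_t 16363 21 156 237216876 16384 1876651 1169
glusterfs:struct rpc_req 0 8 588 2 2 0 0
glusterfs:dict_t 23 4073 140 303751675 627 0 0
"""


@dataclass
class PoolStat:
    name: str
    hot: int
    cold: int
    padded_sizeof: int
    alloc_count: int
    max_alloc: int
    misses: int
    max_std_alloc: int

    @property
    def pool_bytes(self) -> int:
        """Bytes preallocated by the pool: (hot + cold) objects x padded size."""
        return (self.hot + self.cold) * self.padded_sizeof


def parse_row(line: str) -> PoolStat:
    # Names may contain spaces ("glusterfs:struct rpc_req"), so take the
    # last seven whitespace-separated fields as the numeric columns.
    fields = line.split()
    if len(fields) < 8:
        raise ValueError(f"unexpected mempool row: {line!r}")
    numbers = [int(f) for f in fields[-7:]]
    return PoolStat(" ".join(fields[:-7]), *numbers)


def exhausted_pools(rows: list[PoolStat]) -> list[PoolStat]:
    """Pools with Misses > 0, most misses first."""
    return sorted((r for r in rows if r.misses > 0), key=lambda r: r.misses, reverse=True)


if __name__ == "__main__":
    rows = [parse_row(line) for line in SAMPLE.strip().splitlines()]
    for r in exhausted_pools(rows):
        print(f"{r.name}: {r.misses} misses, pool holds {r.pool_bytes} bytes")
```

On the sample it flags only mail-server:inode_t and mail-server:dentry_t, the two pools whose HotCount + ColdCount equals their MaxAlloc of 16384.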
>>>>>>>>>
>>>>>>>>> So, my questions are:
>>>>>>>>>
>>>>>>>>> 1) What should one do to limit GlusterFS FUSE client memory usage?
>>>>>>>>> 2) What should one do to prevent high client loadavg caused by high
>>>>>>>>> iowait from multiple concurrent volume users?
>>>>>>>>>
>>>>>>>>> Server/client OS is CentOS 7.1, GlusterFS server version is 3.7.3,
>>>>>>>>> GlusterFS client version is 3.7.4.
>>>>>>>>>
>>>>>>>>> Any additional info needed?
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Gluster-users mailing list
>>>>>>> Gluster-users at gluster.org
>>>>>>> http://www.gluster.org/mailman/listinfo/gluster-users
>
>