[Gluster-devel] [Gluster-users] Memory leak in GlusterFS FUSE client
Soumya Koduri
skoduri at redhat.com
Thu Dec 31 08:39:03 UTC 2015
On 12/28/2015 02:32 PM, Soumya Koduri wrote:
>
>
> ----- Original Message -----
>> From: "Pranith Kumar Karampuri" <pkarampu at redhat.com>
>> To: "Oleksandr Natalenko" <oleksandr at natalenko.name>, "Soumya Koduri" <skoduri at redhat.com>
>> Cc: gluster-users at gluster.org, gluster-devel at gluster.org
>> Sent: Monday, December 28, 2015 9:32:07 AM
>> Subject: Re: [Gluster-devel] [Gluster-users] Memory leak in GlusterFS FUSE client
>>
>>
>>
>> On 12/26/2015 04:45 AM, Oleksandr Natalenko wrote:
>>> Also, here is the valgrind output from our custom tool, which
>>> traverses a GlusterFS volume (doing simple stat() calls), much like
>>> the find tool. In this case NFS-Ganesha is not used.
>>>
>>> https://gist.github.com/e4602a50d3c98f7a2766
>> Hi Oleksandr,
>> I went through the code. Both NFS-Ganesha and the custom tool use
>> gfapi, and the leak is stemming from that. I am not very familiar with
>> this part of the code, but there seems to be one inode_unref() missing
>> in the failure path of resolution. I am not sure whether that accounts
>> for the leaks.
>>
>> Soumya,
>> Could this be the issue? review.gluster.org seems to be down, so I
>> couldn't send the patch. Please ping me on IRC.
>> diff --git a/api/src/glfs-resolve.c b/api/src/glfs-resolve.c
>> index b5efcba..52b538b 100644
>> --- a/api/src/glfs-resolve.c
>> +++ b/api/src/glfs-resolve.c
>> @@ -467,9 +467,11 @@ priv_glfs_resolve_at (struct glfs *fs, xlator_t *subvol, inode_t *at,
>>                  }
>>          }
>>
>> -        if (parent && next_component)
>> +        if (parent && next_component) {
>> +                inode_unref (parent);
>> +                parent = NULL;
>>                  /* resolution failed mid-way */
>>                  goto out;
>> +        }
>>
>>          /* At this point, all components up to the last parent directory
>>             have been resolved successfully (@parent). Resolution of basename
>>
> Yes, this could be one of the reasons. There are a few leaks with respect to inode references in gfAPI. See below.
>
>
> On the GlusterFS side, it looks like the majority of the leaks are related to inodes and their contexts. Possible reasons I can think of are:
>
> 1) When there is a graph switch, the old inode table and its entries are not purged (this is a known issue). There was an effort to fix this, but I think it had other side effects and hence has not been applied. Maybe we should revive those changes.
>
> 2) Related to the above, old entries can be purged if a request comes in with a reference to an old inode (as part of 'glfs_resolve_inode'), provided their reference counts are properly decremented. But this is not happening in gfapi at the moment.
>
> 3) Applications should hold and release their references as needed. There are certain fixes needed in this area as well (including the fix provided by Pranith above); see the sketch below.
>
> From code inspection, I have made changes to fix a few of the leaks in cases (2) and (3) with respect to gfAPI:
> http://review.gluster.org/#/c/13096 (yet to test the changes)
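>
> To illustrate case (3), here is a minimal, purely illustrative sketch
> (not taken from NFS-Ganesha or the custom tool; error handling trimmed,
> and the exact glfs_h_lookupat() signature varies slightly between
> releases) of the pattern an application is expected to follow:
>
> /* Illustrative only -- every glfs_object handle returned by gfAPI
>  * pins an inode reference, which the application must release once
>  * it is done with the object. */
> #include <glusterfs/api/glfs.h>
> #include <glusterfs/api/glfs-handles.h>
>
> static int
> stat_child (struct glfs *fs, struct glfs_object *parent, const char *name)
> {
>         struct stat          st;
>         struct glfs_object  *obj = NULL;
>
>         /* Takes an inode reference internally (3.7.x signature). */
>         obj = glfs_h_lookupat (fs, parent, name, &st, 0);
>         if (!obj)
>                 return -1;
>
>         /* ... consume st ... */
>
>         /* Drops the handle and its inode reference; skipping this
>          * keeps the inode (and its inode_ctx) alive in the inode
>          * table. */
>         glfs_h_close (obj);
>         return 0;
> }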
>
> I haven't yet narrowed down any suspects pertaining only to NFS-Ganesha. I will re-check and update.
>
I tried similar tests but with a smaller set of files. I could see the
inode_ctx leak even without any graph switches involved. I suspect that
is because valgrind checks for memory leaks at program exit. We call
'glfs_fini()' to clean up the memory used by gfapi during exit, and
those inode_ctx leaks are the result of some inodes being left behind
during the inode_table cleanup. I have submitted the patch below to
address this issue:
http://review.gluster.org/13125
However, this helps only when volume un-exports are involved or the
program exits. It still doesn't address the actual RAM consumed by the
application while it is active.
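
To put the above in context, here is a minimal, purely illustrative
gfAPI lifecycle sketch (the volume name and host are placeholders
borrowed from the report quoted below; error handling trimmed).
valgrind evaluates leaks only at process exit, i.e. after glfs_fini()
has torn down the inode table, so inodes skipped during that cleanup
surface in its report:

/* Illustrative only; "mail" and "glusterfs.xxx" come from the report
 * quoted further down in this thread. */
#include <sys/stat.h>
#include <glusterfs/api/glfs.h>

int
main (void)
{
        struct stat  st;
        glfs_t      *fs = glfs_new ("mail");

        if (!fs)
                return 1;

        glfs_set_volfile_server (fs, "tcp", "glusterfs.xxx", 24007);
        if (glfs_init (fs) != 0)
                return 1;

        glfs_stat (fs, "/", &st);   /* traversal / stat calls go here */

        /* Tears down gfapi state, including the inode table; the patch
         * above targets inodes that were previously left behind here. */
        glfs_fini (fs);
        return 0;
}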
Thanks,
Soumya
> Thanks,
> Soumya
>
>
>> Pranith
>>>
>>> One may see GlusterFS-related leaks here as well.
>>>
>>> On Friday, 25 December 2015, 20:28:13 EET Soumya Koduri wrote:
>>>> On 12/24/2015 09:17 PM, Oleksandr Natalenko wrote:
>>>>> Another addition: it seems to be a GlusterFS API library memory leak,
>>>>> because NFS-Ganesha also consumes a huge amount of memory while doing
>>>>> an ordinary "find . -type f" via NFSv4.2 on a remote client. Here is
>>>>> the memory usage:
>>>>>
>>>>> ===
>>>>> root 5416 34.2 78.5 2047176 1480552 ? Ssl 12:02 117:54
>>>>> /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f
>>>>> /etc/ganesha/ganesha.conf -N NIV_EVENT
>>>>> ===
>>>>>
>>>>> 1.4 GiB is too much for a simple stat() :(.
>>>>>
>>>>> Ideas?
>>>> nfs-ganesha also has a cache layer which can scale to millions of entries
>>>> depending on the number of files/directories being looked up. However,
>>>> there are parameters to tune it. So either try the stat run with fewer
>>>> entries, or add the block below to the nfs-ganesha.conf file, set low
>>>> limits and check the difference. That may help us narrow down how much
>>>> memory is actually consumed by core nfs-ganesha and gfAPI.
>>>>
>>>> CACHEINODE {
>>>>     Cache_Size(uint32, range 1 to UINT32_MAX, default 32633);      # cache size
>>>>     Entries_HWMark(uint32, range 1 to UINT32_MAX, default 100000); # max no. of entries in the cache
>>>> }
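>>>>
>>>> For instance, a low-limit block could look like this (the values are
>>>> purely illustrative, not a recommendation):
>>>>
>>>> CACHEINODE {
>>>>     Cache_Size = 1009;        # small hash table for testing
>>>>     Entries_HWMark = 10000;   # cap cached entries at 10k
>>>> }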
>>>>
>>>> Thanks,
>>>> Soumya
>>>>
>>>>> On 24.12.2015 16:32, Oleksandr Natalenko wrote:
>>>>>> The issue is still present with 3.7.6. Any suggestions?
>>>>>>
>>>>>> On 24.09.2015 10:14, Oleksandr Natalenko wrote:
>>>>>>> In our GlusterFS deployment we've encountered something like a memory
>>>>>>> leak in the GlusterFS FUSE client.
>>>>>>>
>>>>>>> We use a replicated (×2) GlusterFS volume to store mail (exim+dovecot,
>>>>>>> maildir format). Here are the inode stats for both bricks and the
>>>>>>> mountpoint:
>>>>>>>
>>>>>>> ===
>>>>>>> Brick 1 (Server 1):
>>>>>>>
>>>>>>> Filesystem                          Inodes     IUsed     IFree      IUse%  Mounted on
>>>>>>> /dev/mapper/vg_vd1_misc-lv08_mail   578768144  10954918  567813226  2%     /bricks/r6sdLV08_vd1_mail
>>>>>>>
>>>>>>> Brick 2 (Server 2):
>>>>>>>
>>>>>>> Filesystem                          Inodes     IUsed     IFree      IUse%  Mounted on
>>>>>>> /dev/mapper/vg_vd0_misc-lv07_mail   578767984  10954913  567813071  2%     /bricks/r6sdLV07_vd0_mail
>>>>>>>
>>>>>>> Mountpoint (Server 3):
>>>>>>>
>>>>>>> Filesystem           Inodes     IUsed     IFree      IUse%  Mounted on
>>>>>>> glusterfs.xxx:mail   578767760  10954915  567812845  2%     /var/spool/mail/virtual
>>>>>>> ===
>>>>>>>
>>>>>>> The glusterfs.xxx domain has two A records, one for Server 1 and one for Server 2.
>>>>>>>
>>>>>>> Here is volume info:
>>>>>>>
>>>>>>> ===
>>>>>>> Volume Name: mail
>>>>>>> Type: Replicate
>>>>>>> Volume ID: f564e85c-7aa6-4170-9417-1f501aa98cd2
>>>>>>> Status: Started
>>>>>>> Number of Bricks: 1 x 2 = 2
>>>>>>> Transport-type: tcp
>>>>>>> Bricks:
>>>>>>> Brick1: server1.xxx:/bricks/r6sdLV08_vd1_mail/mail
>>>>>>> Brick2: server2.xxx:/bricks/r6sdLV07_vd0_mail/mail
>>>>>>> Options Reconfigured:
>>>>>>> nfs.rpc-auth-allow: 1.2.4.0/24,4.5.6.0/24
>>>>>>> features.cache-invalidation-timeout: 10
>>>>>>> performance.stat-prefetch: off
>>>>>>> performance.quick-read: on
>>>>>>> performance.read-ahead: off
>>>>>>> performance.flush-behind: on
>>>>>>> performance.write-behind: on
>>>>>>> performance.io-thread-count: 4
>>>>>>> performance.cache-max-file-size: 1048576
>>>>>>> performance.cache-size: 67108864
>>>>>>> performance.readdir-ahead: off
>>>>>>> ===
>>>>>>>
>>>>>>> Soon after mounting and starting exim/dovecot, the glusterfs client
>>>>>>> process begins to consume a huge amount of RAM:
>>>>>>>
>>>>>>> ===
>>>>>>> user@server3 ~$ ps aux | grep glusterfs | grep mail
>>>>>>> root 28895 14.4 15.0 15510324 14908868 ? Ssl Sep03 4310:05
>>>>>>> /usr/sbin/glusterfs --fopen-keep-cache --direct-io-mode=disable
>>>>>>> --volfile-server=glusterfs.xxx --volfile-id=mail
>>>>>>> /var/spool/mail/virtual
>>>>>>> ===
>>>>>>>
>>>>>>> That is, ~15 GiB of RAM.
>>>>>>>
>>>>>>> Also, we've tried using the mountpoint within a separate KVM VM with 2 or
>>>>>>> 3 GiB of RAM, and soon after starting the mail daemons the glusterfs
>>>>>>> client process was killed by the OOM killer.
>>>>>>>
>>>>>>> Mounting the same share via NFS works just fine. Also, we have much lower
>>>>>>> iowait and loadavg on the client side with NFS.
>>>>>>>
>>>>>>> Also, we've tried changing the IO thread count and the cache size in order
>>>>>>> to limit memory usage, with no luck. As you can see, the total cache size
>>>>>>> is 4×64 MiB == 256 MiB (compare that to 15 GiB).
>>>>>>>
>>>>>>> Enabling/disabling stat-prefetch, read-ahead and readdir-ahead didn't
>>>>>>> help either.
>>>>>>>
>>>>>>> Here are volume memory stats:
>>>>>>>
>>>>>>> ===
>>>>>>> Memory status for volume : mail
>>>>>>> ----------------------------------------------
>>>>>>> Brick : server1.xxx:/bricks/r6sdLV08_vd1_mail/mail
>>>>>>> Mallinfo
>>>>>>> --------
>>>>>>> Arena : 36859904
>>>>>>> Ordblks : 10357
>>>>>>> Smblks : 519
>>>>>>> Hblks : 21
>>>>>>> Hblkhd : 30515200
>>>>>>> Usmblks : 0
>>>>>>> Fsmblks : 53440
>>>>>>> Uordblks : 18604144
>>>>>>> Fordblks : 18255760
>>>>>>> Keepcost : 114112
>>>>>>>
>>>>>>> Mempool Stats
>>>>>>> -------------
>>>>>>> Name  HotCount ColdCount PaddedSizeof AllocCount MaxAlloc Misses Max-StdAlloc
>>>>>>> ----  -------- --------- ------------ ---------- -------- ------ ------------
>>>>>>> mail-server:fd_t  0 1024 108 30773120 137 0 0
>>>>>>> mail-server:dentry_t  16110 274 84 235676148 16384 1106499 1152
>>>>>>> mail-server:inode_t  16363 21 156 237216876 16384 1876651 1169
>>>>>>> mail-trash:fd_t  0 1024 108 0 0 0 0
>>>>>>> mail-trash:dentry_t  0 32768 84 0 0 0 0
>>>>>>> mail-trash:inode_t  4 32764 156 4 4 0 0
>>>>>>> mail-trash:trash_local_t  0 64 8628 0 0 0 0
>>>>>>> mail-changetimerecorder:gf_ctr_local_t  0 64 16540 0 0 0 0
>>>>>>> mail-changelog:rpcsvc_request_t  0 8 2828 0 0 0 0
>>>>>>> mail-changelog:changelog_local_t  0 64 116 0 0 0 0
>>>>>>> mail-bitrot-stub:br_stub_local_t  0 512 84 79204 4 0 0
>>>>>>> mail-locks:pl_local_t  0 32 148 6812757 4 0 0
>>>>>>> mail-upcall:upcall_local_t  0 512 108 0 0 0 0
>>>>>>> mail-marker:marker_local_t  0 128 332 64980 3 0 0
>>>>>>> mail-quota:quota_local_t  0 64 476 0 0 0 0
>>>>>>> mail-server:rpcsvc_request_t  0 512 2828 45462533 34 0 0
>>>>>>> glusterfs:struct saved_frame  0 8 124 2 2 0 0
>>>>>>> glusterfs:struct rpc_req  0 8 588 2 2 0 0
>>>>>>> glusterfs:rpcsvc_request_t  1 7 2828 2 1 0 0
>>>>>>> glusterfs:log_buf_t  5 251 140 3452 6 0 0
>>>>>>> glusterfs:data_t  242 16141 52 480115498 664 0 0
>>>>>>> glusterfs:data_pair_t  230 16153 68 179483528 275 0 0
>>>>>>> glusterfs:dict_t  23 4073 140 303751675 627 0 0
>>>>>>> glusterfs:call_stub_t  0 1024 3764 45290655 34 0 0
>>>>>>> glusterfs:call_stack_t  1 1023 1708 43598469 34 0 0
>>>>>>> glusterfs:call_frame_t  1 4095 172 336219655 184 0 0
>>>>>>> ----------------------------------------------
>>>>>>> Brick : server2.xxx:/bricks/r6sdLV07_vd0_mail/mail
>>>>>>> Mallinfo
>>>>>>> --------
>>>>>>> Arena : 38174720
>>>>>>> Ordblks : 9041
>>>>>>> Smblks : 507
>>>>>>> Hblks : 21
>>>>>>> Hblkhd : 30515200
>>>>>>> Usmblks : 0
>>>>>>> Fsmblks : 51712
>>>>>>> Uordblks : 19415008
>>>>>>> Fordblks : 18759712
>>>>>>> Keepcost : 114848
>>>>>>>
>>>>>>> Mempool Stats
>>>>>>> -------------
>>>>>>> Name  HotCount ColdCount PaddedSizeof AllocCount MaxAlloc Misses Max-StdAlloc
>>>>>>> ----  -------- --------- ------------ ---------- -------- ------ ------------
>>>>>>> mail-server:fd_t  0 1024 108 2373075 133 0 0
>>>>>>> mail-server:dentry_t  14114 2270 84 3513654 16384 2300 267
>>>>>>> mail-server:inode_t  16374 10 156 6766642 16384 194635 1279
>>>>>>> mail-trash:fd_t  0 1024 108 0 0 0 0
>>>>>>> mail-trash:dentry_t  0 32768 84 0 0 0 0
>>>>>>> mail-trash:inode_t  4 32764 156 4 4 0 0
>>>>>>> mail-trash:trash_local_t  0 64 8628 0 0 0 0
>>>>>>> mail-changetimerecorder:gf_ctr_local_t  0 64 16540 0 0 0 0
>>>>>>> mail-changelog:rpcsvc_request_t  0 8 2828 0 0 0 0
>>>>>>> mail-changelog:changelog_local_t  0 64 116 0 0 0 0
>>>>>>> mail-bitrot-stub:br_stub_local_t  0 512 84 71354 4 0 0
>>>>>>> mail-locks:pl_local_t  0 32 148 8135032 4 0 0
>>>>>>> mail-upcall:upcall_local_t  0 512 108 0 0 0 0
>>>>>>> mail-marker:marker_local_t  0 128 332 65005 3 0 0
>>>>>>> mail-quota:quota_local_t  0 64 476 0 0 0 0
>>>>>>> mail-server:rpcsvc_request_t  0 512 2828 12882393 30 0 0
>>>>>>> glusterfs:struct saved_frame  0 8 124 2 2 0 0
>>>>>>> glusterfs:struct rpc_req  0 8 588 2 2 0 0
>>>>>>> glusterfs:rpcsvc_request_t  1 7 2828 2 1 0 0
>>>>>>> glusterfs:log_buf_t  5 251 140 3443 6 0 0
>>>>>>> glusterfs:data_t  242 16141 52 138743429 290 0 0
>>>>>>> glusterfs:data_pair_t  230 16153 68 126649864 270 0 0
>>>>>>> glusterfs:dict_t  23 4073 140 20356289 63 0 0
>>>>>>> glusterfs:call_stub_t  0 1024 3764 13678560 31 0 0
>>>>>>> glusterfs:call_stack_t  1 1023 1708 11011561 30 0 0
>>>>>>> glusterfs:call_frame_t  1 4095 172 125764190 193 0 0
>>>>>>> ----------------------------------------------
>>>>>>> ===
>>>>>>>
>>>>>>> So, my questions are:
>>>>>>>
>>>>>>> 1) What should one do to limit GlusterFS FUSE client memory usage?
>>>>>>> 2) What should one do to prevent high loadavg on the client, caused by
>>>>>>> high iowait due to multiple concurrent volume users?
>>>>>>>
>>>>>>> Server/client OS is CentOS 7.1, GlusterFS server version is 3.7.3,
>>>>>>> GlusterFS client version is 3.7.4.
>>>>>>>
>>>>> Is any additional info needed?