[Gluster-devel] [Gluster-users] Memory leak in GlusterFS FUSE client
Soumya Koduri
skoduri at redhat.com
Thu Dec 31 08:39:03 UTC 2015
On 12/28/2015 02:32 PM, Soumya Koduri wrote:
>
>
> ----- Original Message -----
>> From: "Pranith Kumar Karampuri" <pkarampu at redhat.com>
>> To: "Oleksandr Natalenko" <oleksandr at natalenko.name>, "Soumya Koduri" <skoduri at redhat.com>
>> Cc: gluster-users at gluster.org, gluster-devel at gluster.org
>> Sent: Monday, December 28, 2015 9:32:07 AM
>> Subject: Re: [Gluster-devel] [Gluster-users] Memory leak in GlusterFS FUSE client
>>
>>
>>
>> On 12/26/2015 04:45 AM, Oleksandr Natalenko wrote:
>>> Also, here is the valgrind output from our custom tool, which
>>> traverses a GlusterFS volume (doing simple stat() calls), much like
>>> the find tool. In this case NFS-Ganesha is not used.
>>>
>>> https://gist.github.com/e4602a50d3c98f7a2766
>> Hi Oleksandr,
>> I went through the code. Both NFS-Ganesha and the custom tool use
>> gfapi, and the leak is stemming from that. I am not very familiar with
>> this part of the code, but there seems to be one inode_unref() missing
>> in the failure path of resolution. I am not sure whether that accounts
>> for the leaks.
>>
>> Soumya,
>> Could this be the issue? review.gluster.org seems to be down, so I
>> couldn't send the patch. Please ping me on IRC.
>> diff --git a/api/src/glfs-resolve.c b/api/src/glfs-resolve.c
>> index b5efcba..52b538b 100644
>> --- a/api/src/glfs-resolve.c
>> +++ b/api/src/glfs-resolve.c
>> @@ -467,9 +467,11 @@ priv_glfs_resolve_at (struct glfs *fs, xlator_t *subvol, inode_t *at,
>>                  }
>>          }
>>
>> -        if (parent && next_component)
>> +        if (parent && next_component) {
>> +                inode_unref (parent);
>> +                parent = NULL;
>>                  /* resolution failed mid-way */
>>                  goto out;
>> +        }
>>
>>          /* At this point, all components up to the last parent directory
>>             have been resolved successfully (@parent). Resolution of basename
>>
> Yes, this could be one of the reasons. There are a few leaks with respect to inode references in gfAPI. See below.
>
>
> On the GlusterFS side, it looks like the majority of the leaks are related to inodes and their contexts. Possible reasons I can think of are:
>
> 1) When there is a graph switch, the old inode table and its entries are not purged (this is a known issue). There was an effort to fix this, but I think it had other side effects and hence has not been applied. Maybe we should revive those changes.
>
> 2) Related to the above, old entries can be purged if a request comes in with a reference to an old inode (as part of 'glfs_resolve_inode'), provided their reference counts are properly decremented. But this is not happening in gfapi at the moment.
>
> 3) Applications should hold and release their references as needed. There are certain fixes needed in this area as well (including the fix provided by Pranith above); see the sketch below.
>
> From code inspection, I have made changes to fix a few of the leaks in cases (2) and (3) with respect to gfAPI:
> http://review.gluster.org/#/c/13096 (yet to test the changes)
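>
> To illustrate case (3), here is a minimal, purely illustrative sketch
> (not taken from NFS-Ganesha or the custom tool; error handling trimmed,
> and the exact glfs_h_lookupat() signature varies slightly between
> releases) of the pattern an application is expected to follow:
>
> /* Illustrative only -- every glfs_object handle returned by gfAPI
>  * pins an inode reference, which the application must release once
>  * it is done with the object. */
> #include <glusterfs/api/glfs.h>
> #include <glusterfs/api/glfs-handles.h>
>
> static int
> stat_child (struct glfs *fs, struct glfs_object *parent, const char *name)
> {
>         struct stat          st;
>         struct glfs_object  *obj = NULL;
>
>         /* Takes an inode reference internally (3.7.x signature). */
>         obj = glfs_h_lookupat (fs, parent, name, &st, 0);
>         if (!obj)
>                 return -1;
>
>         /* ... consume st ... */
>
>         /* Drops the handle and its inode reference; skipping this
>          * keeps the inode (and its inode_ctx) alive in the inode
>          * table. */
>         glfs_h_close (obj);
>         return 0;
> }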
>
> I haven't yet narrowed down any suspects pertaining only to NFS-Ganesha. I will re-check and update.
>
I tried similar tests but with a smaller set of files. I could see the
inode_ctx leak even without any graph switches involved. I suspect that
is because valgrind checks for memory leaks at program exit. We call
'glfs_fini()' to clean up the memory used by gfapi during exit, and
those inode_ctx leaks are the result of some inodes being left behind
during the inode_table cleanup. I have submitted the patch below to
address this issue:
http://review.gluster.org/13125
However, this helps only when volume un-exports are involved or the
program exits. It still doesn't address the actual RAM consumed by the
application while it is active.
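
To put the above in context, here is a minimal, purely illustrative
gfAPI lifecycle sketch (the volume name and host are placeholders
borrowed from the report quoted below; error handling trimmed).
valgrind evaluates leaks only at process exit, i.e. after glfs_fini()
has torn down the inode table, so inodes skipped during that cleanup
surface in its report:

/* Illustrative only; "mail" and "glusterfs.xxx" come from the report
 * quoted further down in this thread. */
#include <sys/stat.h>
#include <glusterfs/api/glfs.h>

int
main (void)
{
        struct stat  st;
        glfs_t      *fs = glfs_new ("mail");

        if (!fs)
                return 1;

        glfs_set_volfile_server (fs, "tcp", "glusterfs.xxx", 24007);
        if (glfs_init (fs) != 0)
                return 1;

        glfs_stat (fs, "/", &st);   /* traversal / stat calls go here */

        /* Tears down gfapi state, including the inode table; the patch
         * above targets inodes that were previously left behind here. */
        glfs_fini (fs);
        return 0;
}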
Thanks,
Soumya
> Thanks,
> Soumya
>
>
>> Pranith
>>>
>>> One may see GlusterFS-related leaks here as well.
>>>
>>> On Friday, 25 December 2015, 20:28:13 EET Soumya Koduri wrote:
>>>> On 12/24/2015 09:17 PM, Oleksandr Natalenko wrote:
>>>>> Another addition: it seems to be a GlusterFS API library memory leak,
>>>>> because NFS-Ganesha also consumes a huge amount of memory while doing
>>>>> an ordinary "find . -type f" via NFSv4.2 on a remote client. Here is
>>>>> the memory usage:
>>>>>
>>>>> ===
>>>>> root 5416 34.2 78.5 2047176 1480552 ? Ssl 12:02 117:54
>>>>> /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f
>>>>> /etc/ganesha/ganesha.conf -N NIV_EVENT
>>>>> ===
>>>>>
>>>>> 1.4 GiB is too much for a simple stat() :(.
>>>>>
>>>>> Ideas?
>>>> nfs-ganesha also has a cache layer which can scale to millions of entries
>>>> depending on the number of files/directories being looked up. However,
>>>> there are parameters to tune it. So either try the stat run with fewer
>>>> entries, or add the block below to the nfs-ganesha.conf file, set low
>>>> limits and check the difference. That may help us narrow down how much
>>>> memory is actually consumed by core nfs-ganesha and gfAPI.
>>>>
>>>> CACHEINODE {
>>>>     Cache_Size(uint32, range 1 to UINT32_MAX, default 32633);      # cache size
>>>>     Entries_HWMark(uint32, range 1 to UINT32_MAX, default 100000); # max no. of entries in the cache
>>>> }
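>>>>
>>>> For instance, a low-limit block could look like this (the values are
>>>> purely illustrative, not a recommendation):
>>>>
>>>> CACHEINODE {
>>>>     Cache_Size = 1009;        # small hash table for testing
>>>>     Entries_HWMark = 10000;   # cap cached entries at 10k
>>>> }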
>>>>
>>>> Thanks,
>>>> Soumya
>>>>
>>>>> On 24.12.2015 16:32, Oleksandr Natalenko wrote:
>>>>>> The issue is still present with 3.7.6. Any suggestions?
>>>>>>
>>>>>> On 24.09.2015 10:14, Oleksandr Natalenko wrote:
>>>>>>> In our GlusterFS deployment we've encountered something like a memory
>>>>>>> leak in the GlusterFS FUSE client.
>>>>>>>
>>>>>>> We use a replicated (×2) GlusterFS volume to store mail (exim+dovecot,
>>>>>>> maildir format). Here are the inode stats for both bricks and the
>>>>>>> mountpoint:
>>>>>>>
>>>>>>> ===
>>>>>>> Brick 1 (Server 1):
>>>>>>>
>>>>>>> Filesystem                          Inodes     IUsed     IFree      IUse%  Mounted on
>>>>>>> /dev/mapper/vg_vd1_misc-lv08_mail   578768144  10954918  567813226  2%     /bricks/r6sdLV08_vd1_mail
>>>>>>>
>>>>>>> Brick 2 (Server 2):
>>>>>>>
>>>>>>> Filesystem                          Inodes     IUsed     IFree      IUse%  Mounted on
>>>>>>> /dev/mapper/vg_vd0_misc-lv07_mail   578767984  10954913  567813071  2%     /bricks/r6sdLV07_vd0_mail
>>>>>>>
>>>>>>> Mountpoint (Server 3):
>>>>>>>
>>>>>>> Filesystem           Inodes     IUsed     IFree      IUse%  Mounted on
>>>>>>> glusterfs.xxx:mail   578767760  10954915  567812845  2%     /var/spool/mail/virtual
>>>>>>> ===
>>>>>>>
>>>>>>> The glusterfs.xxx domain has two A records, one for Server 1 and one for Server 2.
>>>>>>>
>>>>>>> Here is volume info:
>>>>>>>
>>>>>>> ===
>>>>>>> Volume Name: mail
>>>>>>> Type: Replicate
>>>>>>> Volume ID: f564e85c-7aa6-4170-9417-1f501aa98cd2
>>>>>>> Status: Started
>>>>>>> Number of Bricks: 1 x 2 = 2
>>>>>>> Transport-type: tcp
>>>>>>> Bricks:
>>>>>>> Brick1: server1.xxx:/bricks/r6sdLV08_vd1_mail/mail
>>>>>>> Brick2: server2.xxx:/bricks/r6sdLV07_vd0_mail/mail
>>>>>>> Options Reconfigured:
>>>>>>> nfs.rpc-auth-allow: 1.2.4.0/24,4.5.6.0/24
>>>>>>> features.cache-invalidation-timeout: 10
>>>>>>> performance.stat-prefetch: off
>>>>>>> performance.quick-read: on
>>>>>>> performance.read-ahead: off
>>>>>>> performance.flush-behind: on
>>>>>>> performance.write-behind: on
>>>>>>> performance.io-thread-count: 4
>>>>>>> performance.cache-max-file-size: 1048576
>>>>>>> performance.cache-size: 67108864
>>>>>>> performance.readdir-ahead: off
>>>>>>> ===
>>>>>>>
>>>>>>> Soon after mounting and starting exim/dovecot, the glusterfs client
>>>>>>> process begins to consume a huge amount of RAM:
>>>>>>>
>>>>>>> ===
>>>>>>> user@server3 ~$ ps aux | grep glusterfs | grep mail
>>>>>>> root 28895 14.4 15.0 15510324 14908868 ? Ssl Sep03 4310:05
>>>>>>> /usr/sbin/glusterfs --fopen-keep-cache --direct-io-mode=disable
>>>>>>> --volfile-server=glusterfs.xxx --volfile-id=mail
>>>>>>> /var/spool/mail/virtual
>>>>>>> ===
>>>>>>>
>>>>>>> That is, ~15 GiB of RAM.
>>>>>>>
>>>>>>> Also, we've tried using the mountpoint within a separate KVM VM with 2 or
>>>>>>> 3 GiB of RAM, and soon after starting the mail daemons the glusterfs
>>>>>>> client process was killed by the OOM killer.
>>>>>>>
>>>>>>> Mounting the same share via NFS works just fine. Also, we have much lower
>>>>>>> iowait and loadavg on the client side with NFS.
>>>>>>>
>>>>>>> Also, we've tried changing the IO thread count and the cache size in order
>>>>>>> to limit memory usage, with no luck. As you can see, the total cache size
>>>>>>> is 4×64 MiB == 256 MiB (compare that to 15 GiB).
>>>>>>>
>>>>>>> Enabling/disabling stat-prefetch, read-ahead and readdir-ahead didn't
>>>>>>> help either.
>>>>>>>
>>>>>>> Here are volume memory stats:
>>>>>>>
>>>>>>> ===
>>>>>>> Memory status for volume : mail
>>>>>>> ----------------------------------------------
>>>>>>> Brick : server1.xxx:/bricks/r6sdLV08_vd1_mail/mail
>>>>>>> Mallinfo
>>>>>>> --------
>>>>>>> Arena : 36859904
>>>>>>> Ordblks : 10357
>>>>>>> Smblks : 519
>>>>>>> Hblks : 21
>>>>>>> Hblkhd : 30515200
>>>>>>> Usmblks : 0
>>>>>>> Fsmblks : 53440
>>>>>>> Uordblks : 18604144
>>>>>>> Fordblks : 18255760
>>>>>>> Keepcost : 114112
>>>>>>>
>>>>>>> Mempool Stats
>>>>>>> -------------
>>>>>>> Name  HotCount ColdCount PaddedSizeof AllocCount MaxAlloc Misses Max-StdAlloc
>>>>>>> ----  -------- --------- ------------ ---------- -------- ------ ------------
>>>>>>> mail-server:fd_t  0 1024 108 30773120 137 0 0
>>>>>>> mail-server:dentry_t  16110 274 84 235676148 16384 1106499 1152
>>>>>>> mail-server:inode_t  16363 21 156 237216876 16384 1876651 1169
>>>>>>> mail-trash:fd_t  0 1024 108 0 0 0 0
>>>>>>> mail-trash:dentry_t  0 32768 84 0 0 0 0
>>>>>>> mail-trash:inode_t  4 32764 156 4 4 0 0
>>>>>>> mail-trash:trash_local_t  0 64 8628 0 0 0 0
>>>>>>> mail-changetimerecorder:gf_ctr_local_t  0 64 16540 0 0 0 0
>>>>>>> mail-changelog:rpcsvc_request_t  0 8 2828 0 0 0 0
>>>>>>> mail-changelog:changelog_local_t  0 64 116 0 0 0 0
>>>>>>> mail-bitrot-stub:br_stub_local_t  0 512 84 79204 4 0 0
>>>>>>> mail-locks:pl_local_t  0 32 148 6812757 4 0 0
>>>>>>> mail-upcall:upcall_local_t  0 512 108 0 0 0 0
>>>>>>> mail-marker:marker_local_t  0 128 332 64980 3 0 0
>>>>>>> mail-quota:quota_local_t  0 64 476 0 0 0 0
>>>>>>> mail-server:rpcsvc_request_t  0 512 2828 45462533 34 0 0
>>>>>>> glusterfs:struct saved_frame  0 8 124 2 2 0 0
>>>>>>> glusterfs:struct rpc_req  0 8 588 2 2 0 0
>>>>>>> glusterfs:rpcsvc_request_t  1 7 2828 2 1 0 0
>>>>>>> glusterfs:log_buf_t  5 251 140 3452 6 0 0
>>>>>>> glusterfs:data_t  242 16141 52 480115498 664 0 0
>>>>>>> glusterfs:data_pair_t  230 16153 68 179483528 275 0 0
>>>>>>> glusterfs:dict_t  23 4073 140 303751675 627 0 0
>>>>>>> glusterfs:call_stub_t  0 1024 3764 45290655 34 0 0
>>>>>>> glusterfs:call_stack_t  1 1023 1708 43598469 34 0 0
>>>>>>> glusterfs:call_frame_t  1 4095 172 336219655 184 0 0
>>>>>>> ----------------------------------------------
>>>>>>> Brick : server2.xxx:/bricks/r6sdLV07_vd0_mail/mail
>>>>>>> Mallinfo
>>>>>>> --------
>>>>>>> Arena : 38174720
>>>>>>> Ordblks : 9041
>>>>>>> Smblks : 507
>>>>>>> Hblks : 21
>>>>>>> Hblkhd : 30515200
>>>>>>> Usmblks : 0
>>>>>>> Fsmblks : 51712
>>>>>>> Uordblks : 19415008
>>>>>>> Fordblks : 18759712
>>>>>>> Keepcost : 114848
>>>>>>>
>>>>>>> Mempool Stats
>>>>>>> -------------
>>>>>>> Name  HotCount ColdCount PaddedSizeof AllocCount MaxAlloc Misses Max-StdAlloc
>>>>>>> ----  -------- --------- ------------ ---------- -------- ------ ------------
>>>>>>> mail-server:fd_t  0 1024 108 2373075 133 0 0
>>>>>>> mail-server:dentry_t  14114 2270 84 3513654 16384 2300 267
>>>>>>> mail-server:inode_t  16374 10 156 6766642 16384 194635 1279
>>>>>>> mail-trash:fd_t  0 1024 108 0 0 0 0
>>>>>>> mail-trash:dentry_t  0 32768 84 0 0 0 0
>>>>>>> mail-trash:inode_t  4 32764 156 4 4 0 0
>>>>>>> mail-trash:trash_local_t  0 64 8628 0 0 0 0
>>>>>>> mail-changetimerecorder:gf_ctr_local_t  0 64 16540 0 0 0 0
>>>>>>> mail-changelog:rpcsvc_request_t  0 8 2828 0 0 0 0
>>>>>>> mail-changelog:changelog_local_t  0 64 116 0 0 0 0
>>>>>>> mail-bitrot-stub:br_stub_local_t  0 512 84 71354 4 0 0
>>>>>>> mail-locks:pl_local_t  0 32 148 8135032 4 0 0
>>>>>>> mail-upcall:upcall_local_t  0 512 108 0 0 0 0
>>>>>>> mail-marker:marker_local_t  0 128 332 65005 3 0 0
>>>>>>> mail-quota:quota_local_t  0 64 476 0 0 0 0
>>>>>>> mail-server:rpcsvc_request_t  0 512 2828 12882393 30 0 0
>>>>>>> glusterfs:struct saved_frame  0 8 124 2 2 0 0
>>>>>>> glusterfs:struct rpc_req  0 8 588 2 2 0 0
>>>>>>> glusterfs:rpcsvc_request_t  1 7 2828 2 1 0 0
>>>>>>> glusterfs:log_buf_t  5 251 140 3443 6 0 0
>>>>>>> glusterfs:data_t  242 16141 52 138743429 290 0 0
>>>>>>> glusterfs:data_pair_t  230 16153 68 126649864 270 0 0
>>>>>>> glusterfs:dict_t  23 4073 140 20356289 63 0 0
>>>>>>> glusterfs:call_stub_t  0 1024 3764 13678560 31 0 0
>>>>>>> glusterfs:call_stack_t  1 1023 1708 11011561 30 0 0
>>>>>>> glusterfs:call_frame_t  1 4095 172 125764190 193 0 0
>>>>>>> ----------------------------------------------
>>>>>>> ===
>>>>>>>
>>>>>>> So, my questions are:
>>>>>>>
>>>>>>> 1) What should one do to limit GlusterFS FUSE client memory usage?
>>>>>>> 2) What should one do to prevent high loadavg on the client, caused by
>>>>>>> high iowait due to multiple concurrent volume users?
>>>>>>>
>>>>>>> Server/client OS is CentOS 7.1, GlusterFS server version is 3.7.3,
>>>>>>> GlusterFS client version is 3.7.4.
>>>>>>>
>>>>> Is any additional info needed?