[Gluster-users] Memory leak in GlusterFS FUSE client
Oleksandr Natalenko
oleksandr at natalenko.name
Tue Jan 5 12:26:08 UTC 2016
Unfortunately, neither patch made any difference for me.
I patched 3.7.6 with both patches, recompiled, installed the patched
GlusterFS package on the client side, and mounted a volume with ~2M files.
Then I performed the usual tree traversal with a simple "find".
Memory RES value went from ~130M at the moment of mounting to ~1.5G
after traversing the volume for ~40 mins. Valgrind log still shows lots
of leaks. Here it is:
https://gist.github.com/56906ca6e657c4ffa4a1
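
For a rough sense of scale, the figures above can be turned into a per-file number. This is only back-of-the-envelope arithmetic: it assumes the RES growth is spread evenly over all ~2M files, reads "~1.5G" as 1536 MiB, and the variable names are mine.

```python
# Rough per-file growth of the FUSE client, using the figures from this mail.
MIB = 1024 * 1024

res_at_mount = 130 * MIB       # ~130M right after mounting
res_after_find = 1536 * MIB    # ~1.5G after the ~40 min traversal (assumed 1536 MiB)
files_on_volume = 2_000_000    # ~2M files (approximate)

growth = res_after_find - res_at_mount
bytes_per_file = growth / files_on_volume

print(f"growth: {growth / MIB:.0f} MiB, ~{bytes_per_file:.0f} bytes per file")
```

A few hundred bytes per visited file looks more like per-inode state that is never released than like data caching, but that is a guess, not a measurement.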
Ideas?
05.01.2016 12:31, Soumya Koduri wrote:
> I tried to debug the inode*-related leaks and saw some improvements
> after applying the patches below when I ran the same test (but with a
> smaller load). Could you please apply those patches and confirm the
> same?
>
> a) http://review.gluster.org/13125
>
> This fixes the leaks of inodes and their ctx during unexport and
> program exit. Please check the valgrind output after applying the
> patch. It should not list any inode-related memory as lost.
>
> b) http://review.gluster.org/13096
>
> The reason the change in Entries_HWMARK (in your earlier mail) didn't
> have much effect is that the inode_nlookup count doesn't become zero
> for those handles/inodes being closed by ganesha. Hence those inodes
> get added to the inode lru list instead of the purge list, and the lru
> list gets forcefully purged only when the number of gfapi inode table
> entries reaches its limit (which is 137012).
>
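As I read the explanation above, the behaviour can be pictured with a toy model. This is only a sketch of the described logic, not the real libglusterfs inode table: the class, the function names and the 200000-inode load are made up for illustration, and only the 137012 limit comes from the mail.

```python
from collections import OrderedDict


class ToyInodeTable:
    """Toy model only; not the real libglusterfs inode table."""

    def __init__(self, lru_limit):
        self.lru_limit = lru_limit
        self.lru = OrderedDict()  # inode id -> nlookup, oldest entry first
        self.freed = 0            # number of inodes whose memory was released

    def release(self, inode_id, nlookup):
        """Model closing the last handle on an inode."""
        if nlookup == 0:
            # Purge list: the inode is freed right away.
            self.freed += 1
            return
        # nlookup is still non-zero: the inode is parked on the lru list.
        self.lru[inode_id] = nlookup
        while len(self.lru) > self.lru_limit:
            # Forced purge only once the table limit is exceeded.
            self.lru.popitem(last=False)
            self.freed += 1


def simulate(n_inodes, lru_limit, nlookup):
    """Release n_inodes inodes and return (still_in_memory, freed)."""
    table = ToyInodeTable(lru_limit)
    for inode_id in range(n_inodes):
        table.release(inode_id, nlookup)
    return len(table.lru), table.freed


# 137012 is the limit quoted above; 200000 inodes is an arbitrary load.
print(simulate(200_000, 137_012, nlookup=1))  # inodes pile up to the limit
print(simulate(200_000, 137_012, nlookup=0))  # nothing stays in memory
```

In this model, lowering the cache high-water mark changes nothing as long as nlookup stays non-zero, which matches what I observed.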
> This patch fixes those 'nlookup' counts. Please apply this patch,
> reduce 'Entries_HWMARK' to a much lower value, and check whether that
> decreases the memory consumed by the ganesha process while it is active.
>
> CACHEINODE {
> Entries_HWMark = 500;
> }
>
>
> Note: I see an issue with nfs-ganesha during exit when the option
> 'Entries_HWMARK' gets changed. This is not related to any of the above
> patches (or rather Gluster) and I am currently debugging it.
>
> Thanks,
> Soumya
>
>
> On 12/25/2015 11:34 PM, Oleksandr Natalenko wrote:
>> 1. test with Cache_Size = 256 and Entries_HWMark = 4096
>>
>> Before find . -type f:
>>
>> root 3120 0.6 11.0 879120 208408 ? Ssl 17:39 0:00 /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT
>>
>> After:
>>
>> root 3120 11.4 24.3 1170076 458168 ? Ssl 17:39 13:39 /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT
>>
>> ~250M leak.
>>
>> 2. test with default values (after ganesha restart)
>>
>> Before:
>>
>> root 24937 1.3 10.4 875016 197808 ? Ssl 19:39 0:00 /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT
>>
>> After:
>>
>> root 24937 3.5 18.9 1022544 356340 ? Ssl 19:39 0:40 /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT
>>
>> ~159M leak.
>>
>> No reasonable correlation detected. The second test finished much
>> faster than the first (I guess the server-side GlusterFS cache or the
>> server kernel page cache is the cause).
>>
>> There are ~1.8M files on this test volume.
>>
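The two leak figures above are simply the difference in the RSS column between the "before" and "after" ps lines. A minimal sketch of that calculation, assuming the standard `ps aux` column order (USER PID %CPU %MEM VSZ RSS ...); the helper name is mine and the command part of each line is trimmed:

```python
def rss_kib(ps_line):
    """Return the RSS column (6th field of `ps aux` output) in KiB."""
    return int(ps_line.split()[5])


# (before, after) lines copied from the two tests above, command trimmed.
tests = {
    "Cache_Size=256, Entries_HWMark=4096": (
        "root 3120 0.6 11.0 879120 208408 ? Ssl 17:39 0:00 /usr/bin/ganesha.nfsd",
        "root 3120 11.4 24.3 1170076 458168 ? Ssl 17:39 13:39 /usr/bin/ganesha.nfsd",
    ),
    "defaults": (
        "root 24937 1.3 10.4 875016 197808 ? Ssl 19:39 0:00 /usr/bin/ganesha.nfsd",
        "root 24937 3.5 18.9 1022544 356340 ? Ssl 19:39 0:40 /usr/bin/ganesha.nfsd",
    ),
}

deltas = {
    name: rss_kib(after) - rss_kib(before)
    for name, (before, after) in tests.items()
}

for name, delta in deltas.items():
    print(f"{name}: RSS grew by {delta} KiB")
```

The growth is of the same order in both runs, so the cache limits do not seem to bound it.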
>> On Friday, 25 December 2015 20:28:13 EET Soumya Koduri wrote:
>>> On 12/24/2015 09:17 PM, Oleksandr Natalenko wrote:
>>>> Another addition: it seems to be a memory leak in the GlusterFS API
>>>> library, because NFS-Ganesha also consumes a huge amount of memory
>>>> while doing an ordinary "find . -type f" via NFSv4.2 on a remote
>>>> client. Here is the memory usage:
>>>>
>>>> ===
>>>> root 5416 34.2 78.5 2047176 1480552 ? Ssl 12:02 117:54
>>>> /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f
>>>> /etc/ganesha/ganesha.conf -N NIV_EVENT
>>>> ===
>>>>
>>>> 1.4G is too much for simple stat() :(.
>>>>
>>>> Ideas?
>>>
>>> nfs-ganesha also has a cache layer which can scale to millions of
>>> entries depending on the number of files/directories being looked up.
>>> However, there are parameters to tune it. So either try stat with few
>>> entries, or add the block below to the nfs-ganesha.conf file, set low
>>> limits, and check the difference. That may help us narrow down how much
>>> memory is actually consumed by core nfs-ganesha and by gfAPI.
>>>
>>> CACHEINODE {
>>> Cache_Size(uint32, range 1 to UINT32_MAX, default 32633); # cache size
>>> Entries_HWMark(uint32, range 1 to UINT32_MAX, default 100000); # Max no. of entries in the cache.
>>> }
>>>
>>> Thanks,
>>> Soumya
>>>
>>>> 24.12.2015 16:32, Oleksandr Natalenko wrote:
>>>>> This is still an issue in 3.7.6. Any suggestions?
>>>>>
>>>>> 24.09.2015 10:14, Oleksandr Natalenko wrote:
>>>>>> In our GlusterFS deployment we've encountered something that looks
>>>>>> like a memory leak in the GlusterFS FUSE client.
>>>>>>
>>>>>> We use a replicated (×2) GlusterFS volume to store mail (exim+dovecot,
>>>>>> maildir format). Here are the inode stats for both bricks and the
>>>>>> mountpoint:
>>>>>>
>>>>>> ===
>>>>>> Brick 1 (Server 1):
>>>>>>
>>>>>> Filesystem                        Inodes    IUsed    IFree     IUse% Mounted on
>>>>>> /dev/mapper/vg_vd1_misc-lv08_mail 578768144 10954918 567813226 2%    /bricks/r6sdLV08_vd1_mail
>>>>>>
>>>>>> Brick 2 (Server 2):
>>>>>>
>>>>>> Filesystem                        Inodes    IUsed    IFree     IUse% Mounted on
>>>>>> /dev/mapper/vg_vd0_misc-lv07_mail 578767984 10954913 567813071 2%    /bricks/r6sdLV07_vd0_mail
>>>>>>
>>>>>> Mountpoint (Server 3):
>>>>>>
>>>>>> Filesystem         Inodes    IUsed    IFree     IUse% Mounted on
>>>>>> glusterfs.xxx:mail 578767760 10954915 567812845 2%    /var/spool/mail/virtual
>>>>>> ===
>>>>>>
>>>>>> The glusterfs.xxx domain has two A records, one each for Server 1 and
>>>>>> Server 2.
>>>>>>
>>>>>> Here is volume info:
>>>>>>
>>>>>> ===
>>>>>> Volume Name: mail
>>>>>> Type: Replicate
>>>>>> Volume ID: f564e85c-7aa6-4170-9417-1f501aa98cd2
>>>>>> Status: Started
>>>>>> Number of Bricks: 1 x 2 = 2
>>>>>> Transport-type: tcp
>>>>>> Bricks:
>>>>>> Brick1: server1.xxx:/bricks/r6sdLV08_vd1_mail/mail
>>>>>> Brick2: server2.xxx:/bricks/r6sdLV07_vd0_mail/mail
>>>>>> Options Reconfigured:
>>>>>> nfs.rpc-auth-allow: 1.2.4.0/24,4.5.6.0/24
>>>>>> features.cache-invalidation-timeout: 10
>>>>>> performance.stat-prefetch: off
>>>>>> performance.quick-read: on
>>>>>> performance.read-ahead: off
>>>>>> performance.flush-behind: on
>>>>>> performance.write-behind: on
>>>>>> performance.io-thread-count: 4
>>>>>> performance.cache-max-file-size: 1048576
>>>>>> performance.cache-size: 67108864
>>>>>> performance.readdir-ahead: off
>>>>>> ===
>>>>>>
>>>>>> Soon after mounting and starting exim/dovecot, the glusterfs client
>>>>>> process begins to consume a huge amount of RAM:
>>>>>>
>>>>>> ===
>>>>>> user at server3 ~$ ps aux | grep glusterfs | grep mail
>>>>>> root 28895 14.4 15.0 15510324 14908868 ? Ssl Sep03 4310:05
>>>>>> /usr/sbin/glusterfs --fopen-keep-cache --direct-io-mode=disable
>>>>>> --volfile-server=glusterfs.xxx --volfile-id=mail
>>>>>> /var/spool/mail/virtual
>>>>>> ===
>>>>>>
>>>>>> That is, ~15 GiB of RAM.
>>>>>>
>>>>>> We've also tried using the mountpoint within a separate KVM VM with 2
>>>>>> or 3 GiB of RAM, and soon after starting the mail daemons the OOM
>>>>>> killer hit the glusterfs client process.
>>>>>>
>>>>>> Mounting the same share via NFS works just fine. Also, we have much
>>>>>> less iowait and loadavg on the client side with NFS.
>>>>>>
>>>>>> We've also tried changing the IO thread count and cache size in order
>>>>>> to limit memory usage, with no luck. As you can see, the total cache
>>>>>> size is 4×64==256 MiB (compare to 15 GiB).
>>>>>>
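To put the two numbers side by side, here is a small sketch using the volume options and the ps output above. Treating the budget as cache-size times io-thread-count is the assumption made in this mail, not something confirmed by the GlusterFS documentation.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

cache_size = 67108864   # performance.cache-size, in bytes
io_threads = 4          # performance.io-thread-count
rss = 14908868 * KIB    # RSS column of the ps output above (reported in KiB)

# Assumption from the mail: the cache budget scales with the thread count.
cache_budget = cache_size * io_threads

print(f"cache budget: {cache_budget // MIB} MiB")
print(f"client RSS:   {rss / GIB:.1f} GiB")
print(f"ratio:        {rss / cache_budget:.0f}x")
```

Even under that generous reading, the resident size is far above anything the configured caches could account for.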
>>>>>> Enabling or disabling stat-prefetch, read-ahead and readdir-ahead
>>>>>> didn't help either.
>>>>>>
>>>>>> Here are volume memory stats:
>>>>>>
>>>>>> ===
>>>>>> Memory status for volume : mail
>>>>>> ----------------------------------------------
>>>>>> Brick : server1.xxx:/bricks/r6sdLV08_vd1_mail/mail
>>>>>> Mallinfo
>>>>>> --------
>>>>>> Arena : 36859904
>>>>>> Ordblks : 10357
>>>>>> Smblks : 519
>>>>>> Hblks : 21
>>>>>> Hblkhd : 30515200
>>>>>> Usmblks : 0
>>>>>> Fsmblks : 53440
>>>>>> Uordblks : 18604144
>>>>>> Fordblks : 18255760
>>>>>> Keepcost : 114112
>>>>>>
>>>>>> Mempool Stats
>>>>>> -------------
>>>>>> Name                                    HotCount ColdCount PaddedSizeof AllocCount MaxAlloc Misses   Max-StdAlloc
>>>>>> ----                                    -------- --------- ------------ ---------- -------- -------- ------------
>>>>>> mail-server:fd_t                        0        1024      108          30773120   137      0        0
>>>>>> mail-server:dentry_t                    16110    274       84           235676148  16384    1106499  1152
>>>>>> mail-server:inode_t                     16363    21        156          237216876  16384    1876651  1169
>>>>>> mail-trash:fd_t                         0        1024      108          0          0        0        0
>>>>>> mail-trash:dentry_t                     0        32768     84           0          0        0        0
>>>>>> mail-trash:inode_t                      4        32764     156          4          4        0        0
>>>>>> mail-trash:trash_local_t                0        64        8628         0          0        0        0
>>>>>> mail-changetimerecorder:gf_ctr_local_t  0        64        16540        0          0        0        0
>>>>>> mail-changelog:rpcsvc_request_t         0        8         2828         0          0        0        0
>>>>>> mail-changelog:changelog_local_t        0        64        116          0          0        0        0
>>>>>> mail-bitrot-stub:br_stub_local_t        0        512       84           79204      4        0        0
>>>>>> mail-locks:pl_local_t                   0        32        148          6812757    4        0        0
>>>>>> mail-upcall:upcall_local_t              0        512       108          0          0        0        0
>>>>>> mail-marker:marker_local_t              0        128       332          64980      3        0        0
>>>>>> mail-quota:quota_local_t                0        64        476          0          0        0        0
>>>>>> mail-server:rpcsvc_request_t            0        512       2828         45462533   34       0        0
>>>>>> glusterfs:struct saved_frame            0        8         124          2          2        0        0
>>>>>> glusterfs:struct rpc_req                0        8         588          2          2        0        0
>>>>>> glusterfs:rpcsvc_request_t              1        7         2828         2          1        0        0
>>>>>> glusterfs:log_buf_t                     5        251       140          3452       6        0        0
>>>>>> glusterfs:data_t                        242      16141     52           480115498  664      0        0
>>>>>> glusterfs:data_pair_t                   230      16153     68           179483528  275      0        0
>>>>>> glusterfs:dict_t                        23       4073      140          303751675  627      0        0
>>>>>> glusterfs:call_stub_t                   0        1024      3764         45290655   34       0        0
>>>>>> glusterfs:call_stack_t                  1        1023      1708         43598469   34       0        0
>>>>>> glusterfs:call_frame_t                  1        4095      172          336219655  184      0        0
>>>>>> ----------------------------------------------
>>>>>> Brick : server2.xxx:/bricks/r6sdLV07_vd0_mail/mail
>>>>>> Mallinfo
>>>>>> --------
>>>>>> Arena : 38174720
>>>>>> Ordblks : 9041
>>>>>> Smblks : 507
>>>>>> Hblks : 21
>>>>>> Hblkhd : 30515200
>>>>>> Usmblks : 0
>>>>>> Fsmblks : 51712
>>>>>> Uordblks : 19415008
>>>>>> Fordblks : 18759712
>>>>>> Keepcost : 114848
>>>>>>
>>>>>> Mempool Stats
>>>>>> -------------
>>>>>> Name                                    HotCount ColdCount PaddedSizeof AllocCount MaxAlloc Misses   Max-StdAlloc
>>>>>> ----                                    -------- --------- ------------ ---------- -------- -------- ------------
>>>>>> mail-server:fd_t                        0        1024      108          2373075    133      0        0
>>>>>> mail-server:dentry_t                    14114    2270      84           3513654    16384    2300     267
>>>>>> mail-server:inode_t                     16374    10        156          6766642    16384    194635   1279
>>>>>> mail-trash:fd_t                         0        1024      108          0          0        0        0
>>>>>> mail-trash:dentry_t                     0        32768     84           0          0        0        0
>>>>>> mail-trash:inode_t                      4        32764     156          4          4        0        0
>>>>>> mail-trash:trash_local_t                0        64        8628         0          0        0        0
>>>>>> mail-changetimerecorder:gf_ctr_local_t  0        64        16540        0          0        0        0
>>>>>> mail-changelog:rpcsvc_request_t         0        8         2828         0          0        0        0
>>>>>> mail-changelog:changelog_local_t        0        64        116          0          0        0        0
>>>>>> mail-bitrot-stub:br_stub_local_t        0        512       84           71354      4        0        0
>>>>>> mail-locks:pl_local_t                   0        32        148          8135032    4        0        0
>>>>>> mail-upcall:upcall_local_t              0        512       108          0          0        0        0
>>>>>> mail-marker:marker_local_t              0        128       332          65005      3        0        0
>>>>>> mail-quota:quota_local_t                0        64        476          0          0        0        0
>>>>>> mail-server:rpcsvc_request_t            0        512       2828         12882393   30       0        0
>>>>>> glusterfs:struct saved_frame            0        8         124          2          2        0        0
>>>>>> glusterfs:struct rpc_req                0        8         588          2          2        0        0
>>>>>> glusterfs:rpcsvc_request_t              1        7         2828         2          1        0        0
>>>>>> glusterfs:log_buf_t                     5        251       140          3443       6        0        0
>>>>>> glusterfs:data_t                        242      16141     52           138743429  290      0        0
>>>>>> glusterfs:data_pair_t                   230      16153     68           126649864  270      0        0
>>>>>> glusterfs:dict_t                        23       4073      140          20356289   63       0        0
>>>>>> glusterfs:call_stub_t                   0        1024      3764         13678560   31       0        0
>>>>>> glusterfs:call_stack_t                  1        1023      1708         11011561   30       0        0
>>>>>> glusterfs:call_frame_t                  1        4095      172          125764190  193      0        0
>>>>>> ----------------------------------------------
>>>>>> ===
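As a quick sanity check on the brick side, the pool memory currently in use can be estimated from the stats above. This sketch assumes HotCount is the number of objects in use and PaddedSizeof is the per-object size in bytes, which is my reading of the column names; the rows are the busiest pools of brick 1.

```python
# (name, HotCount, PaddedSizeof) copied from the brick 1 mempool stats above.
pools = [
    ("mail-server:dentry_t", 16110, 84),
    ("mail-server:inode_t", 16363, 156),
    ("glusterfs:data_t", 242, 52),
    ("glusterfs:data_pair_t", 230, 68),
    ("glusterfs:dict_t", 23, 140),
]

# Assumed meaning: bytes in use per pool = objects in use * padded object size.
in_use = {name: hot * size for name, hot, size in pools}
total = sum(in_use.values())

for name, nbytes in in_use.items():
    print(f"{name:24s} {nbytes:>10d} bytes")
print(f"{'total':24s} {total:>10d} bytes ({total / 2**20:.2f} MiB)")
```

If that reading is right, the brick pools hold only a few MiB, so the bricks are not where the memory goes.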
>>>>>>
>>>>>> So, my questions are:
>>>>>>
>>>>>> 1) What should one do to limit GlusterFS FUSE client memory usage?
>>>>>> 2) What should one do to prevent high loadavg on the client caused
>>>>>> by high iowait from multiple concurrent volume users?
>>>>>>
>>>>>> Server/client OS is CentOS 7.1, GlusterFS server version is 3.7.3,
>>>>>> GlusterFS client version is 3.7.4.
>>>>>>
>>>>>> Any additional info needed?
>>>>