[Gluster-devel] [Gluster-users] Memory leak in GlusterFS FUSE client
Oleksandr Natalenko
oleksandr at natalenko.name
Tue Jan 5 17:46:31 UTC 2016
Correct, I used a FUSE mount. Shouldn't gfapi be used by the FUSE mount helper
(/usr/bin/glusterfs)?
On Tuesday, 5 January 2016 22:52:25 EET Soumya Koduri wrote:
> On 01/05/2016 05:56 PM, Oleksandr Natalenko wrote:
> > Unfortunately, neither patch made any difference for me.
> >
> > I've patched 3.7.6 with both patches, recompiled, installed the patched
> > GlusterFS package on the client side, and mounted a volume with ~2M files.
> > Then I performed the usual tree traversal with a simple "find".
> >
> > The memory RES value went from ~130M at mount time to ~1.5G after
> > traversing the volume for ~40 mins. The Valgrind log still shows lots
> > of leaks. Here it is:
> >
> > https://gist.github.com/56906ca6e657c4ffa4a1
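The RES growth quoted above (~130M to ~1.5G over ~40 minutes) was read off by hand. A minimal sketch of how such growth could be logged on Linux while "find" runs, assuming the standard VmRSS field of /proc/<pid>/status. The helper names and the sample text are hypothetical and not part of GlusterFS:

```python
import re
import time
from pathlib import Path


def parse_vmrss_kib(status_text: str) -> int:
    """Return the VmRSS value (in KiB) from the text of /proc/<pid>/status."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB$", status_text, re.MULTILINE)
    if match is None:
        raise ValueError("no VmRSS line found")
    return int(match.group(1))


def sample_rss(pid: int, interval_s: float, samples: int) -> list[int]:
    """Read VmRSS of `pid` `samples` times, sleeping `interval_s` between reads."""
    readings = []
    for i in range(samples):
        status = Path(f"/proc/{pid}/status").read_text()
        readings.append(parse_vmrss_kib(status))
        if i + 1 < samples:
            time.sleep(interval_s)
    return readings


if __name__ == "__main__":
    # Hypothetical sample text; on a live system call sample_rss() with the
    # PID of the glusterfs client process instead.
    sample = "Name:\tglusterfs\nVmRSS:\t  133120 kB\n"
    print(parse_vmrss_kib(sample))
```

Calling sample_rss() with the PID of the mounted client and an interval of a minute or so would give a series that can be compared between patched and unpatched builds.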
>
> Looks like you used a FUSE mount. The patches which I pasted
> below apply to gfapi/nfs-ganesha applications.
>
> Also, to resolve the nfs-ganesha issue which I mentioned below (seen
> when the Entries_HWMARK option is changed), I have posted this fix:
> https://review.gerrithub.io/#/c/258687
>
> Thanks,
> Soumya
>
> > Ideas?
> >
> > 05.01.2016 12:31, Soumya Koduri wrote:
> >> I tried to debug the inode*-related leaks and saw some improvements
> >> after applying the patches below when running the same test (but with a
> >> smaller load). Could you please apply those patches and confirm the
> >> same?
> >>
> >> a) http://review.gluster.org/13125
> >>
> >> This will fix the inodes & their ctx related leaks during unexport and
> >> the program exit. Please check the valgrind output after applying the
> >> patch. It should not list any inodes related memory as lost.
> >>
> >> b) http://review.gluster.org/13096
> >>
> >> The reason the change in Entries_HWMARK (in your earlier mail) didn't
> >> have much effect is that the inode_nlookup count doesn't become zero
> >> for the handles/inodes being closed by ganesha. Hence those inodes
> >> get added to the inode lru list instead of the purge list, and the lru
> >> list gets forcefully purged only when the number of gfapi inode table
> >> entries reaches its limit (which is 137012).
> >>
> >> This patch fixes those 'nlookup' counts. Please apply this patch,
> >> reduce 'Entries_HWMARK' to a much lower value, and check if it decreases
> >> the memory consumed by the ganesha process while it is active.
> >>
> >> CACHEINODE {
> >>
> >> Entries_HWMark = 500;
> >>
> >> }
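The lru/purge behaviour described above can be illustrated with a toy model. This is a sketch under stated assumptions, not GlusterFS or gfapi code: the class, its method names and the run() helper are all hypothetical, and only the rule "nlookup == 0 goes to purge, otherwise to lru, and lru is trimmed at a limit" is taken from the explanation.

```python
from collections import OrderedDict


class ToyInodeTable:
    """Toy model of an inode table with an lru list and a purge list."""

    def __init__(self, lru_limit: int):
        self.lru_limit = lru_limit
        self.nlookup = {}          # inode id -> outstanding lookup count
        self.lru = OrderedDict()   # closed inodes still held in memory
        self.purged = []           # inodes whose memory was released

    def lookup(self, ino: int) -> None:
        self.nlookup[ino] = self.nlookup.get(ino, 0) + 1

    def forget(self, ino: int, count: int = 1) -> None:
        self.nlookup[ino] = max(0, self.nlookup.get(ino, 0) - count)

    def close(self, ino: int) -> None:
        """Drop the last reference: purge if nlookup is zero, else park on lru."""
        if self.nlookup.get(ino, 0) == 0:
            self.nlookup.pop(ino, None)
            self.purged.append(ino)
            return
        self.lru[ino] = True
        # The lru list is trimmed only once it exceeds its limit.
        while len(self.lru) > self.lru_limit:
            oldest, _ = self.lru.popitem(last=False)
            self.nlookup.pop(oldest, None)
            self.purged.append(oldest)


def run(forget_on_close: bool, files: int, lru_limit: int) -> int:
    """Return how many inodes stay resident after `files` lookups and closes."""
    table = ToyInodeTable(lru_limit)
    for ino in range(files):
        table.lookup(ino)
        if forget_on_close:
            table.forget(ino)
        table.close(ino)
    return len(table.lru)


if __name__ == "__main__":
    print(run(forget_on_close=False, files=1000, lru_limit=137012))
    print(run(forget_on_close=True, files=1000, lru_limit=137012))
```

In this model, lowering a cache limit on the caller's side changes nothing while nlookup stays non-zero, because every closed inode is parked on the lru list until the table's own limit is reached.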
> >>
> >>
> >> Note: I see an issue with nfs-ganesha during exit when the option
> >> 'Entries_HWMARK' gets changed. This is not related to any of the above
> >> patches (or rather Gluster) and I am currently debugging it.
> >>
> >> Thanks,
> >> Soumya
> >>
> >> On 12/25/2015 11:34 PM, Oleksandr Natalenko wrote:
> >>> 1. test with Cache_Size = 256 and Entries_HWMark = 4096
> >>>
> >>> Before find . -type f:
> >>>
> >>> root 3120 0.6 11.0 879120 208408 ? Ssl 17:39 0:00
> >>> /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT
> >>>
> >>> After:
> >>>
> >>> root 3120 11.4 24.3 1170076 458168 ? Ssl 17:39 13:39
> >>> /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT
> >>>
> >>> ~250M leak.
> >>>
> >>> 2. test with default values (after ganesha restart)
> >>>
> >>> Before:
> >>>
> >>> root 24937 1.3 10.4 875016 197808 ? Ssl 19:39 0:00
> >>> /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT
> >>>
> >>> After:
> >>>
> >>> root 24937 3.5 18.9 1022544 356340 ? Ssl 19:39 0:40
> >>> /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT
> >>>
> >>> ~159M leak.
> >>>
> >>> No reasonable correlation detected. The second test finished much faster
> >>> than the first (I guess the server-side GlusterFS cache or the server kernel
> >>> page cache is the cause).
> >>>
> >>> There are ~1.8M files on this test volume.
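The two leak figures above come from the RSS column of the "ps aux" output. A minimal sketch that recomputes them, assuming the usual "ps aux" column order (USER PID %CPU %MEM VSZ RSS ...) with RSS in KiB. The function names and test labels are illustrative only:

```python
def rss_kib(ps_line: str) -> int:
    """Return the RSS column (6th field of `ps aux`, in KiB) from one line."""
    return int(ps_line.split()[5])


def growth_kib(before: str, after: str) -> int:
    """Return RSS growth in KiB between two `ps aux` lines of the same process."""
    return rss_kib(after) - rss_kib(before)


# ps aux lines from the two tests, truncated after the TIME column.
TESTS = {
    "Cache_Size=256, Entries_HWMark=4096": (
        "root 3120 0.6 11.0 879120 208408 ? Ssl 17:39 0:00",
        "root 3120 11.4 24.3 1170076 458168 ? Ssl 17:39 13:39",
    ),
    "default values": (
        "root 24937 1.3 10.4 875016 197808 ? Ssl 19:39 0:00",
        "root 24937 3.5 18.9 1022544 356340 ? Ssl 19:39 0:40",
    ),
}

if __name__ == "__main__":
    for name, (before, after) in TESTS.items():
        delta = growth_kib(before, after)
        print(f"{name}: RSS grew by {delta} KiB ({delta / 1024:.1f} MiB)")
```

The deltas are 249760 KiB and 158532 KiB, which match the ~250M and ~159M figures when read in thousands of KiB.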
> >>>
> >>> On Friday, 25 December 2015 20:28:13 EET Soumya Koduri wrote:
> >>>> On 12/24/2015 09:17 PM, Oleksandr Natalenko wrote:
> >>>>> Another addition: it seems to be a GlusterFS API library memory leak,
> >>>>> because NFS-Ganesha also consumes a huge amount of memory while doing an
> >>>>> ordinary "find . -type f" via NFSv4.2 on a remote client. Here is the memory
> >>>>> usage:
> >>>>>
> >>>>> ===
> >>>>> root 5416 34.2 78.5 2047176 1480552 ? Ssl 12:02 117:54
> >>>>> /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f
> >>>>> /etc/ganesha/ganesha.conf -N NIV_EVENT
> >>>>> ===
> >>>>>
> >>>>> 1.4G is too much for simple stat() :(.
> >>>>>
> >>>>> Ideas?
> >>>>
> >>>> nfs-ganesha also has a cache layer which can scale to millions of entries
> >>>> depending on the number of files/directories being looked up. However,
> >>>> there are parameters to tune it. So either try stat with fewer entries or
> >>>> add the block below to the nfs-ganesha.conf file, set low limits, and check the
> >>>> difference. That may help us narrow down how much memory is actually
> >>>> consumed by core nfs-ganesha and gfAPI.
> >>>>
> >>>> CACHEINODE {
> >>>>     Cache_Size(uint32, range 1 to UINT32_MAX, default 32633);       # cache size
> >>>>     Entries_HWMark(uint32, range 1 to UINT32_MAX, default 100000);  # max no. of entries in the cache
> >>>> }
> >>>>
> >>>> Thanks,
> >>>> Soumya
> >>>>
> >>>>> 24.12.2015 16:32, Oleksandr Natalenko wrote:
> >>>>>> This is still an issue in 3.7.6. Any suggestions?
> >>>>>>
> >>>>>> 24.09.2015 10:14, Oleksandr Natalenko wrote:
> >>>>>>> In our GlusterFS deployment we've encountered what looks like a memory
> >>>>>>> leak in the GlusterFS FUSE client.
> >>>>>>>
> >>>>>>> We use a replicated (×2) GlusterFS volume to store mail (exim+dovecot,
> >>>>>>> maildir format). Here are the inode stats for both bricks and the mountpoint:
> >>>>>>>
> >>>>>>> ===
> >>>>>>> Brick 1 (Server 1):
> >>>>>>>
> >>>>>>> Filesystem                           Inodes    IUsed     IFree IUse% Mounted on
> >>>>>>> /dev/mapper/vg_vd1_misc-lv08_mail 578768144 10954918 567813226    2% /bricks/r6sdLV08_vd1_mail
> >>>>>>>
> >>>>>>> Brick 2 (Server 2):
> >>>>>>>
> >>>>>>> Filesystem                           Inodes    IUsed     IFree IUse% Mounted on
> >>>>>>> /dev/mapper/vg_vd0_misc-lv07_mail 578767984 10954913 567813071    2% /bricks/r6sdLV07_vd0_mail
> >>>>>>>
> >>>>>>> Mountpoint (Server 3):
> >>>>>>>
> >>>>>>> Filesystem                           Inodes    IUsed     IFree IUse% Mounted on
> >>>>>>> glusterfs.xxx:mail                578767760 10954915 567812845    2% /var/spool/mail/virtual
> >>>>>>> ===
> >>>>>>>
> >>>>>>> The glusterfs.xxx domain has two A records, one each for Server 1 and
> >>>>>>> Server 2.
> >>>>>>>
> >>>>>>> Here is volume info:
> >>>>>>>
> >>>>>>> ===
> >>>>>>> Volume Name: mail
> >>>>>>> Type: Replicate
> >>>>>>> Volume ID: f564e85c-7aa6-4170-9417-1f501aa98cd2
> >>>>>>> Status: Started
> >>>>>>> Number of Bricks: 1 x 2 = 2
> >>>>>>> Transport-type: tcp
> >>>>>>> Bricks:
> >>>>>>> Brick1: server1.xxx:/bricks/r6sdLV08_vd1_mail/mail
> >>>>>>> Brick2: server2.xxx:/bricks/r6sdLV07_vd0_mail/mail
> >>>>>>> Options Reconfigured:
> >>>>>>> nfs.rpc-auth-allow: 1.2.4.0/24,4.5.6.0/24
> >>>>>>> features.cache-invalidation-timeout: 10
> >>>>>>> performance.stat-prefetch: off
> >>>>>>> performance.quick-read: on
> >>>>>>> performance.read-ahead: off
> >>>>>>> performance.flush-behind: on
> >>>>>>> performance.write-behind: on
> >>>>>>> performance.io-thread-count: 4
> >>>>>>> performance.cache-max-file-size: 1048576
> >>>>>>> performance.cache-size: 67108864
> >>>>>>> performance.readdir-ahead: off
> >>>>>>> ===
> >>>>>>>
> >>>>>>> Soon after mounting and starting exim/dovecot, the glusterfs client
> >>>>>>> process begins to consume a huge amount of RAM:
> >>>>>>>
> >>>>>>> ===
> >>>>>>> user@server3 ~$ ps aux | grep glusterfs | grep mail
> >>>>>>> root 28895 14.4 15.0 15510324 14908868 ? Ssl Sep03 4310:05
> >>>>>>> /usr/sbin/glusterfs --fopen-keep-cache --direct-io-mode=disable
> >>>>>>> --volfile-server=glusterfs.xxx --volfile-id=mail
> >>>>>>> /var/spool/mail/virtual
> >>>>>>> ===
> >>>>>>>
> >>>>>>> That is, ~15 GiB of RAM.
> >>>>>>>
> >>>>>>> We've also tried to use the mountpoint within a separate KVM VM with 2 or 3
> >>>>>>> GiB of RAM, and soon after starting the mail daemons the OOM killer hit the
> >>>>>>> glusterfs client process.
> >>>>>>>
> >>>>>>> Mounting same share via NFS works just fine. Also, we have much less
> >>>>>>> iowait and loadavg on client side with NFS.
> >>>>>>>
> >>>>>>> Also, we've tried to change IO threads count and cache size in order
> >>>>>>> to limit memory usage with no luck. As you can see, total cache size
> >>>>>>> is 4×64==256 MiB (compare to 15 GiB).
> >>>>>>>
> >>>>>>> Enabling or disabling stat-prefetch, read-ahead and readdir-ahead didn't
> >>>>>>> help either.
> >>>>>>>
> >>>>>>> Here are volume memory stats:
> >>>>>>>
> >>>>>>> ===
> >>>>>>> Memory status for volume : mail
> >>>>>>> ----------------------------------------------
> >>>>>>> Brick : server1.xxx:/bricks/r6sdLV08_vd1_mail/mail
> >>>>>>> Mallinfo
> >>>>>>> --------
> >>>>>>> Arena : 36859904
> >>>>>>> Ordblks : 10357
> >>>>>>> Smblks : 519
> >>>>>>> Hblks : 21
> >>>>>>> Hblkhd : 30515200
> >>>>>>> Usmblks : 0
> >>>>>>> Fsmblks : 53440
> >>>>>>> Uordblks : 18604144
> >>>>>>> Fordblks : 18255760
> >>>>>>> Keepcost : 114112
> >>>>>>>
> >>>>>>> Mempool Stats
> >>>>>>> -------------
> >>>>>>> Name                                   HotCount ColdCount PaddedSizeof AllocCount MaxAlloc   Misses Max-StdAlloc
> >>>>>>> ----                                   -------- --------- ------------ ---------- --------   ------ ------------
> >>>>>>> mail-server:fd_t                              0      1024          108   30773120      137        0            0
> >>>>>>> mail-server:dentry_t                      16110       274           84  235676148    16384  1106499         1152
> >>>>>>> mail-server:inode_t                       16363        21          156  237216876    16384  1876651         1169
> >>>>>>> mail-trash:fd_t                               0      1024          108          0        0        0            0
> >>>>>>> mail-trash:dentry_t                           0     32768           84          0        0        0            0
> >>>>>>> mail-trash:inode_t                            4     32764          156          4        4        0            0
> >>>>>>> mail-trash:trash_local_t                      0        64         8628          0        0        0            0
> >>>>>>> mail-changetimerecorder:gf_ctr_local_t        0        64        16540          0        0        0            0
> >>>>>>> mail-changelog:rpcsvc_request_t               0         8         2828          0        0        0            0
> >>>>>>> mail-changelog:changelog_local_t              0        64          116          0        0        0            0
> >>>>>>> mail-bitrot-stub:br_stub_local_t              0       512           84      79204        4        0            0
> >>>>>>> mail-locks:pl_local_t                         0        32          148    6812757        4        0            0
> >>>>>>> mail-upcall:upcall_local_t                    0       512          108          0        0        0            0
> >>>>>>> mail-marker:marker_local_t                    0       128          332      64980        3        0            0
> >>>>>>> mail-quota:quota_local_t                      0        64          476          0        0        0            0
> >>>>>>> mail-server:rpcsvc_request_t                  0       512         2828   45462533       34        0            0
> >>>>>>> glusterfs:struct saved_frame                  0         8          124          2        2        0            0
> >>>>>>> glusterfs:struct rpc_req                      0         8          588          2        2        0            0
> >>>>>>> glusterfs:rpcsvc_request_t                    1         7         2828          2        1        0            0
> >>>>>>> glusterfs:log_buf_t                           5       251          140       3452        6        0            0
> >>>>>>> glusterfs:data_t                            242     16141           52  480115498      664        0            0
> >>>>>>> glusterfs:data_pair_t                       230     16153           68  179483528      275        0            0
> >>>>>>> glusterfs:dict_t                             23      4073          140  303751675      627        0            0
> >>>>>>> glusterfs:call_stub_t                         0      1024         3764   45290655       34        0            0
> >>>>>>> glusterfs:call_stack_t                        1      1023         1708   43598469       34        0            0
> >>>>>>> glusterfs:call_frame_t                        1      4095          172  336219655      184        0            0
> >>>>>>> ----------------------------------------------
> >>>>>>> Brick : server2.xxx:/bricks/r6sdLV07_vd0_mail/mail
> >>>>>>> Mallinfo
> >>>>>>> --------
> >>>>>>> Arena : 38174720
> >>>>>>> Ordblks : 9041
> >>>>>>> Smblks : 507
> >>>>>>> Hblks : 21
> >>>>>>> Hblkhd : 30515200
> >>>>>>> Usmblks : 0
> >>>>>>> Fsmblks : 51712
> >>>>>>> Uordblks : 19415008
> >>>>>>> Fordblks : 18759712
> >>>>>>> Keepcost : 114848
> >>>>>>>
> >>>>>>> Mempool Stats
> >>>>>>> -------------
> >>>>>>> Name                                   HotCount ColdCount PaddedSizeof AllocCount MaxAlloc   Misses Max-StdAlloc
> >>>>>>> ----                                   -------- --------- ------------ ---------- --------   ------ ------------
> >>>>>>> mail-server:fd_t                              0      1024          108    2373075      133        0            0
> >>>>>>> mail-server:dentry_t                      14114      2270           84    3513654    16384     2300          267
> >>>>>>> mail-server:inode_t                       16374        10          156    6766642    16384   194635         1279
> >>>>>>> mail-trash:fd_t                               0      1024          108          0        0        0            0
> >>>>>>> mail-trash:dentry_t                           0     32768           84          0        0        0            0
> >>>>>>> mail-trash:inode_t                            4     32764          156          4        4        0            0
> >>>>>>> mail-trash:trash_local_t                      0        64         8628          0        0        0            0
> >>>>>>> mail-changetimerecorder:gf_ctr_local_t        0        64        16540          0        0        0            0
> >>>>>>> mail-changelog:rpcsvc_request_t               0         8         2828          0        0        0            0
> >>>>>>> mail-changelog:changelog_local_t              0        64          116          0        0        0            0
> >>>>>>> mail-bitrot-stub:br_stub_local_t              0       512           84      71354        4        0            0
> >>>>>>> mail-locks:pl_local_t                         0        32          148    8135032        4        0            0
> >>>>>>> mail-upcall:upcall_local_t                    0       512          108          0        0        0            0
> >>>>>>> mail-marker:marker_local_t                    0       128          332      65005        3        0            0
> >>>>>>> mail-quota:quota_local_t                      0        64          476          0        0        0            0
> >>>>>>> mail-server:rpcsvc_request_t                  0       512         2828   12882393       30        0            0
> >>>>>>> glusterfs:struct saved_frame                  0         8          124          2        2        0            0
> >>>>>>> glusterfs:struct rpc_req                      0         8          588          2        2        0            0
> >>>>>>> glusterfs:rpcsvc_request_t                    1         7         2828          2        1        0            0
> >>>>>>> glusterfs:log_buf_t                           5       251          140       3443        6        0            0
> >>>>>>> glusterfs:data_t                            242     16141           52  138743429      290        0            0
> >>>>>>> glusterfs:data_pair_t                       230     16153           68  126649864      270        0            0
> >>>>>>> glusterfs:dict_t                             23      4073          140   20356289       63        0            0
> >>>>>>> glusterfs:call_stub_t                         0      1024         3764   13678560       31        0            0
> >>>>>>> glusterfs:call_stack_t                        1      1023         1708   11011561       30        0            0
> >>>>>>> glusterfs:call_frame_t                        1      4095          172  125764190      193        0            0
> >>>>>>> ----------------------------------------------
> >>>>>>> ===
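To get a rough idea of how much memory the brick-side pools above hold, one can multiply HotCount by PaddedSizeof for each row. This is a sketch under an assumption: that HotCount is the number of objects currently in use and PaddedSizeof is the per-object size in bytes. It ignores allocations that fell back to the standard allocator (the Misses and Max-StdAlloc columns). The function name is hypothetical:

```python
def hot_bytes(row: str) -> tuple[str, int]:
    """Return (pool name, HotCount * PaddedSizeof) for one mempool row.

    Pool names may contain spaces (e.g. "glusterfs:struct rpc_req"),
    so the seven numeric columns are taken from the right.
    """
    fields = row.split()
    name = " ".join(fields[:-7])
    hot_count, _cold_count, padded_sizeof = (int(x) for x in fields[-7:-4])
    return name, hot_count * padded_sizeof


# A few rows from the Brick 1 statistics, joined onto single lines.
ROWS = [
    "mail-server:dentry_t 16110 274 84 235676148 16384 1106499 1152",
    "mail-server:inode_t 16363 21 156 237216876 16384 1876651 1169",
    "glusterfs:struct rpc_req 0 8 588 2 2 0 0",
]

if __name__ == "__main__":
    for row in ROWS:
        name, size = hot_bytes(row)
        print(f"{name}: {size} bytes in use")
```

Under that assumption the largest brick-side pools hold a few MiB each, which is tiny next to the multi-GiB client process. These statistics describe the brick processes, not the FUSE client.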
> >>>>>>>
> >>>>>>> So, my questions are:
> >>>>>>>
> >>>>>>> 1) What should one do to limit GlusterFS FUSE client memory usage?
> >>>>>>> 2) What should one do to prevent high client loadavg caused by high
> >>>>>>> iowait from multiple concurrent volume users?
> >>>>>>>
> >>>>>>> Server/client OS is CentOS 7.1, GlusterFS server version is 3.7.3,
> >>>>>>> GlusterFS client version is 3.7.4.
> >>>>>>>
> >>>>>>> Any additional info needed?
> >>>>>
> >>>>> _______________________________________________
> >>>>> Gluster-users mailing list
> >>>>> Gluster-users at gluster.org
> >>>>> http://www.gluster.org/mailman/listinfo/gluster-users