[Gluster-users] Memory leak in GlusterFS FUSE client
Oleksandr Natalenko
oleksandr at natalenko.name
Fri Dec 25 23:15:43 UTC 2015
Also, here is the valgrind output for our custom tool, which traverses a
GlusterFS volume (with simple stats) just like the find tool. NFS-Ganesha is
not used in this case.
https://gist.github.com/e4602a50d3c98f7a2766
One may see GlusterFS-related leaks here as well.
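
For anyone who wants to reproduce a similar workload, below is a minimal Python sketch of a find-like traversal that calls lstat() on every entry and keeps simple counters. It is only a stand-in and not the tool used for the valgrind run above: it walks an already mounted path with the standard library, and the function name traverse_and_stat is hypothetical.

```python
import os
import stat
from collections import Counter


def traverse_and_stat(root):
    """Walk root like `find`, lstat() every entry, and return simple counters."""
    counts = Counter()
    stack = [root]
    while stack:
        directory = stack.pop()
        try:
            names = os.listdir(directory)
        except OSError:
            # Unreadable or vanished directory: count it and move on.
            counts["errors"] += 1
            continue
        for name in names:
            path = os.path.join(directory, name)
            try:
                st = os.lstat(path)  # one stat call per entry, symlinks not followed
            except OSError:
                counts["errors"] += 1
                continue
            if stat.S_ISDIR(st.st_mode):
                counts["dirs"] += 1
                stack.append(path)
            elif stat.S_ISREG(st.st_mode):
                counts["files"] += 1
                counts["bytes"] += st.st_size
            else:
                counts["other"] += 1
    return counts
```

Running traverse_and_stat() against a mountpoint while watching the RSS of the client process should show whether memory grows with the number of entries visited.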
On Friday, 25 December 2015, 20:28:13 EET, Soumya Koduri wrote:
> On 12/24/2015 09:17 PM, Oleksandr Natalenko wrote:
> > Another addition: this looks like a memory leak in the GlusterFS API
> > library, because NFS-Ganesha also consumes a huge amount of memory while
> > doing an ordinary "find . -type f" via NFSv4.2 on a remote client. Here is
> > the memory usage:
> >
> > ===
> > root 5416 34.2 78.5 2047176 1480552 ? Ssl 12:02 117:54
> > /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f
> > /etc/ganesha/ganesha.conf -N NIV_EVENT
> > ===
> >
> > 1.4G is too much for simple stat() :(.
> >
> > Ideas?
>
> nfs-ganesha also has a cache layer, which can scale to millions of entries
> depending on the number of files/directories being looked up. However,
> there are parameters to tune it. So either try stat with fewer entries or
> add the block below to the nfs-ganesha.conf file, set low limits, and check
> the difference. That may help us narrow down how much memory is actually
> consumed by core nfs-ganesha and by gfAPI.
>
> CACHEINODE {
> Cache_Size(uint32, range 1 to UINT32_MAX, default 32633); # cache size
> Entries_HWMark(uint32, range 1 to UINT32_MAX, default 100000); # Max no. of entries in the cache.
> }
>
> Thanks,
> Soumya
>
> > 24.12.2015 16:32, Oleksandr Natalenko wrote:
> >> This is still an issue in 3.7.6. Any suggestions?
> >>
> >> 24.09.2015 10:14, Oleksandr Natalenko wrote:
> >>> In our GlusterFS deployment we've encountered what looks like a memory
> >>> leak in the GlusterFS FUSE client.
> >>>
> >>> We use a replicated (×2) GlusterFS volume to store mail (exim+dovecot,
> >>> maildir format). Here are the inode stats for both bricks and the
> >>> mountpoint:
> >>>
> >>> ===
> >>> Brick 1 (Server 1):
> >>>
> >>> Filesystem                           Inodes    IUsed     IFree IUse% Mounted on
> >>> /dev/mapper/vg_vd1_misc-lv08_mail 578768144 10954918 567813226    2% /bricks/r6sdLV08_vd1_mail
> >>>
> >>> Brick 2 (Server 2):
> >>>
> >>> Filesystem                           Inodes    IUsed     IFree IUse% Mounted on
> >>> /dev/mapper/vg_vd0_misc-lv07_mail 578767984 10954913 567813071    2% /bricks/r6sdLV07_vd0_mail
> >>>
> >>> Mountpoint (Server 3):
> >>>
> >>> Filesystem                           Inodes    IUsed     IFree IUse% Mounted on
> >>> glusterfs.xxx:mail                578767760 10954915 567812845    2% /var/spool/mail/virtual
> >>> ===
> >>>
> >>> The glusterfs.xxx domain has two A records, one for Server 1 and one for Server 2.
> >>>
> >>> Here is volume info:
> >>>
> >>> ===
> >>> Volume Name: mail
> >>> Type: Replicate
> >>> Volume ID: f564e85c-7aa6-4170-9417-1f501aa98cd2
> >>> Status: Started
> >>> Number of Bricks: 1 x 2 = 2
> >>> Transport-type: tcp
> >>> Bricks:
> >>> Brick1: server1.xxx:/bricks/r6sdLV08_vd1_mail/mail
> >>> Brick2: server2.xxx:/bricks/r6sdLV07_vd0_mail/mail
> >>> Options Reconfigured:
> >>> nfs.rpc-auth-allow: 1.2.4.0/24,4.5.6.0/24
> >>> features.cache-invalidation-timeout: 10
> >>> performance.stat-prefetch: off
> >>> performance.quick-read: on
> >>> performance.read-ahead: off
> >>> performance.flush-behind: on
> >>> performance.write-behind: on
> >>> performance.io-thread-count: 4
> >>> performance.cache-max-file-size: 1048576
> >>> performance.cache-size: 67108864
> >>> performance.readdir-ahead: off
> >>> ===
> >>>
> >>> Soon after mounting and starting exim/dovecot, the glusterfs client
> >>> process begins to consume a huge amount of RAM:
> >>>
> >>> ===
> >>> user@server3 ~$ ps aux | grep glusterfs | grep mail
> >>> root 28895 14.4 15.0 15510324 14908868 ? Ssl Sep03 4310:05
> >>> /usr/sbin/glusterfs --fopen-keep-cache --direct-io-mode=disable
> >>> --volfile-server=glusterfs.xxx --volfile-id=mail
> >>> /var/spool/mail/virtual
> >>> ===
> >>>
> >>> That is, ~15 GiB of RAM.
> >>>
> >>> We've also tried using the mountpoint within a separate KVM VM with 2 or 3
> >>> GiB of RAM, and soon after starting the mail daemons the OOM killer
> >>> killed the glusterfs client process.
> >>>
> >>> Mounting the same share via NFS works just fine. Also, we see much less
> >>> iowait and loadavg on the client side with NFS.
> >>>
> >>> We've also tried changing the IO thread count and cache size in order
> >>> to limit memory usage, with no luck. As you can see, the total cache size
> >>> is 4×64 = 256 MiB (compared to 15 GiB).
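
The arithmetic behind that comparison can be checked with a short Python sketch. The numbers are copied from the volume options and the ps output above; multiplying the cache size by the IO thread count simply follows the message's own estimate and is an assumption, not documented GlusterFS behaviour.

```python
KIB = 1024
MIB = 1024 ** 2
GIB = 1024 ** 3

# Values copied from the volume options and the ps output above.
cache_size_bytes = 67108864   # performance.cache-size
io_thread_count = 4           # performance.io-thread-count
rss_kib = 14908868            # RSS column of `ps aux`, reported in KiB
vsz_kib = 15510324            # VSZ column of `ps aux`, reported in KiB

# Assumption from the message: total cache budget = threads * cache size.
expected_cache_bytes = io_thread_count * cache_size_bytes
expected_cache_mib = expected_cache_bytes / MIB

rss_gib = rss_kib * KIB / GIB
vsz_gib = vsz_kib * KIB / GIB
ratio = rss_kib * KIB / expected_cache_bytes

print(f"expected cache: {expected_cache_mib:.0f} MiB")
print(f"resident set:   {rss_gib:.2f} GiB (virtual size {vsz_gib:.2f} GiB)")
print(f"RSS is about {ratio:.0f} times the expected cache")
```

Under these assumptions the resident set is more than fifty times larger than the configured cache, so the caches alone cannot account for it.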
> >>>
> >>> Enabling or disabling stat-prefetch, read-ahead and readdir-ahead didn't
> >>> help either.
> >>>
> >>> Here are volume memory stats:
> >>>
> >>> ===
> >>> Memory status for volume : mail
> >>> ----------------------------------------------
> >>> Brick : server1.xxx:/bricks/r6sdLV08_vd1_mail/mail
> >>> Mallinfo
> >>> --------
> >>> Arena : 36859904
> >>> Ordblks : 10357
> >>> Smblks : 519
> >>> Hblks : 21
> >>> Hblkhd : 30515200
> >>> Usmblks : 0
> >>> Fsmblks : 53440
> >>> Uordblks : 18604144
> >>> Fordblks : 18255760
> >>> Keepcost : 114112
> >>>
> >>> Mempool Stats
> >>> -------------
> >>> Name                                   HotCount ColdCount PaddedSizeof AllocCount MaxAlloc   Misses Max-StdAlloc
> >>> ----                                   -------- --------- ------------ ---------- -------- -------- ------------
> >>> mail-server:fd_t                              0      1024          108   30773120      137        0            0
> >>> mail-server:dentry_t                      16110       274           84  235676148    16384  1106499         1152
> >>> mail-server:inode_t                       16363        21          156  237216876    16384  1876651         1169
> >>> mail-trash:fd_t                               0      1024          108          0        0        0            0
> >>> mail-trash:dentry_t                           0     32768           84          0        0        0            0
> >>> mail-trash:inode_t                            4     32764          156          4        4        0            0
> >>> mail-trash:trash_local_t                      0        64         8628          0        0        0            0
> >>> mail-changetimerecorder:gf_ctr_local_t        0        64        16540          0        0        0            0
> >>> mail-changelog:rpcsvc_request_t               0         8         2828          0        0        0            0
> >>> mail-changelog:changelog_local_t              0        64          116          0        0        0            0
> >>> mail-bitrot-stub:br_stub_local_t              0       512           84      79204        4        0            0
> >>> mail-locks:pl_local_t                         0        32          148    6812757        4        0            0
> >>> mail-upcall:upcall_local_t                    0       512          108          0        0        0            0
> >>> mail-marker:marker_local_t                    0       128          332      64980        3        0            0
> >>> mail-quota:quota_local_t                      0        64          476          0        0        0            0
> >>> mail-server:rpcsvc_request_t                  0       512         2828   45462533       34        0            0
> >>> glusterfs:struct saved_frame                  0         8          124          2        2        0            0
> >>> glusterfs:struct rpc_req                      0         8          588          2        2        0            0
> >>> glusterfs:rpcsvc_request_t                    1         7         2828          2        1        0            0
> >>> glusterfs:log_buf_t                           5       251          140       3452        6        0            0
> >>> glusterfs:data_t                            242     16141           52  480115498      664        0            0
> >>> glusterfs:data_pair_t                       230     16153           68  179483528      275        0            0
> >>> glusterfs:dict_t                             23      4073          140  303751675      627        0            0
> >>> glusterfs:call_stub_t                         0      1024         3764   45290655       34        0            0
> >>> glusterfs:call_stack_t                        1      1023         1708   43598469       34        0            0
> >>> glusterfs:call_frame_t                        1      4095          172  336219655      184        0            0
> >>> ----------------------------------------------
> >>> Brick : server2.xxx:/bricks/r6sdLV07_vd0_mail/mail
> >>> Mallinfo
> >>> --------
> >>> Arena : 38174720
> >>> Ordblks : 9041
> >>> Smblks : 507
> >>> Hblks : 21
> >>> Hblkhd : 30515200
> >>> Usmblks : 0
> >>> Fsmblks : 51712
> >>> Uordblks : 19415008
> >>> Fordblks : 18759712
> >>> Keepcost : 114848
> >>>
> >>> Mempool Stats
> >>> -------------
> >>> Name                                   HotCount ColdCount PaddedSizeof AllocCount MaxAlloc   Misses Max-StdAlloc
> >>> ----                                   -------- --------- ------------ ---------- -------- -------- ------------
> >>> mail-server:fd_t                              0      1024          108    2373075      133        0            0
> >>> mail-server:dentry_t                      14114      2270           84    3513654    16384     2300          267
> >>> mail-server:inode_t                       16374        10          156    6766642    16384   194635         1279
> >>> mail-trash:fd_t                               0      1024          108          0        0        0            0
> >>> mail-trash:dentry_t                           0     32768           84          0        0        0            0
> >>> mail-trash:inode_t                            4     32764          156          4        4        0            0
> >>> mail-trash:trash_local_t                      0        64         8628          0        0        0            0
> >>> mail-changetimerecorder:gf_ctr_local_t        0        64        16540          0        0        0            0
> >>> mail-changelog:rpcsvc_request_t               0         8         2828          0        0        0            0
> >>> mail-changelog:changelog_local_t              0        64          116          0        0        0            0
> >>> mail-bitrot-stub:br_stub_local_t              0       512           84      71354        4        0            0
> >>> mail-locks:pl_local_t                         0        32          148    8135032        4        0            0
> >>> mail-upcall:upcall_local_t                    0       512          108          0        0        0            0
> >>> mail-marker:marker_local_t                    0       128          332      65005        3        0            0
> >>> mail-quota:quota_local_t                      0        64          476          0        0        0            0
> >>> mail-server:rpcsvc_request_t                  0       512         2828   12882393       30        0            0
> >>> glusterfs:struct saved_frame                  0         8          124          2        2        0            0
> >>> glusterfs:struct rpc_req                      0         8          588          2        2        0            0
> >>> glusterfs:rpcsvc_request_t                    1         7         2828          2        1        0            0
> >>> glusterfs:log_buf_t                           5       251          140       3443        6        0            0
> >>> glusterfs:data_t                            242     16141           52  138743429      290        0            0
> >>> glusterfs:data_pair_t                       230     16153           68  126649864      270        0            0
> >>> glusterfs:dict_t                             23      4073          140   20356289       63        0            0
> >>> glusterfs:call_stub_t                         0      1024         3764   13678560       31        0            0
> >>> glusterfs:call_stack_t                        1      1023         1708   11011561       30        0            0
> >>> glusterfs:call_frame_t                        1      4095          172  125764190      193        0            0
> >>> ----------------------------------------------
> >>> ===
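
To put the brick-side numbers in perspective, here is a rough Python sketch that adds up the Mallinfo figures and the two busiest memory pools for each brick. It assumes that Arena plus Hblkhd approximates the heap obtained from the OS (as in glibc's mallinfo) and that HotCount × PaddedSizeof approximates the bytes held by in-use pool objects; both readings are assumptions, and the helper names are hypothetical.

```python
MIB = 1024 ** 2

# Mallinfo values copied from the brick statistics above (bytes).
mallinfo = {
    "server1": {"arena": 36859904, "hblkhd": 30515200},
    "server2": {"arena": 38174720, "hblkhd": 30515200},
}

# (HotCount, PaddedSizeof) for the two busiest pools on each brick.
pools = {
    "server1": {"mail-server:dentry_t": (16110, 84), "mail-server:inode_t": (16363, 156)},
    "server2": {"mail-server:dentry_t": (14114, 84), "mail-server:inode_t": (16374, 156)},
}


def heap_bytes(info):
    """Assumed heap size: brk-managed arena plus mmap()ed chunks."""
    return info["arena"] + info["hblkhd"]


def hot_pool_bytes(brick_pools):
    """Assumed in-use pool memory: HotCount * PaddedSizeof summed over pools."""
    return sum(hot * size for hot, size in brick_pools.values())


for brick in mallinfo:
    heap = heap_bytes(mallinfo[brick])
    hot = hot_pool_bytes(pools[brick])
    print(f"{brick}: heap {heap / MIB:.1f} MiB, hot dentry/inode pools {hot / MIB:.1f} MiB")
```

Under these assumptions each brick process holds on the order of tens of MiB, far below the client's resident size. Note that these statistics describe the brick processes only, not the FUSE client.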
> >>>
> >>> So, my questions are:
> >>>
> >>> 1) What should one do to limit GlusterFS FUSE client memory usage?
> >>> 2) What should one do to prevent high loadavg on the client caused by
> >>> high iowait from multiple concurrent volume users?
> >>>
> >>> The server/client OS is CentOS 7.1, the GlusterFS server version is 3.7.3,
> >>> and the GlusterFS client version is 3.7.4.
> >>>
> >>> Any additional info needed?
> >