[Gluster-users] GlusterFS 9.5 fuse mount excessive memory usage

Zakhar Kirpichenko zakhar at gmail.com
Sat Feb 5 05:54:03 UTC 2022


Hi!

I opened a Github issue, https://github.com/gluster/glusterfs/issues/3206,
but I'm not sure how much attention issues get there, so I'm re-posting here
just in case someone has any ideas.

Description of problem:

GlusterFS 9.5, 3-node cluster (2 bricks + arbiter), an attempt to tar the
whole filesystem (35-40 GB, 1.6 million files) on a client succeeds but
causes the glusterfs fuse mount process to consume 0.5+ GB of RAM. The
usage never goes down after tar exits.

The exact command to reproduce the issue:

/usr/bin/tar --use-compress-program="/bin/pigz" -cf /path/to/archive.tar.gz
--warning=no-file-changed /glusterfsmount
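
For reference, the memory growth can be observed with something like the
following (the pgrep pattern is only an example and depends on how the
volume is mounted):

# find the glusterfs fuse client process serving the mount point
pgrep -fa 'glusterfs.*glusterfsmount'
# check its resident memory before and after the tar run
ps -o pid,rss,vsz,cmd -p <pid>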

The output of the gluster volume info command:

Volume Name: gvol1
Type: Replicate
Volume ID: 0292ac43-89bd-45a4-b91d-799b49613e60
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: 192.168.0.31:/gluster/brick1/gvol1
Brick2: 192.168.0.32:/gluster/brick1/gvol1
Brick3: 192.168.0.5:/gluster/brick1/gvol1 (arbiter)
Options Reconfigured:
performance.open-behind: off
cluster.readdir-optimize: off
cluster.consistent-metadata: on
features.cache-invalidation: on
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
storage.fips-mode-rchecksum: on
performance.cache-size: 256MB
client.event-threads: 8
server.event-threads: 4
storage.reserve: 1
performance.cache-invalidation: on
cluster.lookup-optimize: on
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: on
features.cache-invalidation-timeout: 600
performance.md-cache-timeout: 600
network.inode-lru-limit: 50000
cluster.shd-max-threads: 4
cluster.self-heal-window-size: 8
performance.enable-least-priority: off
performance.cache-max-file-size: 2MB

The output of the gluster volume status command:

Status of volume: gvol1
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 192.168.0.31:/gluster/brick1/gvol1    49152     0          Y       1767
Brick 192.168.0.32:/gluster/brick1/gvol1    49152     0          Y       1696
Brick 192.168.0.5:/gluster/brick1/gvol1     49152     0          Y       1318
Self-heal Daemon on localhost               N/A       N/A        Y       1329
Self-heal Daemon on 192.168.0.31            N/A       N/A        Y       1778
Self-heal Daemon on 192.168.0.32            N/A       N/A        Y       1707

Task Status of Volume gvol1
------------------------------------------------------------------------------
There are no active volume tasks

The output of the gluster volume heal command:

Brick 192.168.0.31:/gluster/brick1/gvol1
Status: Connected
Number of entries: 0

Brick 192.168.0.32:/gluster/brick1/gvol1
Status: Connected
Number of entries: 0

Brick 192.168.0.5:/gluster/brick1/gvol1
Status: Connected
Number of entries: 0

The operating system / glusterfs version:

CentOS Linux release 7.9.2009 (Core), fully up to date
glusterfs 9.5
kernel 3.10.0-1160.53.1.el7.x86_64

The logs are basically empty since the last mount except for the
mount-related messages.

Additional info: a statedump from the client is attached to the Github
issue,
https://github.com/gluster/glusterfs/files/8004792/glusterdump.18906.dump.1643991007.gz,
in case someone wants to have a look.
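
For anyone who wants to take a similar statedump from a fuse client:
sending SIGUSR1 to the glusterfs client process makes it write a dump into
the statedump directory (paths below are the defaults, adjust the PID to
your own mount process):

gluster --print-statedumpdir    # usually /var/run/gluster
kill -USR1 <pid of the glusterfs fuse mount process>
# the dump appears as glusterdump.<pid>.dump.<timestamp> in that directory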

There was also an issue with other clients, running PHP applications with
lots of small files, where the glusterfs fuse mount process would quickly
balloon to ~2 GB over the course of 24 hours and its performance would slow
to a crawl. This happened very consistently with glusterfs 8.x and 9.5. I
managed to resolve it, at least partially, by disabling
performance.open-behind: the memory usage now either stays flat or grows at
a much slower rate, which is acceptable for that use case.
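
In case it's useful to anyone hitting the same thing, disabling it is just
the usual volume set (volume name as in the info above):

gluster volume set gvol1 performance.open-behind off
gluster volume get gvol1 performance.open-behind    # verify the setting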

Now the issue remains on this single client, which doesn't do much other
than reading and archiving all files from the gluster volume once per day.
The glusterfs fuse mount process balloons to 0.5+ GB during the first tar
run and remains more or less consistent afterwards, including subsequent
tar runs.

I would very much appreciate any advice or suggestions.

Best regards,
Zakhar