<div dir="ltr"><div>Hi! <br></div><div><br></div><div>I opened a GitHub issue, <a href="https://github.com/gluster/glusterfs/issues/3206">https://github.com/gluster/glusterfs/issues/3206</a>, but I'm not sure how much attention issues get there, so I'm re-posting here in case someone has any ideas. <br></div><div><br></div><div>Description of problem:<br><br>GlusterFS 9.5, 3-node cluster (2 bricks + arbiter). An attempt to tar the whole filesystem (35-40 GB, 1.6 million files) on a client succeeds but causes the glusterfs FUSE mount process to consume 0.5+ GB of RAM. The usage never goes down after tar exits.<br><br>The exact command to reproduce the issue:<br><br>/usr/bin/tar --use-compress-program="/bin/pigz" -cf /path/to/archive.tar.gz --warning=no-file-changed /glusterfsmount<br></div><div><br></div><div>The output of the gluster volume info command:</div><div><br>Volume Name: gvol1<br>Type: Replicate<br>Volume ID: 0292ac43-89bd-45a4-b91d-799b49613e60<br>Status: Started<br>Snapshot Count: 0<br>Number of Bricks: 1 x (2 + 1) = 3<br>Transport-type: tcp<br>Bricks:<br>Brick1: 192.168.0.31:/gluster/brick1/gvol1<br>Brick2: 192.168.0.32:/gluster/brick1/gvol1<br>Brick3: 192.168.0.5:/gluster/brick1/gvol1 (arbiter)<br>Options Reconfigured:<br>performance.open-behind: off<br>cluster.readdir-optimize: off<br>cluster.consistent-metadata: on<br>features.cache-invalidation: on<br>diagnostics.count-fop-hits: on<br>diagnostics.latency-measurement: on<br>storage.fips-mode-rchecksum: on<br>performance.cache-size: 256MB<br>client.event-threads: 8<br>server.event-threads: 4<br>storage.reserve: 1<br>performance.cache-invalidation: on<br>cluster.lookup-optimize: on<br>transport.address-family: inet<br>nfs.disable: on<br>performance.client-io-threads: on<br>features.cache-invalidation-timeout: 600<br>performance.md-cache-timeout: 600<br>network.inode-lru-limit: 50000<br>cluster.shd-max-threads: 4<br>cluster.self-heal-window-size: 8<br>performance.enable-least-priority: 
off<br>performance.cache-max-file-size: 2MB<br><br>The output of the gluster volume status command:<br><br>Status of volume: gvol1<br>Gluster process TCP Port RDMA Port Online Pid<br>------------------------------------------------------------------------------<br>Brick 192.168.0.31:/gluster/brick1/gvol1 49152 0 Y 1767<br>Brick 192.168.0.32:/gluster/brick1/gvol1 49152 0 Y 1696<br>Brick 192.168.0.5:/gluster/brick1/gvol1 49152 0 Y 1318<br>Self-heal Daemon on localhost N/A N/A Y 1329<br>Self-heal Daemon on 192.168.0.31 N/A N/A Y 1778<br>Self-heal Daemon on 192.168.0.32 N/A N/A Y 1707<br><br>Task Status of Volume gvol1<br>------------------------------------------------------------------------------<br>There are no active volume tasks<br><br>The output of the gluster volume heal command:<br><br>Brick 192.168.0.31:/gluster/brick1/gvol1<br>Status: Connected<br>Number of entries: 0<br><br>Brick 192.168.0.32:/gluster/brick1/gvol1<br>Status: Connected<br>Number of entries: 0<br><br>Brick 192.168.0.5:/gluster/brick1/gvol1<br>Status: Connected<br>Number of entries: 0<br></div><div><br></div><div>
The operating system / glusterfs version:<br><br>CentOS Linux release 7.9.2009 (Core), fully up to date<br>glusterfs 9.5<br>kernel 3.10.0-1160.53.1.el7.x86_64
</div><div><br>The logs are basically empty since the last mount, except for the mount-related messages.<br></div><div><br></div><div>Additional info: a statedump from the client is attached to the GitHub issue, <a href="https://github.com/gluster/glusterfs/files/8004792/glusterdump.18906.dump.1643991007.gz">https://github.com/gluster/glusterfs/files/8004792/glusterdump.18906.dump.1643991007.gz</a>, in case someone wants to have a look.<br><br>There was also an issue with other clients, running PHP applications with lots of small files, where the glusterfs FUSE mount process would quickly balloon to ~2 GB over the course of 24 hours and its performance would slow to a crawl. This happened very consistently with glusterfs 8.x and 9.5. I managed to resolve it, at least partially, by disabling performance.open-behind: the memory usage now either remains stable or increases at a much slower rate, which is acceptable for this use case.<br><br>Now the issue remains on this single client, which doesn't do much other than read and archive all files from the gluster volume once per day. The glusterfs FUSE mount process balloons to 0.5+ GB during the first tar run and stays at roughly that level afterwards, including during subsequent tar runs.</div><div><br></div><div>I would very much appreciate any advice or suggestions. <br></div><div><br></div><div>Best regards, <br></div><div>Zakhar<br></div></div>