<div dir="ltr">Hello,<div><br></div><div>We have a 3-way replicated Gluster setup where clients connect through NFS, and the clients are also the servers. The Gluster NFS server process keeps increasing its RAM usage until the server eventually runs out of memory. We see this on all 3 servers. Each server has 96GB RAM in total, and we&#39;ve seen the Gluster NFS server use up to 70GB RAM while swap was 100% in use. If other processes weren&#39;t also using RAM, I guess Gluster would claim that as well.</div><div><br></div><div>We are running GlusterFS 3.12.9-1 on Debian 8.</div><div>The process with the high memory usage is:</div><div>/usr/sbin/glusterfs -s localhost --volfile-id gluster/nfs -p /var/run/gluster/nfs/nfs.pid -l /var/log/glusterfs/nfs.log -S /var/run/gluster/94e073c0dae2c47025351342ba0ddc44.socket<br></div><div><br></div><div>Gluster volume info:</div><div><br></div><div><div>Volume Name: www</div><div>Type: Replicate</div><div>Volume ID: fbcc21ee-bd0b-40a5-8785-bd00e49e9b72</div><div>Status: Started</div><div>Snapshot Count: 0</div><div>Number of Bricks: 1 x 3 = 3</div><div>Transport-type: tcp</div><div>Bricks:</div><div>Brick1: 10.0.0.3:/storage/sdc1/www</div><div>Brick2: 10.0.0.2:/storage/sdc1/www</div><div>Brick3: 10.0.0.1:/storage/sdc1/www</div><div>Options Reconfigured:</div><div>diagnostics.client-log-level: ERROR</div><div>performance.stat-prefetch: on</div><div>performance.md-cache-timeout: 600</div><div>performance.cache-invalidation: on</div><div>features.cache-invalidation: on</div><div>network.ping-timeout: 3</div><div>transport.address-family: inet</div><div>performance.readdir-ahead: on</div><div>nfs.disable: off</div><div>performance.cache-size: 1GB</div><div>performance.write-behind-window-size: 4MB</div><div>performance.nfs.io-threads: on</div><div>performance.nfs.io-cache: off</div><div>performance.nfs.quick-read: off</div><div>performance.nfs.write-behind-window-size: 4MB</div><div>features.cache-invalidation-timeout: 
600</div><div>performance.nfs.stat-prefetch: on</div><div>network.inode-lru-limit: 90000</div><div>performance.cache-priority: *.php:3,*.temp:3,*:1</div><div>cluster.readdir-optimize: on</div><div>performance.nfs.read-ahead: off</div><div>performance.flush-behind: on</div><div>performance.write-behind: on</div><div>performance.nfs.write-behind: on</div><div>performance.nfs.flush-behind: on</div><div>features.bitrot: on</div><div>features.scrub: Active</div><div>performance.quick-read: off</div><div>performance.io-thread-count: 64</div><div>nfs.enable-ino32: on</div><div>nfs.log-level: ERROR</div><div>storage.build-pgfid: off</div><div>diagnostics.brick-log-level: WARNING</div><div>cluster.self-heal-daemon: enable</div></div><div><br></div><div>We don&#39;t see anything in the logs that looks like it could explain the high memory usage. We did take a statedump, which I&#39;ll post here and have also attached:</div><div><a href="https://pastebin.com/raw/sDNF1wwi">https://pastebin.com/raw/sDNF1wwi</a><br></div><div><br></div><div>Running the command to get the statedump is quite risky for us, as the USR1 signal appeared to cause Gluster to move swapped-out memory back into RAM and go offline while that was in progress.</div><div>FWIW, we do have vm.swappiness set to 1.</div><div><br></div><div>Does anyone have an idea of what could cause this and what we can do to stop such high memory usage?</div><div><br></div><div>Cheers,</div><div>Niels</div></div>