[Gluster-users] poor performance

Wed Dec 14 11:28:02 UTC 2022

Hi All,

We have a GlusterFS cluster that hosts some PHP websites.

This is generally considered a bad idea and we can see why.

With performance.nl-cache on, performance is actually very reasonable;
with it turned off, performance is roughly 5x worse, meaning a request
that would take under 500ms now takes 2500ms. In other cases we see far,
far worse: e.g. a request that takes ~1500ms with nl-cache takes ~30s
without it (20x worse).

So why not use nl-cache? Well, it results in readdir reporting files
which then fail to open with ENOENT. The cache also never clears, even
though the configuration says nl-cache entries should only be cached for
60s. Even "ls -lah" in affected folders shows ???? in place of the
attributes of those files. If this recovered in a reasonable time (say,
a few seconds), that would be acceptable.
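One way to reproduce the symptom without eyeballing "ls -lah" output is to take a directory's listing and then look up each name it returns. The bash function here is a minimal sketch, not a GlusterFS tool: the name find_stale_entries is hypothetical, and it assumes that bash glob expansion takes names from readdir without stat'ing them, so any name that then fails both the -e and -L tests is one the listing reported but a lookup could not find.

```shell
# Hypothetical helper (not part of GlusterFS): print every name that a
# directory listing returns but that cannot then be looked up, i.e. the
# entries "ls -l" shows with "?" attributes.
find_stale_entries() {
    local dir=$1 entry saved count=0
    # Remember the caller's glob settings so they can be restored afterwards.
    saved=$(shopt -p nullglob dotglob || true)
    # nullglob: an empty directory expands to nothing, not a literal "*".
    # dotglob: include hidden names ("." and ".." are never matched by "*").
    shopt -s nullglob dotglob
    # Glob expansion takes names from the directory listing (readdir).
    for entry in "$dir"/*; do
        # -e follows symlinks; -L keeps dangling symlinks from being reported.
        if [[ ! -e $entry && ! -L $entry ]]; then
            printf 'stale: %s\n' "$entry"
            count=$((count + 1))
        fi
    done
    eval "$saved"
    # Exit status 1 if anything stale was found, 0 otherwise.
    return $(( count > 0 ))
}
```

For example, running it against an affected folder on the FUSE mount (the path is a placeholder) with nl-cache on, and again after the 60s timeout has passed, would show whether the stale names ever clear. A non-zero exit status means at least one stale name was printed.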

# gluster volume info
Type: Replicate
Volume ID: cbe08331-8b83-41ac-b56d-88ef30c0f5c7
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Options Reconfigured:
performance.nl-cache: on
cluster.readdir-optimize: on
config.client-threads: 2
config.brick-threads: 4
config.global-threading: on
performance.iot-pass-through: on
storage.fips-mode-rchecksum: on
cluster.granular-entry-heal: enable
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
client.event-threads: 2
server.event-threads: 2
transport.address-family: inet
nfs.disable: on
cluster.metadata-self-heal: off
cluster.entry-self-heal: off
cluster.data-self-heal: off
cluster.self-heal-daemon: on
server.allow-insecure: on
features.ctime: off
performance.io-cache: on
performance.cache-invalidation: on
features.cache-invalidation: on
performance.qr-cache-timeout: 600
features.cache-invalidation-timeout: 600
performance.io-cache-size: 128MB
performance.cache-size: 128MB
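The options above do not set performance.nl-cache-timeout, so presumably the default applies, which would match the 60s mentioned earlier. A quick way to confirm what the volume is actually using, and to flip nl-cache for a before/after timing comparison, is sketched here. The volume name gv_home is an assumption taken from --volfile-id in the mount command further down, since the Volume Name line is not part of the output above.

```shell
# Show the nl-cache settings actually in effect on the volume
# (gv_home is assumed from --volfile-id in the mount command).
gluster volume get gv_home performance.nl-cache
gluster volume get gv_home performance.nl-cache-timeout

# Toggle nl-cache on the running volume for an A/B timing comparison.
gluster volume set gv_home performance.nl-cache off
gluster volume set gv_home performance.nl-cache on
```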

Are there any other recommendations, short of abandoning all hope of
redundancy and reverting to a single-server setup (for the web code at
least)? Currently the cost of the redundancy seems to outweigh the benefit.

GlusterFS version 10.2, with a patch for --inode-table-size. Mounts
happen with:

/usr/sbin/glusterfs --acl --reader-thread-count=2 --lru-limit=524288 \
    --inode-table-size=524288 --invalidate-limit=16 --background-qlen=32 \
    --fuse-mountopts=nodev,nosuid,noexec,noatime --process-name fuse \
    --volfile-server=127.0.0.1 --volfile-id=gv_home \
    --fuse-mountopts=nodev,nosuid,noexec,noatime /home

Kind Regards,
Jaco