[Gluster-users] Global threading

Fri Mar 5 15:47:15 UTC 2021

Some time ago I created a replica 3 volume using gluster 8.3
with the following topology for the time being:

server1/brick1 ----\                          /---- server3/brick3
                    \____ ADSL 10/1 Mbits ___/
                    /     <- down   up ->    \
server2/brick2 ----/                          \---- old storage

At each end, the two boxes are connected to each other at
1 Gbit/s. The two sides are about 4000 km apart, with
roughly 250 ms of latency between them.
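
For a sense of scale, here is the bandwidth-delay product of
the ADSL link, i.e. roughly how much data has to be in flight
to keep one direction busy. This is a back-of-the-envelope
sketch that assumes the 250 ms figure is the round-trip time
and that the link delivers its nominal 10/1 Mbit/s:

```python
# Bandwidth-delay product: data that must be in flight to keep a link busy.
# Assumptions: 250 ms is treated as the round-trip time, and the ADSL link
# runs at its nominal 10 Mbit/s down and 1 Mbit/s up.

def bdp_bytes(rate_bits_per_s: float, rtt_s: float) -> float:
    """Return the bandwidth-delay product in bytes."""
    return rate_bits_per_s * rtt_s / 8


RTT_S = 0.250  # assumed round-trip time in seconds

for name, rate in [("down (10 Mbit/s)", 10e6), ("up (1 Mbit/s)", 1e6)]:
    print(f"{name}: {bdp_bytes(rate, RTT_S) / 1000:.2f} kB in flight")
```

Any exchange that sends less than this and then waits for a
reply leaves that direction of the link partly idle.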

For the past month and a half I have been running one rsync
on each of the three servers to fetch different parts of a
mail store from "old storage". The mail store consists of
about 1.1 million predominantly small files, very unevenly
spread over 6600 directories. Some directories contain 30000+
files; the worst one has 90000+.
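
Combining the file count with the latency gives a lower bound
on how long a strictly serial copy would spend just waiting on
the network. The sketch assumes 250 ms per round trip and no
overlap between files; the number of round trips per file is a
hypothetical parameter, since the real figure depends on the
sequence of operations (LOOKUP, locks, CREATE, WRITE and so
on) each file needs:

```python
# Lower bound on time spent waiting for the WAN if files are handled strictly
# one after another. Assumptions: 250 ms round-trip time, no overlap between
# files; trips_per_file is a hypothetical knob, not a measured value.

def serial_rtt_days(n_files: int, rtt_s: float, trips_per_file: int = 1) -> float:
    """Days spent purely waiting on round trips when nothing overlaps."""
    return n_files * trips_per_file * rtt_s / 86400


print(f"{serial_rtt_days(1_100_000, 0.250):.1f} days at 1 round trip per file")
print(f"{serial_rtt_days(1_100_000, 0.250, 4):.1f} days at 4 round trips per file")
```

Even one round trip per file adds up to days, which is
consistent with no single resource being saturated.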

Copying simultaneously to all three servers wastes traffic
(whatever is rsynced to server1 and server2 has to travel
down from old storage and then back up again to server3),
but it uses the available bandwidth more efficiently, since
it uses both directions of the link instead of only the
downlink, as would be the case if I rsynced only to server3
and let replication flow down to servers 1 and 2. I did this
because, as I mentioned earlier in the thread "Replication
logic", I cannot saturate the CPU, the disk I/O, or even the
meager network. Wasting traffic this way therefore still
increases the overall copying speed.

Diagnostics showed that FSYNC had by far the greatest
average latency, followed by MKDIR and CREATE, but all three
had relatively few calls. LOOKUP, on the other hand, has a
huge number of calls, so even with a moderate average
latency it accounts for the greatest total delay, followed
by INODELK.
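
The ranking above is by total time, i.e. number of calls
multiplied by average latency, rather than by average latency
alone. A minimal sketch of that calculation, assuming the row
layout printed by `gluster volume profile gv0 info`
(%-latency, Avg-latency, Min-Latency, Max-Latency, No. of
calls, Fop, with latencies in microseconds); the sample rows
are made up for illustration and are not measurements from
this volume:

```python
import re
from collections import defaultdict

# Assumed row layout of `gluster volume profile <vol> info`:
#   %-latency  Avg-latency  Min-Latency  Max-Latency  No. of calls  Fop
# Latencies are taken to be microseconds; the "us" suffix is optional here.
ROW = re.compile(
    r"^\s*[\d.]+\s+([\d.]+)(?:\s*us)?\s+[\d.]+(?:\s*us)?\s+[\d.]+(?:\s*us)?"
    r"\s+(\d+)\s+([A-Z_]+)\s*$"
)


def rank_fops(lines):
    """Sum calls * average latency per FOP; return FOPs sorted by total time."""
    total_us = defaultdict(float)
    for line in lines:
        m = ROW.match(line)
        if m:  # header and separator lines simply do not match
            avg_us, calls, fop = float(m.group(1)), int(m.group(2)), m.group(3)
            total_us[fop] += avg_us * calls
    return sorted(total_us.items(), key=lambda kv: kv[1], reverse=True)


# Made-up rows for illustration only (not measurements from this volume).
sample = [
    "  10.00   50000.00 us   100.00 us   90000.00 us      100        FSYNC",
    "  50.00    2000.00 us    50.00 us   80000.00 us   500000       LOOKUP",
]

for fop, us in rank_fops(sample):
    print(f"{fop:10s} {us / 1e6:10.1f} s")
```

The profile output repeats the table per brick and for both
cumulative and interval statistics, so pass in only the
sections that should actually be summed.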

I tested writing through both the native glusterfs client
and nfs-ganesha, but noticed no difference in speed between
them (although nfs-ganesha used seven times as much memory
as glusterfsd). Tweaking the thread counts, write-behind,
parallel-readdir, cache-size and inode-lru-limit didn't
produce any noticeable difference either.
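
Since these comparisons are subjective, a fixed small-file
micro-benchmark, run before and after each `gluster volume
set` change, would give numbers to compare. A minimal sketch;
the file count, file size and per-file fsync are assumptions
meant to loosely resemble a mail store, not measured
properties of it:

```python
import os
import tempfile
import time


def small_file_rate(directory: str, n_files: int = 1000, size: int = 4096,
                    fsync: bool = True) -> float:
    """Create n_files files of `size` bytes in `directory`; return files/second.

    fsync=True forces each file to be flushed before the next one is started;
    set it to match the real workload (an assumption, not a known property).
    """
    payload = os.urandom(size)
    start = time.perf_counter()
    for i in range(n_files):
        path = os.path.join(directory, f"bench_{i:06d}")
        with open(path, "wb") as f:
            f.write(payload)
            if fsync:
                f.flush()
                os.fsync(f.fileno())
    elapsed = time.perf_counter() - start
    return n_files / elapsed


if __name__ == "__main__":
    # Point this at a scratch directory on the Gluster mount (e.g. a
    # hypothetical /mnt/gv0/bench); a local temp dir keeps the sketch runnable.
    with tempfile.TemporaryDirectory() as d:
        print(f"{small_file_rate(d, n_files=200):.0f} files/s")
```

Using a fresh, empty directory for each run keeps the
directory size, which matters for directories with tens of
thousands of entries, the same between runs.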

Then, a few days ago, I noticed global-threading at
https://github.com/gluster/glusterfs/issues/532 . It looked
promising but appeared not to be merged; it turned out that
it actually is. So last night I upgraded to 9.0 and turned
it on. I also dropped nfs-ganesha. With that, my
configuration ended up like this:

Volume Name: gv0
Type: Replicate
Volume ID: 2786efab-9178-4a9a-a525-21d6f1c94de9
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: node1:/gfs/gv0
Brick2: node2:/gfs/gv0
Brick3: node3:/gfs/gv0
Options Reconfigured:
cluster.granular-entry-heal: enable
network.ping-timeout: 20
network.frame-timeout: 60
performance.write-behind: on
storage.fips-mode-rchecksum: on
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
features.bitrot: off
features.scrub: Inactive
features.scrub-freq: weekly
performance.io-thread-count: 32
features.selinux: off
client.event-threads: 3
server.event-threads: 3
cluster.min-free-disk: 1%
features.cache-invalidation: on
features.cache-invalidation-timeout: 600
performance.cache-invalidation: on
cluster.self-heal-daemon: enable
diagnostics.latency-measurement: on
diagnostics.count-fop-hits: on
performance.cache-size: 256MB
network.inode-lru-limit: 131072
performance.parallel-readdir: on
performance.qr-cache-timeout: 600
performance.nl-cache-positive-entry: on
performance.nfs.io-threads: on
config.global-threading: on
performance.iot-pass-through: on
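
To see exactly which options changed between two states of
the volume, e.g. before and after the upgrade, the "Options
Reconfigured" section of `gluster volume info` can be parsed
and diffed. A minimal sketch, assuming the "key: value"
layout shown above and that a blank line ends the section:

```python
def parse_options(volume_info: str) -> dict:
    """Return the 'key: value' pairs listed under 'Options Reconfigured:'."""
    opts = {}
    in_section = False
    for line in volume_info.splitlines():
        stripped = line.strip()
        if stripped == "Options Reconfigured:":
            in_section = True
            continue
        if in_section:
            if not stripped:
                break  # assumed: a blank line ends the section
            key, sep, value = stripped.partition(":")
            if sep:
                opts[key.strip()] = value.strip()
    return opts


def diff_options(before: dict, after: dict) -> dict:
    """Map every key whose value differs to (old, new); None means not set."""
    keys = sorted(before.keys() | after.keys())
    return {k: (before.get(k), after.get(k))
            for k in keys if before.get(k) != after.get(k)}


if __name__ == "__main__":
    # Hypothetical "before" and "after" excerpts, not real output.
    old = parse_options("Options Reconfigured:\nnfs.disable: off\n"
                        "performance.write-behind: on\n")
    new = parse_options("Options Reconfigured:\nnfs.disable: on\n"
                        "performance.write-behind: on\n"
                        "config.global-threading: on\n")
    print(diff_options(old, new))
    # {'config.global-threading': (None, 'on'), 'nfs.disable': ('off', 'on')}
```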

In the short time it has been running since then, I have
noticed no subjective increase in write speed, but file
listing does seem somewhat faster (that is, the rate at
which rsync without --whole-file runs through preexisting
files while reporting "file X is uptodate"). Presumably
this is stat getting faster thanks to thread
parallelisation, but I'm only guessing. The network still
does not get saturated except while the occasional big
(5MB+) file is being transferred. So far, turning global
threading on has had no negative impact that I can see.
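
The guess that listing is faster because stat calls now
overlap could be checked by timing stat() over the same set
of files with one thread and with several; if per-call
latency dominates, throughput should rise with the thread
count. A sketch under that assumption (the mount path in the
comment is hypothetical); Python threads can overlap here
because os.stat() releases the GIL while it waits on the
system call:

```python
import os
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor


def list_files(root: str) -> list:
    """Collect every file path under root (the walk itself is sequential)."""
    return [os.path.join(d, name) for d, _, names in os.walk(root) for name in names]


def stat_rate(paths: list, workers: int = 1) -> float:
    """stat() every path using `workers` threads; return stat calls per second."""
    start = time.perf_counter()
    if workers == 1:
        for p in paths:
            os.stat(p)
    else:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            list(pool.map(os.stat, paths))  # list() waits and re-raises OSError
    elapsed = time.perf_counter() - start
    return len(paths) / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    # Point root at a directory on the Gluster mount (e.g. a hypothetical
    # /mnt/gv0/maildir); a local temp dir keeps the sketch runnable anywhere.
    with tempfile.TemporaryDirectory() as root:
        for i in range(500):
            open(os.path.join(root, f"f{i}"), "w").close()
        paths = list_files(root)
        for workers in (1, 8, 32):
            print(f"{workers:2d} threads: {stat_rate(paths, workers):.0f} stat/s")
```

On a real mount the first pass may warm client-side caches,
so it is worth repeating the runs in a different order, or on
different directories, before drawing conclusions.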

Any and all ideas on how to improve this setup (other
than physically) are most welcome.