[Gluster-users] High network traffic with performance.readdir-ahead on

Alberto Bengoa bengoa at gmail.com
Mon Feb 18 16:58:21 UTC 2019

Hello folks,

We are working on a migration from Gluster 3.8 to 5.3. Because the migration
path is long, we decided to install new servers running version 5.3 and then
migrate the clients, updating them and pointing them to the new cluster. As a
bonus, we keep a rollback option in case of problems.

We made our first migration attempt today and, unfortunately, we had to
roll back to the old cluster. Within a few minutes of switching clients from
the old to the new cluster, we noticed unusual network traffic on the Gluster
servers (around 320 Mbps, high for that time of day).

Around 08:05 (our first daily peak is at 8 AM) we reached nearly 1 Gbps for
several minutes, and the traffic stayed very high (over 800 Mbps) up to our
second daily peak (at 9 AM), when we hit 1 Gbps again. We decided to roll the
main production servers back to the old cluster, keeping some servers on the
new one, and observed the network traffic drop back to around 300 Mbps.

Talking with @nbalacha (thank you again, man!) on the IRC channel, he
suggested disabling the performance.readdir-ahead option, and the traffic
instantly dropped to around 10 Mbps. A graph showing all these events can be
found here:

So, the first question here: should performance.readdir-ahead be on by
default? Maybe ours isn't the best use case because, in fact, we have
hundreds of thousands of directories, and the option looks to be causing far
more problems than benefits.
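For reference, the workaround was just the standard one-line volume set
(sketched here against our volume name "X" from the info output below; no
remount was needed in our case, but your mileage may vary):

```shell
# Disable readdir-ahead on the volume; this is what dropped our
# server traffic from ~320 Mbps to ~10 Mbps almost instantly.
gluster volume set X performance.readdir-ahead off

# Confirm the option actually took effect on the volume.
gluster volume get X performance.readdir-ahead
```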

Another thing we noticed: when we point clients running the new Gluster
version (5.3) at the old cluster (version 3.8), we also run into the
high-traffic scenario, even with performance.readdir-ahead already switched
to "off" (the default for that version). You can see the high traffic on the
old cluster here: https://pasteboard.co/I1KdTUd.png . We are aware that
running clients and servers on different versions isn't recommended; we are
doing that only for debugging/testing purposes.

About our setup: we have a ~1.5 TB volume running in Replicate mode (2
servers per cluster). Around 30 clients mount these volumes through
fuse.glusterfs.
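The clients mount the volume the usual FUSE way, roughly like this (server
names match the bricks below; the mount point is illustrative):

```shell
# Native FUSE mount of the replicated volume.
mount -t glusterfs fs01tmp.x.net:/X /mnt/x

# Equivalent /etc/fstab entry, naming the second replica as a
# fallback volfile server for mount-time failover:
# fs01tmp.x.net:/X /mnt/x glusterfs defaults,backupvolfile-server=fs02tmp.x.net 0 0
```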

# gluster volume info of new cluster
Volume Name: X
Type: Replicate
Volume ID: 1d8f7d2d-bda6-4f1c-aa10-6ad29e0b7f5e
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Brick1: fs02tmp.x.net:/var/data/glusterfs/x/brick
Brick2: fs01tmp.x.net:/var/data/glusterfs/x/brick
Options Reconfigured:
performance.readdir-ahead: off
client.event-threads: 4
server.event-threads: 4
server.allow-insecure: on
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
performance.io-thread-count: 32
performance.cache-size: 1900MB
performance.write-behind-window-size: 16MB
performance.flush-behind: on
network.ping-timeout: 10

# gluster volume info of old cluster
Volume Name: X
Type: Replicate
Volume ID: 1bd3b5d8-b10f-4c4b-a28a-06ea4cfa1d89
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Brick1: fs1.x.net:/var/local/gfs
Brick2: fs2.x.net:/var/local/gfs
Options Reconfigured:
network.ping-timeout: 10
performance.cache-size: 512MB
server.allow-insecure: on
client.bind-insecure: on

I was able to collect a profile from the new cluster and pasted it here:
https://pastebin.com/ffF8RVH4 . The sad part is that I was unable to
reproduce the issue after re-enabling performance.readdir-ahead. I'm not sure
whether the clients connected to the cluster at the time could generate a
workload close to the one we had this morning. We'll try to recreate that
condition soon.
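In case anyone wants to capture the same data, the profile above was taken
with Gluster's built-in profiling commands, roughly like this (volume name
"X" as above; the capture window is an arbitrary choice):

```shell
# Enable per-brick profiling, let the workload run for a while,
# then dump the cumulative FOP statistics and turn profiling off.
gluster volume profile X start
sleep 300   # capture a representative window of the workload
gluster volume profile X info
gluster volume profile X stop
```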

I can provide more info and run more tests if you need them.

Alberto Bengoa
