[Gluster-users] Very slow rsync to gluster volume UNLESS `ls` or `find` scan dir on gluster volume first

Sun Feb 4 07:20:06 UTC 2018


I have been working on setting up a 4-way replicated gluster volume
(4 bricks, gluster 3.13.2) holding over a million files (~250GB total),
and I've seen some really weird stuff happen, even after trying to
optimize for small files.

It took almost 2 days to rsync the data from the local drive to the gluster
volume, and now I'm running a 2nd rsync that just looks for changes in case
more files have been written. I'd like to concentrate this email on a very
specific and odd issue.

The dir structure is year/month (e.g. uploads/2017/08), with 10k+ files in
each month folder.

Rsyncing each month folder cold can take 2+ minutes.

However, if I ls the destination folder first, or use find (both of which
finish within 5 seconds), the rsync is almost instant.

Here's a log with time calls that shows what happens:

box:/mnt/gluster/uploads/2017 # time rsync -aPr /srv/www/htdocs/uploads/2017/08/ 08/
sending incremental file list
^Crsync error: received SIGINT, SIGTERM, or SIGHUP (code 20) at rsync.c(637) [sender=3.1.0]

real    1m39.848s
user    0m0.010s
sys     0m0.030s
box:/mnt/gluster/uploads/2017 # time find 08 | wc -l

real    0m0.726s
user    0m0.013s
sys     0m0.033s
box:/mnt/gluster/uploads/2017 # time rsync -aPr /srv/www/htdocs/uploads/2017/08/ 08/
sending incremental file list

real    0m0.562s
user    0m0.057s
sys     0m0.137s
box:/mnt/gluster/uploads/2017 # time find 07 | wc -l

real    0m4.550s
user    0m0.010s
sys     0m0.033s
box:/mnt/gluster/uploads/2017 # time rsync -aPr /srv/www/htdocs/uploads/2017/07/ 07/
sending incremental file list

real    0m0.428s
user    0m0.030s
sys     0m0.083s
box:/mnt/gluster/uploads/2017 # time ls 06 | wc -l

real    0m1.850s
user    0m0.077s
sys     0m0.040s
box:/mnt/gluster/uploads/2017 # time rsync -aPr /srv/www/htdocs/uploads/2017/06/ 06/
sending incremental file list

real    0m0.627s
user    0m0.073s
sys     0m0.107s
box:/mnt/gluster/uploads/2017 # time rsync -aPr /srv/www/htdocs/uploads/2017/05/ 05/
sending incremental file list

real    2m24.382s
user    0m0.127s
sys     0m0.357s

Note that when I precede the rsync call with ls or find, the rsync completes
in less than a second (it finds nothing to sync because everything has
already been synced). Otherwise, it takes over 2 minutes (I interrupted the
first call at about 1m40s because it was already taking too long).
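
For now the priming could be scripted as a workaround. This is only a
minimal sketch, assuming one subfolder per month under the source and
destination roots seen in the log above; prime_dir and prime_and_sync are
hypothetical helper names of my own, not part of rsync or gluster.

```shell
#!/bin/sh
# Sketch only: walk each destination month folder with find before
# rsyncing into it, mirroring the manual steps from the log.

prime_dir() {
    # Same priming command as in the log: find <dir> | wc -l
    find "$1" | wc -l
}

prime_and_sync() {
    src_root=$1   # e.g. /srv/www/htdocs/uploads/2017 (assumed layout)
    dst_root=$2   # e.g. /mnt/gluster/uploads/2017 (assumed layout)
    for src in "$src_root"/*/; do
        # Skip the unexpanded pattern when there are no subfolders.
        [ -d "$src" ] || continue
        month=$(basename "$src")
        dst="$dst_root/$month"
        # Prime only folders that already exist on the gluster mount.
        if [ -d "$dst" ]; then
            prime_dir "$dst" >/dev/null
        fi
        # Same flags as in the log; "$src" keeps its trailing slash.
        rsync -aPr "$src" "$dst/"
    done
}
```

It would be called as
prime_and_sync /srv/www/htdocs/uploads/2017 /mnt/gluster/uploads/2017
but this only hides the symptom; it does not explain why the cold rsync is
slow.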

What could be causing rsync to work so slowly unless the dir is primed?

Volume config:

Volume Name: gluster
Type: Replicate
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 4 = 4
Transport-type: tcp
Brick1: server1:/mnt/server1_block4/gluster
Brick2: server2:/mnt/server2_block4/gluster
Brick3: server3:/mnt/server3_block4/gluster
Brick4: server4:/mnt/server4_block4/gluster
Options Reconfigured:
performance.parallel-readdir: off
transport.address-family: inet
nfs.disable: on
cluster.self-heal-daemon: enable
performance.cache-size: 1GB
network.ping-timeout: 5
cluster.quorum-type: fixed
cluster.quorum-count: 1
features.cache-invalidation: on
features.cache-invalidation-timeout: 600
performance.cache-invalidation: on
performance.md-cache-timeout: 600
network.inode-lru-limit: 500000
performance.rda-cache-limit: 256MB
performance.read-ahead: off
client.event-threads: 4
server.event-threads: 4

Thank you for any insight.

