[Gluster-users] Heavy performance impact to local access (glusterfs 3.6.7)
Joerg Hinz
Hinz at Linux-Systeme.de
Mon Dec 14 15:02:41 UTC 2015
I have a setup with two GlusterFS 3.6.7 servers that are connected over a
WAN link:
root@r1:/daten_gluster# gluster pool list
UUID                                    Hostname    State
6b70b66c-866f-4222-826b-736a21a9fce1    willy       Connected
f1ba0eb9-b991-4c99-a177-a4ca7764ff52    localhost   Connected
PING willy (10.8.0.186) 56(84) bytes of data.
64 bytes from willy (10.8.0.186): icmp_seq=1 ttl=63 time=181 ms
64 bytes from willy (10.8.0.186): icmp_seq=2 ttl=63 time=69.0 ms
64 bytes from willy (10.8.0.186): icmp_seq=3 ttl=63 time=72.1 ms
64 bytes from willy (10.8.0.186): icmp_seq=4 ttl=63 time=71.1 ms
64 bytes from willy (10.8.0.186): icmp_seq=5 ttl=63 time=70.2 ms
As you can see, it's a typical WAN connection with a latency of about 70 ms.
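To put that number in perspective (a rough back-of-the-envelope estimate,
assuming one network round trip per remote operation): every synchronous call
to willy costs at least ~0.07 s, so as few as 50-100 round trips already add
3.5-7 seconds.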
And one shared gluster volume:
root@r1:/daten_gluster# gluster volume info
Volume Name: gv0
Type: Distribute
Volume ID: 5baeef5e-4fd4-472f-b313-b0fcd1baa17a
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: r1:/storage/gluster
Brick2: willy:/storage/gluster
Options Reconfigured:
nfs.export-volumes: off
cluster.readdir-optimize: on
performance.readdir-ahead: on
diagnostics.brick-log-level: WARNING
diagnostics.client-log-level: WARNING
performance.write-behind-window-size: 64MB
performance.cache-size: 256MB
performance.client-io-threads: on
performance.cache-refresh-timeout: 10
nfs.addr-namelookup: off
cluster.min-free-disk: 1
cluster.data-self-heal-algorithm: full
performance.io-thread-count: 64
nfs.disable: true
performance.flush-behind: on
As you can see, I have already tried every performance option I could find that might be useful.
The problem is that when working in the Gluster-mounted directory (mounted with
-t glusterfs; I tried NFS too, but it did not bring much of a performance win),
_EVERYTHING_ is DEAD SLOW:
root@r1:/daten_gluster# time find test
test
test/4
test/4/file05
test/4/file04
test/4/file02
test/4/file03
test/4/file01
test/2
test/2/file05
test/2/file04
test/2/file02
test/2/file03
test/2/file01
test/file05
test/3
test/3/file05
test/3/file04
test/3/file02
test/3/file03
test/3/file01
test/file04
test/file02
test/file03
test/1
test/1/file05
test/1/file04
test/1/file02
test/1/file03
test/1/file01
test/file01
real 0m4.734s
user 0m0.000s
sys 0m0.000s
When I remove the other node's (willy's) brick from the volume:
root@r1:/daten_gluster# gluster volume remove-brick gv0 willy:/storage/gluster force
Removing brick(s) can result in data loss. Do you want to Continue? (y/n) y
volume remove-brick commit force: success
root@r1:/daten_gluster# time find test
test
test/4
test/4/file05
test/4/file04
test/4/file02
test/4/file03
test/4/file01
test/2
test/2/file05
test/2/file04
test/2/file02
test/2/file03
test/2/file01
test/file05
test/3
test/3/file05
test/3/file04
test/3/file02
test/3/file03
test/3/file01
test/file04
test/file02
test/file03
test/1
test/1/file05
test/1/file04
test/1/file02
test/1/file03
test/1/file01
test/file01
real 0m0.017s
user 0m0.000s
sys 0m0.000s
5 seconds compared to 0.02 seconds...
WHY?
I'm just running local reads... (not even writes that would need to be distributed).
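If I do the math (just dividing the two numbers, so take it with a grain of
salt): 4.7 s at ~0.07 s per round trip is roughly 65-70 network round trips for
a tree of only about 30 entries, i.e. about two round trips per entry. It looks
as if almost every lookup goes across the WAN to willy, even though the files
are apparently all on the local brick (the listing is identical after removing
willy's brick) - is that expected for a Distribute volume?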
When I add willy back, everything slows down to a crawl again:
root@r1:/daten_gluster# gluster volume add-brick gv0 willy:/storage/gluster force
volume add-brick: success
root@r1:/daten_gluster# time find test
test
test/4
test/4/file05
test/4/file04
test/4/file02
test/4/file03
test/4/file01
test/2
test/2/file05
test/2/file04
test/2/file02
test/2/file03
test/2/file01
test/file05
test/3
test/3/file05
test/3/file04
test/3/file02
test/3/file03
test/3/file01
test/file04
test/file02
test/file03
test/1
test/1/file05
test/1/file04
test/1/file02
test/1/file03
test/1/file01
test/file01
real 0m5.226s
user 0m0.000s
sys 0m0.000s
And these were only 30 files.
I wanted to share 220,000 files via GlusterFS...
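A rough extrapolation (simply scaling the measured 4.7 s for ~30 entries
linearly, which may of course be off): 220,000 / 30 * 4.7 s is about 34,000 s,
i.e. somewhere around 9-10 hours for a single find over the whole tree.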
Where is my configuration mistake?
Can you please help me or give me a hint?
I cannot believe that GlusterFS is that problematic over WAN connections...?
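For example, would raising the FUSE caching timeouts on the client mount help
in a high-latency setup like this? I am thinking of something along these lines
(a sketch only - the timeout values are placeholders and not something I have
tested, assuming the mount options behave as documented):

# example only: timeout values are placeholders, not tested here
mount -t glusterfs -o attribute-timeout=60,entry-timeout=60 r1:/gv0 /daten_gluster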
Thank you very much!
Joerg