[Gluster-users] Heavy performance impact to local access (glusterfs 3.6.7)
Joerg Hinz
Hinz at Linux-Systeme.de
Mon Dec 14 15:02:41 UTC 2015
I have a setup with two GlusterFS 3.6.7 servers that are connected over a
WAN link:
root@r1:/daten_gluster# gluster pool list
UUID                                    Hostname    State
6b70b66c-866f-4222-826b-736a21a9fce1    willy       Connected
f1ba0eb9-b991-4c99-a177-a4ca7764ff52    localhost   Connected
PING willy (10.8.0.186) 56(84) bytes of data.
64 bytes from willy (10.8.0.186): icmp_seq=1 ttl=63 time=181 ms
64 bytes from willy (10.8.0.186): icmp_seq=2 ttl=63 time=69.0 ms
64 bytes from willy (10.8.0.186): icmp_seq=3 ttl=63 time=72.1 ms
64 bytes from willy (10.8.0.186): icmp_seq=4 ttl=63 time=71.1 ms
64 bytes from willy (10.8.0.186): icmp_seq=5 ttl=63 time=70.2 ms
As you can see, it's a typical WAN connection with a latency of about 70 ms.
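To put that number in perspective (a rough back-of-the-envelope estimate,
assuming one network round trip per remote operation): every synchronous call
to willy costs at least ~0.07 s, so as few as 50-100 round trips already add
3.5-7 seconds.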
And one shared gluster volume:
root@r1:/daten_gluster# gluster volume info
Volume Name: gv0
Type: Distribute
Volume ID: 5baeef5e-4fd4-472f-b313-b0fcd1baa17a
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: r1:/storage/gluster
Brick2: willy:/storage/gluster
Options Reconfigured:
nfs.export-volumes: off
cluster.readdir-optimize: on
performance.readdir-ahead: on
diagnostics.brick-log-level: WARNING
diagnostics.client-log-level: WARNING
performance.write-behind-window-size: 64MB
performance.cache-size: 256MB
performance.client-io-threads: on
performance.cache-refresh-timeout: 10
nfs.addr-namelookup: off
cluster.min-free-disk: 1
cluster.data-self-heal-algorithm: full
performance.io-thread-count: 64
nfs.disable: true
performance.flush-behind: on
As you can see, I have already tried every performance option I could find that might be useful.
The problem is that when working in the Gluster-mounted directory (mounted with
-t glusterfs; I tried NFS too, but it did not bring much of a performance win),
_EVERYTHING_ is DEAD SLOW:
root@r1:/daten_gluster# time find test
test
test/4
test/4/file05
test/4/file04
test/4/file02
test/4/file03
test/4/file01
test/2
test/2/file05
test/2/file04
test/2/file02
test/2/file03
test/2/file01
test/file05
test/3
test/3/file05
test/3/file04
test/3/file02
test/3/file03
test/3/file01
test/file04
test/file02
test/file03
test/1
test/1/file05
test/1/file04
test/1/file02
test/1/file03
test/1/file01
test/file01
real 0m4.734s
user 0m0.000s
sys 0m0.000s
When I remove the other node's (willy's) brick from the volume:
root@r1:/daten_gluster# gluster volume remove-brick gv0 willy:/storage/gluster force
Removing brick(s) can result in data loss. Do you want to Continue? (y/n) y
volume remove-brick commit force: success
root@r1:/daten_gluster# time find test
test
test/4
test/4/file05
test/4/file04
test/4/file02
test/4/file03
test/4/file01
test/2
test/2/file05
test/2/file04
test/2/file02
test/2/file03
test/2/file01
test/file05
test/3
test/3/file05
test/3/file04
test/3/file02
test/3/file03
test/3/file01
test/file04
test/file02
test/file03
test/1
test/1/file05
test/1/file04
test/1/file02
test/1/file03
test/1/file01
test/file01
real 0m0.017s
user 0m0.000s
sys 0m0.000s
5 seconds compared to 0.02 seconds...
WHY?
I'm just running local reads... (not even writes that would need to be distributed).
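If I do the math (just dividing the two numbers, so take it with a grain of
salt): 4.7 s at ~0.07 s per round trip is roughly 65-70 network round trips for
a tree of only about 30 entries, i.e. about two round trips per entry. It looks
as if almost every lookup goes across the WAN to willy, even though the files
are apparently all on the local brick (the listing is identical after removing
willy's brick) - is that expected for a Distribute volume?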
When I add willy back, everything slows down to a crawl again:
root@r1:/daten_gluster# gluster volume add-brick gv0 willy:/storage/gluster force
volume add-brick: success
root@r1:/daten_gluster# time find test
test
test/4
test/4/file05
test/4/file04
test/4/file02
test/4/file03
test/4/file01
test/2
test/2/file05
test/2/file04
test/2/file02
test/2/file03
test/2/file01
test/file05
test/3
test/3/file05
test/3/file04
test/3/file02
test/3/file03
test/3/file01
test/file04
test/file02
test/file03
test/1
test/1/file05
test/1/file04
test/1/file02
test/1/file03
test/1/file01
test/file01
real 0m5.226s
user 0m0.000s
sys 0m0.000s
And these were only 30 files.
I wanted to share 220,000 files via GlusterFS...
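A rough extrapolation (simply scaling the measured 4.7 s for ~30 entries
linearly, which may of course be off): 220,000 / 30 * 4.7 s is about 34,000 s,
i.e. somewhere around 9-10 hours for a single find over the whole tree.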
Where is my configuration mistake?
Can you please help me or give me a hint?
I cannot believe that GlusterFS is that problematic over WAN connections...?
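For example, would raising the FUSE caching timeouts on the client mount help
in a high-latency setup like this? I am thinking of something along these lines
(a sketch only - the timeout values are placeholders and not something I have
tested, assuming the mount options behave as documented):

# example only: timeout values are placeholders, not tested here
mount -t glusterfs -o attribute-timeout=60,entry-timeout=60 r1:/gv0 /daten_gluster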
Thank you very much!
Joerg