[Gluster-devel] dbench crashes with client-side perf translators
Jake Maul
jakemaul at gmail.com
Tue Oct 14 22:58:09 UTC 2008
Greetings,
I'm working on implementing GlusterFS as a replacement for NFS, and
have run into a strange hiccup with client-side performance
translators when benchmarking with dbench
(http://samba.org/ftp/tridge/dbench/). Below is the output. Note that
it runs normally for ~80 seconds, then seems to stop responding
(latency goes up by ~1s every second). Around 50 seconds after that,
it gives up.
This happens with both a 2-server AFR setup and a single backend. With
a simple setup (client/io-threads on the client,
posix/posix-locks/io-threads/server on the server), it works. If I
enable any of iocache, readahead, or writeback on the client side,
dbench will crash when the concurrency is high ('50' breaks it for
me... possibly lower, I haven't tested exhaustively).
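For the "just iocache" run further down, the client stack is roughly
client -> io-threads -> io-cache; something like this (a minimal
sketch, with io-cache's subvolumes line pointed at iot instead of
readahead; otherwise it matches the commented-out block in the full
config at the bottom):

volume iocache
type performance/io-cache
option cache-size 1024MB # default is 32MB
option page-size 2MB # default is 128KB
option force-revalidate-timeout 5
subvolumes iot # chained straight onto io-threads for this test
end-volume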
IOzone is perfectly happy with all three translators enabled. I've not
tested with any other FUSE filesystem, so honestly I can't say where
the problem might lie... dbench, FUSE, or GlusterFS. I've only tested
GlusterFS 1.3.12a, with both the stock CentOS 5.2 fuse module and the
GlusterFS-patched fuse. Any ideas?
Lots of troubleshooting info below. Note that each run is supposed to
consist of a 120-second 'warmup' and a 600-second 'execute' phase, 12
minutes total. Problems generally occur well before the halfway mark.
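The invocation (shown at the top of each run below) just relies on
dbench 4.0's defaults; if I'm reading the options right, it should be
roughly equivalent to passing the warmup and time limit explicitly:

/usr/src/dbench-4.0/dbench -c /usr/src/dbench-4.0/client.txt \
    -t 600 --warmup=120 50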
Thanks,
Jake
CentOS release 5.2 (Final)
Linux testbox2.localdomain 2.6.18-92.1.13.el5 #1 SMP Wed Sep 24
19:33:52 EDT 2008 i686 i686 i386 GNU/Linux
glusterfs 1.3.12a built on Oct 9 2008 17:04:22
Repository revision: glusterfs--mainline--2.5--patch-799
glusterfs on /testbox3 type fuse
(rw,nosuid,nodev,allow_other,default_permissions,max_read=1048576)
=================
Test run on an AFR setup with all three translators enabled:
=================
[root at testbox1 dbench]# /usr/src/dbench-4.0/dbench -c
/usr/src/dbench-4.0/client.txt 50
<snip>
50 940 14.05 MB/sec warmup 78 sec latency 373.806 ms
50 957 13.93 MB/sec warmup 79 sec latency 365.397 ms
50 973 13.82 MB/sec warmup 80 sec latency 590.412 ms
50 985 13.71 MB/sec warmup 81 sec latency 536.578 ms
50 989 13.57 MB/sec warmup 82 sec latency 1025.981 ms
50 994 13.45 MB/sec warmup 83 sec latency 1406.175 ms
50 994 13.29 MB/sec warmup 84 sec latency 1896.278 ms
50 994 13.14 MB/sec warmup 85 sec latency 2897.077 ms
50 994 12.98 MB/sec warmup 86 sec latency 3899.889 ms
50 994 12.83 MB/sec warmup 87 sec latency 4902.711 ms
<snip>
50 994 0.00 MB/sec execute 8 sec latency 46970.466 ms
50 994 0.00 MB/sec execute 9 sec latency 47972.272 ms
50 994 0.00 MB/sec execute 10 sec latency 48974.077 ms
50 994 0.00 MB/sec execute 11 sec latency 49975.883 ms
[1001] read failed on handle 10087 (No such file or directory)
[978] read failed on handle 10081 (No such file or directory)
[1002] read failed on handle 10087 (No such file or directory)
[971] open ./clients/client27/~dmtmp/PWRPNT/NEWTIPS.PPT failed for
handle 10080 (Transport endpoint is not connected)
(972) ERROR: handle 10080 was not found
[971] open ./clients/client28/~dmtmp/PWRPNT/NEWTIPS.PPT failed for
handle 10080 (Transport endpoint is not connected)
(972) ERROR: handle 10080 was not found
[1008] write failed on handle 10087 (Transport endpoint is not connected)
[971] open ./clients/client19/~dmtmp/PWRPNT/NEWTIPS.PPT failed for
handle 10080 (Transport endpoint is not connected)
(972) ERROR: handle 10080 was not found
[1041] open ./clients/client49/~dmtmp/WORD/~$CHAP10.DOC failed for
handle 10090 (Transport endpoint is not connected)
(1042) ERROR: handle 10090 was not found
[1001] read failed on handle 10087 (No such file or directory)
[1039] read failed on handle 10089 (No such file or directory)
[972] write failed on handle 10080 (File descriptor in bad state)
[922] read failed on handle 10066 (No such file or directory)
[979] write failed on handle 10081 (Transport endpoint is not connected)
[1003] read failed on handle 10087 (No such file or directory)
[938] read failed on handle 10070 (Transport endpoint is not connected)
[1004] read failed on handle 10087 (No such file or directory)
[1002] read failed on handle 10087 (No such file or directory)
[1040] read failed on handle 10089 (No such file or directory)
Child failed with status 1
[1005] write failed on handle 10087 (Transport endpoint is not connected)
[1003] read failed on handle 10087 (No such file or directory)
[root at testbox1 dbench]#
====================
Test run with just iocache, single-server single-client (no AFR):
====================
<snip>
50 6960 16.74 MB/sec execute 32 sec latency 128.816 ms
50 7015 16.98 MB/sec execute 33 sec latency 143.153 ms
50 7063 16.96 MB/sec execute 34 sec latency 193.604 ms
50 7063 16.48 MB/sec execute 35 sec latency 1060.934 ms
50 7063 16.03 MB/sec execute 36 sec latency 2061.731 ms
50 7063 15.60 MB/sec execute 37 sec latency 3062.524 ms
50 7063 15.20 MB/sec execute 38 sec latency 4063.325 ms
<snip 40+ lines>
50 7063 6.91 MB/sec execute 85 sec latency 50137.294 ms
50 7063 6.83 MB/sec execute 86 sec latency 51139.100 ms
[6791] write failed on handle 11244 (Transport endpoint is not connected)
Child failed with status 1
[root at testbox1 dbench]#
============
Client log for the just-iocache run above (many lines following this chunk omitted):
============
2008-10-14 15:26:23 W [fuse-bridge.c:398:fuse_entry_cbk]
glusterfs-fuse: 2: (34) / => 1 Rehashing 0/0
2008-10-14 15:42:57 W [fuse-bridge.c:398:fuse_entry_cbk]
glusterfs-fuse: 2: (34) / => 1 Rehashing 0/0
2008-10-14 15:46:31 W [client-protocol.c:4784:client_protocol_cleanup]
remote1: cleaning up state in transport object 0x9e8e858
2008-10-14 15:46:31 E [client-protocol.c:4834:client_protocol_cleanup]
remote1: forced unwinding frame type(1) op(35) reply=@0x9a6fede8
2008-10-14 15:46:31 E [client-protocol.c:4834:client_protocol_cleanup]
remote1: forced unwinding frame type(1) op(39) reply=@0x9a6fede8
2008-10-14 15:46:31 E [client-protocol.c:3446:client_readdir_cbk]
remote1: no proper reply from server, returning ENOTCONN
2008-10-14 15:46:31 E [fuse-bridge.c:1940:fuse_readdir_cbk]
glusterfs-fuse: 895468: READDIR => -1 (107)
2008-10-14 15:46:31 E [client-protocol.c:4834:client_protocol_cleanup]
remote1: forced unwinding frame type(1) op(39) reply=@0x9a6fede8
2008-10-14 15:46:31 E [client-protocol.c:3446:client_readdir_cbk]
remote1: no proper reply from server, returning ENOTCONN
2008-10-14 15:46:31 E [fuse-bridge.c:1940:fuse_readdir_cbk]
glusterfs-fuse: 895469: READDIR => -1 (107)
2008-10-14 15:46:31 E [client-protocol.c:4834:client_protocol_cleanup]
remote1: forced unwinding frame type(1) op(39) reply=@0x9a6fede8
2008-10-14 15:46:31 E [client-protocol.c:3446:client_readdir_cbk]
remote1: no proper reply from server, returning ENOTCONN
===================================
Client config (enabling any of readahead, writeback, or iocache triggers the crash):
===================================
volume remote1
type protocol/client
option transport-type tcp/client # for TCP/IP transport
option remote-host testbox3 # IP address of the remote brick
# option remote-port 6996 # default server port is 6996
# option transport-timeout 30 # seconds to wait for a reply
# from server for each request
option remote-subvolume io-thr # name of the remote volume
end-volume
### Add io-threads feature
volume iot
type performance/io-threads
option thread-count 2 # default is 1
subvolumes remote1
end-volume
### Add readahead feature
#volume readahead
# type performance/read-ahead
# option page-size 512KB # 256KB is the default option
# option page-count 16 # 2 is default option
# subvolumes iot
#end-volume
#### Add IO-Cache feature
#volume iocache
# type performance/io-cache
# option cache-size 1024MB # default is 32MB
# option page-size 2MB # default is 128KB
# option force-revalidate-timeout 5
# subvolumes readahead
#end-volume
#### Add writeback feature
#volume writeback
# type performance/write-behind
# option flush-behind on # default value is 'off'
# option aggregate-size 1MB # default value is 0
# subvolumes iocache
#end-volume
===========
Server config:
===========
volume brick
type storage/posix # POSIX FS translator
option directory /storage # Export this directory
end-volume
volume posix-locks
type features/posix-locks
option mandatory on
subvolumes brick
end-volume
volume io-thr
type performance/io-threads
option thread-count 4 # default is 1
option cache-size 64MB # default is 64MB. This is per thread.
subvolumes posix-locks
end-volume
### Add network serving capability to above brick.
volume server
type protocol/server
option transport-type tcp/server # For TCP/IP transport
# option bind-address 192.168.1.10 # Default is to listen on all interfaces
# option listen-port 6996 # Default is 6996
# option client-volume-filename /etc/glusterfs/glusterfs-client.vol
subvolumes io-thr
option auth.ip.io-thr.allow * # Allow access to the "io-thr" volume
end-volume