[Gluster-devel] dbench crashes with client-side perf translators
Anand Avati
avati at zresearch.com
Wed Oct 15 03:51:19 UTC 2008
Jake,
Can you send us the complete client and server logs of glusterfs?
avati
2008/10/15, Jake Maul <jakemaul at gmail.com>:
>
> Greetings,
>
> I'm working on implementing GlusterFS as a replacement for NFS, and
> have run into a strange hiccup with client-side performance
> translators when benchmarking with dbench
> (http://samba.org/ftp/tridge/dbench/). Below is the output. Note that
> it runs normally for ~80 seconds, then seems to stop responding
> (latency goes up by ~1s every second). Eventually around 50 seconds
> after that, it gives up.
>
> This happens with a 2-server AFR setup or a single backend. With a
> simple setup (client/io-threads on the client,
> posix/posix-locks/io-threads/server on the server), it works. If I
> enable any of iocache, readahead, or writeback on the client side,
> dbench will crash if the concurrency is very high ('50' breaks it for
> me; possibly lower would too, but I haven't tested exhaustively).
>
> IOzone is perfectly happy with all three translators running. I've not
> tested with any other FUSE filesystem, so honestly I can't say where
> the problem might be: dbench, FUSE, or GlusterFS. I've only tested
> GlusterFS 1.3.12a, with both the stock CentOS 5.2 fuse and the
> GlusterFS-patched fuse. Any ideas?
>
> Lots of troubleshooting info below. Note that it's supposed to run for a
> 120-second 'warmup' and a 600-second 'execute' phase, 12 minutes in total.
> Problems generally occur well before the halfway mark.
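>
> For a quicker reproduction it should be possible to shorten both phases
> on the dbench command line. I'm quoting the flags from memory, so treat
> this as a sketch rather than a verified command:
>
> # roughly: 10-second warmup, 60-second execute, same loadfile, concurrency 50
> /usr/src/dbench-4.0/dbench --warmup=10 -t 60 \
>     -c /usr/src/dbench-4.0/client.txt 50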
>
> Thanks,
> Jake
>
>
>
> CentOS release 5.2 (Final)
> Linux testbox2.localdomain 2.6.18-92.1.13.el5 #1 SMP Wed Sep 24
> 19:33:52 EDT 2008 i686 i686 i386 GNU/Linux
> glusterfs 1.3.12a built on Oct 9 2008 17:04:22
> Repository revision: glusterfs--mainline--2.5--patch-799
> glusterfs on /testbox3 type fuse
> (rw,nosuid,nodev,allow_other,default_permissions,max_read=1048576)
>
> =================
> Test run on an AFR setup with all three performance translators enabled:
> =================
> [root at testbox1 dbench]# /usr/src/dbench-4.0/dbench -c
> /usr/src/dbench-4.0/client.txt 50
> <snip>
> 50 940 14.05 MB/sec warmup 78 sec latency 373.806 ms
> 50 957 13.93 MB/sec warmup 79 sec latency 365.397 ms
> 50 973 13.82 MB/sec warmup 80 sec latency 590.412 ms
> 50 985 13.71 MB/sec warmup 81 sec latency 536.578 ms
> 50 989 13.57 MB/sec warmup 82 sec latency 1025.981 ms
> 50 994 13.45 MB/sec warmup 83 sec latency 1406.175 ms
> 50 994 13.29 MB/sec warmup 84 sec latency 1896.278 ms
> 50 994 13.14 MB/sec warmup 85 sec latency 2897.077 ms
> 50 994 12.98 MB/sec warmup 86 sec latency 3899.889 ms
> 50 994 12.83 MB/sec warmup 87 sec latency 4902.711 ms
> <snip>
> 50 994 0.00 MB/sec execute 8 sec latency 46970.466 ms
> 50 994 0.00 MB/sec execute 9 sec latency 47972.272 ms
> 50 994 0.00 MB/sec execute 10 sec latency 48974.077 ms
> 50 994 0.00 MB/sec execute 11 sec latency 49975.883 ms
> [1001] read failed on handle 10087 (No such file or directory)
> [978] read failed on handle 10081 (No such file or directory)
> [1002] read failed on handle 10087 (No such file or directory)
> [971] open ./clients/client27/~dmtmp/PWRPNT/NEWTIPS.PPT failed for
> handle 10080 (Transport endpoint is not connected)
> (972) ERROR: handle 10080 was not found
> [971] open ./clients/client28/~dmtmp/PWRPNT/NEWTIPS.PPT failed for
> handle 10080 (Transport endpoint is not connected)
> (972) ERROR: handle 10080 was not found
> [1008] write failed on handle 10087 (Transport endpoint is not connected)
> [971] open ./clients/client19/~dmtmp/PWRPNT/NEWTIPS.PPT failed for
> handle 10080 (Transport endpoint is not connected)
> (972) ERROR: handle 10080 was not found
> [1041] open ./clients/client49/~dmtmp/WORD/~$CHAP10.DOC failed for
> handle 10090 (Transport endpoint is not connected)
> (1042) ERROR: handle 10090 was not found
> [1001] read failed on handle 10087 (No such file or directory)
> [1039] read failed on handle 10089 (No such file or directory)
> [972] write failed on handle 10080 (File descriptor in bad state)
> [922] read failed on handle 10066 (No such file or directory)
> [979] write failed on handle 10081 (Transport endpoint is not connected)
> [1003] read failed on handle 10087 (No such file or directory)
> [938] read failed on handle 10070 (Transport endpoint is not connected)
> [1004] read failed on handle 10087 (No such file or directory)
> [1002] read failed on handle 10087 (No such file or directory)
> [1040] read failed on handle 10089 (No such file or directory)
> Child failed with status 1
> [1005] write failed on handle 10087 (Transport endpoint is not connected)
> [1003] read failed on handle 10087 (No such file or directory)
> [root at testbox1 dbench]#
>
> ====================
> Test run with just iocache, single-server single-client (no AFR):
> ====================
> <snip>
> 50 6960 16.74 MB/sec execute 32 sec latency 128.816 ms
> 50 7015 16.98 MB/sec execute 33 sec latency 143.153 ms
> 50 7063 16.96 MB/sec execute 34 sec latency 193.604 ms
> 50 7063 16.48 MB/sec execute 35 sec latency 1060.934 ms
> 50 7063 16.03 MB/sec execute 36 sec latency 2061.731 ms
> 50 7063 15.60 MB/sec execute 37 sec latency 3062.524 ms
> 50 7063 15.20 MB/sec execute 38 sec latency 4063.325 ms
> <snip 40+ lines>
> 50 7063 6.91 MB/sec execute 85 sec latency 50137.294 ms
> 50 7063 6.83 MB/sec execute 86 sec latency 51139.100 ms
> [6791] write failed on handle 11244 (Transport endpoint is not connected)
> Child failed with status 1
> [root at testbox1 dbench]#
>
>
> ============
> Client log for the 'just iocache' run above (many lines following this
> chunk omitted):
> ============
> 2008-10-14 15:26:23 W [fuse-bridge.c:398:fuse_entry_cbk]
> glusterfs-fuse: 2: (34) / => 1 Rehashing 0/0
> 2008-10-14 15:42:57 W [fuse-bridge.c:398:fuse_entry_cbk]
> glusterfs-fuse: 2: (34) / => 1 Rehashing 0/0
> 2008-10-14 15:46:31 W [client-protocol.c:4784:client_protocol_cleanup]
> remote1: cleaning up state in transport object 0x9e8e858
> 2008-10-14 15:46:31 E [client-protocol.c:4834:client_protocol_cleanup]
> remote1: forced unwinding frame type(1) op(35) reply=@0x9a6fede8
> 2008-10-14 15:46:31 E [client-protocol.c:4834:client_protocol_cleanup]
> remote1: forced unwinding frame type(1) op(39) reply=@0x9a6fede8
> 2008-10-14 15:46:31 E [client-protocol.c:3446:client_readdir_cbk]
> remote1: no proper reply from server, returning ENOTCONN
> 2008-10-14 15:46:31 E [fuse-bridge.c:1940:fuse_readdir_cbk]
> glusterfs-fuse: 895468: READDIR => -1 (107)
> 2008-10-14 15:46:31 E [client-protocol.c:4834:client_protocol_cleanup]
> remote1: forced unwinding frame type(1) op(39) reply=@0x9a6fede8
> 2008-10-14 15:46:31 E [client-protocol.c:3446:client_readdir_cbk]
> remote1: no proper reply from server, returning ENOTCONN
> 2008-10-14 15:46:31 E [fuse-bridge.c:1940:fuse_readdir_cbk]
> glusterfs-fuse: 895469: READDIR => -1 (107)
> 2008-10-14 15:46:31 E [client-protocol.c:4834:client_protocol_cleanup]
> remote1: forced unwinding frame type(1) op(39) reply=@0x9a6fede8
> 2008-10-14 15:46:31 E [client-protocol.c:3446:client_readdir_cbk]
> remote1: no proper reply from server, returning ENOTCONN
>
> ===================================
> Client config (enable any of readahead, writeback, or iocache for crash):
> ===================================
> volume remote1
> type protocol/client
> option transport-type tcp/client # for TCP/IP transport
> option remote-host testbox3 # hostname or IP of the remote brick
> # option remote-port 6996 # default server port is 6996
> # option transport-timeout 30 # seconds to wait for a reply
> # from server for each request
> option remote-subvolume io-thr # name of the remote volume
> end-volume
>
> ### Add io-threads feature
> volume iot
> type performance/io-threads
> option thread-count 2 # default is 1
> subvolumes remote1
> end-volume
>
> ### Add readahead feature
> #volume readahead
> # type performance/read-ahead
> # option page-size 512KB # 256KB is the default option
> # option page-count 16 # 2 is default option
> # subvolumes iot
> #end-volume
>
> #### Add IO-Cache feature
> #volume iocache
> # type performance/io-cache
> # option cache-size 1024MB # default is 32MB
> # option page-size 2MB # default is 128KB
> # option force-revalidate-timeout 5
> # subvolumes readahead
> #end-volume
>
> #### Add writeback feature
> #volume writeback
> # type performance/write-behind
> # option flush-behind on # default value is 'off'
> # option aggregate-size 1MB # default value is 0
> # subvolumes iocache
> #end-volume
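>
> For the 'just iocache' run above, the io-cache block gets uncommented
> with its subvolumes line pointed straight at iot (read-ahead stays
> commented out, so it can't sit in between), roughly like this:
>
> volume iocache
> type performance/io-cache
> option cache-size 1024MB # default is 32MB
> option page-size 2MB # default is 128KB
> option force-revalidate-timeout 5
> subvolumes iot # io-threads directly, since read-ahead is disabled
> end-volume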
>
> ===========
> Server config:
> ===========
> volume brick
> type storage/posix # POSIX FS translator
> option directory /storage # Export this directory
> end-volume
>
> volume posix-locks
> type features/posix-locks
> option mandatory on
> subvolumes brick
> end-volume
>
> volume io-thr
> type performance/io-threads
> option thread-count 4 # default is 1
> option cache-size 64MB # default is 64MB. This is per thread.
> subvolumes posix-locks
> end-volume
>
> ### Add network serving capability to above brick.
> volume server
> type protocol/server
> option transport-type tcp/server # For TCP/IP transport
> # option bind-address 192.168.1.10 # Default is to listen on all interfaces
> # option listen-port 6996 # Default is 6996
> # option client-volume-filename /etc/glusterfs/glusterfs-client.vol
> subvolumes io-thr
> option auth.ip.io-thr.allow * # Allow access to "io-thr" volume
> end-volume
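>
> Side note: I believe auth.ip also accepts comma-separated wildcard
> patterns instead of *, so this could be narrowed down to the test
> subnet later; the pattern below is from memory and untested:
>
> # option auth.ip.io-thr.allow 192.168.1.*,127.0.0.1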
>
>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at nongnu.org
> http://lists.nongnu.org/mailman/listinfo/gluster-devel
>
--
If I traveled to the end of the rainbow
As Dame Fortune did intend,
Murphy would be there to tell me
The pot's at the other end.