[Gluster-devel] glusterfs client problem

Pooya Woodcock pooya at packetcloud.net
Tue Apr 3 01:26:02 UTC 2007


Does this get better if you use the --bwlimit flag on rsync?
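
Something like the following is what I had in mind -- just a sketch; the
source path and the rate (kbytes/sec) are placeholders, not anything from
your setup:

  rsync -av --bwlimit=5000 /path/to/content/ /vol/vol0/sites/

Capping the transfer rate would at least tell us whether the crash only
shows up when rsync is pushing the mount as fast as it can.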


On Apr 2, 2007, at 3:46 PM, Shawn Northart wrote:

> I'm noticing a problem with our test setup with regard to (reasonably)
> heavy read/write usage.
> the problem we're having is that during an rsync of content, the sync
> bails because the mount is lost, with the following errors:
>
> <snip>
> rsync: stat "/vol/vol0/sites/TESTSITE.com/htdocs/trailers" failed:
> Transport endpoint is not connected (107)
> rsync: recv_generator: mkdir
> "/vol/vol0/sites/TESTSITE.com/htdocs/trialmember" failed: Transport
> endpoint is not connected (107)
> rsync: stat "/vol/vol0/sites/TESTSITE.com/htdocs/trialmember" failed:
> Transport endpoint is not connected (107)
> rsync: recv_generator: mkdir
> "/vol/vol0/sites/TESTSITE.com/htdocs/trialmember/bardoux" failed:
> Transport endpoint is not connected (107)
> rsync: stat "/vol/vol0/sites/TESTSITE.com/htdocs/trialmember/bardoux"
> failed: Transport endpoint is not connected (107)
> rsync: recv_generator: mkdir
> "/vol/vol0/sites/TESTSITE.com/htdocs/trialmember/images" failed:
> Transport endpoint is not connected (107)
> rsync: stat "/vol/vol0/sites/TESTSITE.com/htdocs/trialmember/images"
> failed: Transport endpoint is not connected (107)
> rsync: recv_generator: mkdir
> "/vol/vol0/sites/TESTSITE.com/htdocs/upgrade_trailers" failed:  
> Transport
> endpoint is not connected (107)
> rsync: stat "/vol/vol0/sites/TESTSITE.com/htdocs/upgrade_trailers"
> failed: Transport endpoint is not connected (107)
> </snip>
>
> normal logging shows nothing on either the client or the server side, but
> running with DEBUG logging shows the following at the end of the client
> log right as it breaks:
>
> <snip>
> [Apr 02 13:25:11] [DEBUG/common-utils.c:213/gf_print_trace()]
> debug-backtrace:Got signal (11), printing backtrace
> [Apr 02 13:25:11] [DEBUG/common-utils.c:215/gf_print_trace()]
> debug-backtrace:/usr/local/glusterfs-mainline/lib/libglusterfs.so.0(gf_print_trace+0x1f) [0x2a9556030f]
> [Apr 02 13:25:11] [DEBUG/common-utils.c:215/gf_print_trace()]
> debug-backtrace:/lib64/tls/libc.so.6 [0x35b992e2b0]
> [Apr 02 13:25:11] [DEBUG/common-utils.c:215/gf_print_trace()]
> debug-backtrace:/lib64/tls/libpthread.so.0(__pthread_mutex_destroy+0) [0x35ba807ab0]
> [Apr 02 13:25:11] [DEBUG/common-utils.c:215/gf_print_trace()]
> debug-backtrace:/usr/local/glusterfs-mainline/lib/glusterfs/1.3.0-pre2.2/xlator/cluster/afr.so [0x2a958b840c]
> [Apr 02 13:25:11] [DEBUG/common-utils.c:215/gf_print_trace()]
> debug-backtrace:/usr/local/glusterfs-mainline/lib/glusterfs/1.3.0-pre2.2/xlator/protocol/client.so [0x2a957b06c2]
> [Apr 02 13:25:11] [DEBUG/common-utils.c:215/gf_print_trace()]
> debug-backtrace:/usr/local/glusterfs-mainline/lib/glusterfs/1.3.0-pre2.2/xlator/protocol/client.so [0x2a957b3196]
> [Apr 02 13:25:11] [DEBUG/common-utils.c:215/gf_print_trace()]
> debug-backtrace:/usr/local/glusterfs-mainline/lib/libglusterfs.so.0(epoll_iteration+0xf8) [0x2a955616f8]
> [Apr 02 13:25:11] [DEBUG/common-utils.c:215/gf_print_trace()]
> debug-backtrace:[glusterfs] [0x4031b7]
> [Apr 02 13:25:11] [DEBUG/common-utils.c:215/gf_print_trace()]
> debug-backtrace:/lib64/tls/libc.so.6(__libc_start_main+0xdb) [0x35b991c3fb]
> [Apr 02 13:25:11] [DEBUG/common-utils.c:215/gf_print_trace()]
> debug-backtrace:[glusterfs] [0x402bba]
> </snip>
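
That trace is the client segfaulting (signal 11), with the frames pointing
into afr.so via protocol/client inside epoll_iteration, so the "Transport
endpoint is not connected" errors are just rsync hitting the mount after
the client process has died.  If you need to capture another one, a DEBUG
run of the client looks roughly like this -- the spec-file and log paths
here are examples, and the exact option names may differ between the
1.3.0 pre-releases:

  glusterfs -f /etc/glusterfs/glusterfs-client.vol \
      -L DEBUG -l /var/log/glusterfs/client.log /vol/vol0
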
>
>
> the server log shows the following at the time it breaks:
> <snip>
> [Apr 02 15:30:09] [ERROR/common-utils.c:54/full_rw()]
> libglusterfs:full_rw: 0 bytes r/w instead of 113
> [Apr 02 15:30:09] [DEBUG/protocol.c:244/gf_block_unserialize_transport()]
> libglusterfs/protocol:gf_block_unserialize_transport: full_read of header failed
> [Apr 02 15:30:09] [DEBUG/proto-srv.c:2868/proto_srv_cleanup()]
> protocol/server:cleaned up xl_private of 0x510470
> [Apr 02 15:30:09] [DEBUG/tcp-server.c:243/gf_transport_fini()]
> tcp/server:destroying transport object for 192.168.0.96:1012 (fd=8)
> [Apr 02 15:30:09] [ERROR/common-utils.c:54/full_rw()]
> libglusterfs:full_rw: 0 bytes r/w instead of 113
> [Apr 02 15:30:09] [DEBUG/protocol.c:244/gf_block_unserialize_transport()]
> libglusterfs/protocol:gf_block_unserialize_transport: full_read of header failed
> [Apr 02 15:30:09] [DEBUG/proto-srv.c:2868/proto_srv_cleanup()]
> protocol/server:cleaned up xl_private of 0x510160
> [Apr 02 15:30:09] [DEBUG/tcp-server.c:243/gf_transport_fini()]
> tcp/server:destroying transport object for 192.168.0.96:1013 (fd=7)
> [Apr 02 15:30:09] [ERROR/common-utils.c:54/full_rw()]
> libglusterfs:full_rw: 0 bytes r/w instead of 113
> [Apr 02 15:30:09] [DEBUG/protocol.c:244/gf_block_unserialize_transport()]
> libglusterfs/protocol:gf_block_unserialize_transport: full_read of header failed
> [Apr 02 15:30:09] [DEBUG/proto-srv.c:2868/proto_srv_cleanup()]
> protocol/server:cleaned up xl_private of 0x502300
> [Apr 02 15:30:09] [DEBUG/tcp-server.c:243/gf_transport_fini()]
> tcp/server:destroying transport object for 192.168.0.96:1014 (fd=4)
> </snip>
>
> we're using 4 bricks in this setup and, for the moment, just one client
> (we'd like to scale to somewhere between 20-30 clients and 4-8 server
> bricks).
> the same behavior is observed with or without any combination of the
> performance translators, as well as with or without file replication.
> the alu, random, and round-robin schedulers were all used in our testing.
> the systems in question are running CentOS 4.4.  these logs are from our
> 64-bit systems, but we have seen exactly the same thing on the 32-bit
> ones as well.
> this (glusterfs) looks like it could be a good fit for some of the
> high-traffic domains we host, but unless we can resolve this issue,
> we'll have to continue using NFS.
>
>
> our current server-side (brick) config consists of the following:
> ##-- begin server config
> volume vol1
>   type storage/posix
>   option directory /vol/vol1/gfs
> end-volume
>
> volume vol2
>   type storage/posix
>   option directory /vol/vol2/gfs
> end-volume
>
> volume vol3
>   type storage/posix
>   option directory /vol/vol3/gfs
> end-volume
>
> volume brick1
>   type performance/io-threads
>   option thread-count 8
>   subvolumes vol1
> end-volume
>
> volume brick2
>   type performance/io-threads
>   option thread-count 8
>   subvolumes vol2
> end-volume
>
> volume brick3
>   type performance/io-threads
>   option thread-count 8
>   subvolumes vol3
> end-volume
>
> volume server
>   type protocol/server
>   option transport-type tcp/server
>   option bind-address 10.88.188.91
>   subvolumes brick1 brick2 brick3
>   option auth.ip.brick1.allow 192.168.0.*
>   option auth.ip.brick2.allow 192.168.0.*
>   option auth.ip.brick3.allow 192.168.0.*
> end-volume
> ##-- end server config
>
>
> our client config is as follows:
>
> ##-- begin client config
> volume test00.1
>   type protocol/client
>   option transport-type tcp/client
>   option remote-host 192.168.0.91
>   option remote-subvolume brick1
> end-volume
> volume test00.2
>   type protocol/client
>   option transport-type tcp/client
>   option remote-host 192.168.0.91
>   option remote-subvolume brick2
> end-volume
> volume test00.3
>   type protocol/client
>   option transport-type tcp/client
>   option remote-host 192.168.0.91
>   option remote-subvolume brick3
> end-volume
>
>
> volume test01.1
>   type protocol/client
>   option transport-type tcp/client
>   option remote-host 192.168.0.92
>   option remote-subvolume brick1
> end-volume
> volume test01.2
>   type protocol/client
>   option transport-type tcp/client
>   option remote-host 192.168.0.92
>   option remote-subvolume brick2
> end-volume
> volume test01.3
>   type protocol/client
>   option transport-type tcp/client
>   option remote-host 192.168.0.92
>   option remote-subvolume brick3
> end-volume
>
>
> volume test02.1
>   type protocol/client
>   option transport-type tcp/client
>   option remote-host 192.168.0.93
>   option remote-subvolume brick1
> end-volume
> volume test02.2
>   type protocol/client
>   option transport-type tcp/client
>   option remote-host 192.168.0.93
>   option remote-subvolume brick2
> end-volume
> volume test02.3
>   type protocol/client
>   option transport-type tcp/client
>   option remote-host 192.168.0.93
>   option remote-subvolume brick3
> end-volume
>
>
> volume test03.1
>   type protocol/client
>   option transport-type tcp/client
>   option remote-host 192.168.0.94
>   option remote-subvolume brick1
> end-volume
> volume test03.2
>   type protocol/client
>   option transport-type tcp/client
>   option remote-host 192.168.0.94
>   option remote-subvolume brick2
> end-volume
> volume test03.3
>   type protocol/client
>   option transport-type tcp/client
>   option remote-host 192.168.0.94
>   option remote-subvolume brick3
> end-volume
>
>
>
> volume afr0
>   type cluster/afr
>   subvolumes test00.1 test01.2 test02.3
>   option replicate *.html:3,*.db:1,*:3
> end-volume
>
> volume afr1
>   type cluster/afr
>   subvolumes test01.1 test02.2 test03.3
>   option replicate *.html:3,*.db:1,*:3
> end-volume
>
> volume afr2
>   type cluster/afr
>   subvolumes test02.1 test03.2 test00.3
>   option replicate *.html:3,*.db:1,*:3
> end-volume
>
> volume afr3
>   type cluster/afr
>   subvolumes test03.1 test00.2 test01.3
>   option replicate *.html:3,*.db:1,*:3
> end-volume
>
>
> volume bricks
>   type cluster/unify
>   subvolumes afr0 afr1 afr2 afr3
>   option readdir-force-success on
>
>   option scheduler alu
>   option alu.limits.min-free-disk  60GB
>   option alu.limits.max-open-files 10000
>
>   option alu.order disk-usage:read-usage:open-files-usage:write-usage:disk-speed-usage
>
>   option alu.disk-usage.entry-threshold 2GB
>   option alu.disk-usage.exit-threshold  60MB
>   option alu.open-files-usage.entry-threshold 1024
>   option alu.open-files-usage.exit-threshold 32
>   option alu.stat-refresh.interval 10sec
>
>   option alu.read-usage.entry-threshold 20%
>   option alu.read-usage.exit-threshold 4%
>   option alu.write-usage.entry-threshold 20%
>   option alu.write-usage.exit-threshold 4%
>
> end-volume
> ##-- end client config
>
>
> ~Shawn
>
>
>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at nongnu.org
> http://lists.nongnu.org/mailman/listinfo/gluster-devel
