[Gluster-devel] pre6 hanging problems
August R. Wohlt
glusterfs at isidore.net
Wed Jul 25 20:12:27 UTC 2007
Hi all -
I have a client and server set up with the pre6 release of GlusterFS. Several
times a day the client mount freezes, as does any command that tries to read
from the mountpoint. I have to kill the glusterfs process, unmount the
directory, and remount it to get it working again.
When this happens, another glusterfs client on a different machine connected
to the same server does not get disconnected, so the timeout message in the
logs is confusing to me. If the connection were really timing out, wouldn't
the other client be disconnected, too?
This is on CentOS 5 with fuse 2.7.0-glfs.
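For reference, the timeout in the client log below appears to be the
protocol/client transport-timeout (120 seconds here). If I read the
spec-file format correctly, it can be raised per client volume like this
(just a sketch; the option name is taken from the log message, and I have
not confirmed that raising it avoids the hang):

volume brick
type protocol/client
option transport-type tcp/client
option remote-host 192.168.2.5
option remote-subvolume brick_1
option transport-timeout 300 # seconds; assumption: option name matches the "transport-timeout" in the log
end-volume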
When it happens, here's what shows up in the client log:
...
2007-07-25 09:45:59 D [inode.c:327:__active_inode] fuse/inode: activating inode(4210807), lru=0/1024
2007-07-25 09:45:59 D [inode.c:285:__destroy_inode] fuse/inode: destroy inode(4210807)
2007-07-25 12:37:26 W [client-protocol.c:211:call_bail] brick: activating bail-out. pending frames = 1. last sent = 2007-07-25 12:33:42. last received = 2007-07-25 11:42:59 transport-timeout = 120
2007-07-25 12:37:26 C [client-protocol.c:219:call_bail] brick: bailing transport
2007-07-25 12:37:26 W [client-protocol.c:4189:client_protocol_cleanup] brick: cleaning up state in transport object 0x80a03d0
2007-07-25 12:37:26 W [client-protocol.c:4238:client_protocol_cleanup] brick: forced unwinding frame type(0) op(15)
2007-07-25 12:37:26 C [tcp.c:81:tcp_disconnect] brick: connection disconnected
When it happens, here's what shows up in the server log:
2007-07-25 15:37:40 E [protocol.c:346:gf_block_unserialize_transport] libglusterfs/protocol: full_read of block failed: peer (192.168.2.3:1023)
2007-07-25 15:37:40 C [tcp.c:81:tcp_disconnect] server: connection disconnected
2007-07-25 15:37:40 E [protocol.c:251:gf_block_unserialize_transport] libglusterfs/protocol: EOF from peer (192.168.2.4:1023)
2007-07-25 15:37:40 C [tcp.c:81:tcp_disconnect] server: connection disconnected
And here's the client backtrace:
(gdb) bt
#0 0x0032e7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1 0x005a3824 in raise () from /lib/tls/libpthread.so.0
#2 0x00655b0c in tcp_bail (this=0x80a03d0) at ../../../../transport/tcp/tcp.c:146
#3 0x00695bbc in transport_bail (this=0x80a03d0) at transport.c:192
#4 0x00603a16 in call_bail (trans=0x80a03d0) at client-protocol.c:220
#5 0x00696870 in gf_timer_proc (ctx=0xbffeec30) at timer.c:119
#6 0x0059d3cc in start_thread () from /lib/tls/libpthread.so.0
#7 0x00414c3e in clone () from /lib/tls/libc.so.6
client config:
### Add client feature and attach to remote subvolume
volume brick
type protocol/client
option transport-type tcp/client # for TCP/IP transport
option remote-host 192.168.2.5 # IP address of the remote brick
option remote-subvolume brick_1 # name of the remote volume
end-volume
### Add write-behind feature
volume brick-wb
type performance/write-behind
option aggregate-size 131072 # unit in bytes
subvolumes brick
end-volume
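The second client machine uses the same spec file except for the volume it
attaches to (sketch; that it mounts brick_2 is my assumption):

volume brick
type protocol/client
option transport-type tcp/client # for TCP/IP transport
option remote-host 192.168.2.5 # IP address of the remote brick
option remote-subvolume brick_2 # assumption: the second client attaches to brick_2
end-volume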
server config:
### Export volumes "brick_1" and "brick_2" from local directories.
volume brick_1
type storage/posix
option directory /home/vg_3ware1/vivalog/brick_1
end-volume
volume brick_2
type storage/posix
option directory /home/vg_3ware1/vivalog/brick_2
end-volume
### Add network serving capability to the above bricks.
volume server
type protocol/server
option transport-type tcp/server # For TCP/IP transport
option bind-address 192.168.2.5 # Default is to listen on all interfaces
subvolumes brick_1 brick_2
option auth.ip.brick_2.allow * # Allow access to "brick_2" volume
option auth.ip.brick_1.allow * # Allow access to "brick_1" volume
end-volume
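As an aside, the allow * lines above open both bricks to any peer; if
auth.ip accepts wildcard patterns the way the sample specs suggest, it
could presumably be narrowed to the LAN the clients sit on (untested sketch):

option auth.ip.brick_1.allow 192.168.2.* # assumption: wildcard subnet patterns are accepted
option auth.ip.brick_2.allow 192.168.2.*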
P.S. I have one server serving two volume bricks to two physically distinct
clients. I assume this is okay, and that I don't need two separate server
declarations.