[Gluster-users] HA translator failure
Daniel Maher
dma+gluster at witbe.net
Wed Jan 14 17:52:38 UTC 2009
Hi all,
In testing the HA translator under 2.0.0rc1, I've managed to create a
simple and reproducible scenario in which Gluster fails to maintain
communication between the client and the server(s).
Server01 and Server02 are AFR'ing each other, with Client01 connected
via the HA translator. As a simple test, I launch a script that echoes
an increasing counter to a text file in the Gluster mount on Client01.
Initially, Client01 is communicating with Server01.
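The test script itself isn't included in this report. A minimal Python equivalent might look like the sketch below; the function names and the mount path in the usage line are hypothetical, and only `counter.txt` comes from the log excerpts. Unlike a bare echo loop, it records which writes failed and with which errno, so a run shows exactly where the mount stopped responding.

```python
import errno
import time


def write_counter(path, iterations, interval=0.0):
    """Append 1..iterations to `path`, one value per line (hypothetical helper).

    Returns a list of (counter, errno) pairs for the writes that failed.
    """
    failures = []
    for counter in range(1, iterations + 1):
        try:
            # Open, append and close on every step, so each iteration
            # exercises open, write and close (flush) on the mount.
            with open(path, "a") as fh:
                fh.write(f"{counter}\n")
        except OSError as exc:
            failures.append((counter, exc.errno))
        if interval:
            time.sleep(interval)
    return failures


def describe(err):
    """Symbolic name for an errno value, e.g. ENOTCONN (107 on Linux)."""
    return errno.errorcode.get(err, str(err))
```

For example, `write_counter("/mnt/glusterfs/counter.txt", 10000, 0.1)` (the mount point is an assumption) followed by `describe(err)` on each failure; `ENOTCONN` is the errno behind "Transport endpoint is not connected".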
I cleanly stop glusterfsd on Server01, and after a momentary hiccup
(noted in the log excerpt below), things continue to function as
expected - Client01 commences communication with Server02. So far so good.
2009-01-15 15:54:19 E [socket.c:708:socket_connect_finish] export01: connection failed (Connection refused)
I restart glusterfsd on Server01, then cleanly stop glusterfsd on
Server02 (which, of course, Client01 is now communicating with).
Client01 freaks out (see log excerpt below), does /not/ attempt to
contact Server01 again, and leaves me with the dreaded "Transport
endpoint is not connected" situation.
2009-01-15 16:06:02 E [ha-helpers.c:266:_ha_next_active_child_for_ctx] export-ha: none of the children are connected other than export02
2009-01-15 16:06:02 E [ha.c:2715:ha_fstat_cbk] export-ha: no active subvolume
2009-01-15 16:06:02 E [fuse-bridge.c:533:fuse_attr_cbk] glusterfs-fuse: 2932: FSTAT() /counter.txt => -1 (Transport endpoint is not connected)
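For comparison, here is a toy model of the failover sequence this report expects from cluster/ha. It is not the translator's implementation (the class and method names are made up); it only tracks which children are connected and which one is active, and when the active child goes away it picks any child that is connected at that moment, including one that failed earlier and came back.

```python
from dataclasses import dataclass, field


@dataclass
class ToyHA:
    """Toy model of the expected failover; NOT GlusterFS code."""
    children: list
    connected: dict = field(default_factory=dict)
    active: str = None

    def __post_init__(self):
        # All children start out connected; the first one is active.
        for name in self.children:
            self.connected.setdefault(name, True)
        if self.active is None:
            self.active = self.children[0]

    def set_state(self, name, up):
        """Record a connect/disconnect event for one child."""
        self.connected[name] = up
        if not up and name == self.active:
            self.active = self.next_active()
        elif up and self.active is None:
            # Nothing was usable; a returning child becomes active.
            self.active = name

    def next_active(self):
        # Expected behaviour: any currently connected child is a valid target.
        for name in self.children:
            if self.connected[name]:
                return name
        return None  # genuinely no subvolume left
```

Replaying the test (export01 down, export01 back up, export02 down) leaves export01 active in this model, whereas the client log above reports that no child other than export02 is connected.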
Client01 sometimes recovers from this, and sometimes it does not. When
it does not, the only solution is manual intervention (unmount and
remount). That's not the worst of it, though: when it /does/ recover,
restarting glusterfsd on Server02 (!) triggers more of the same errors
(see below) and /always/ results in a total failure on Client01 within
a second or two ("Transport endpoint is not connected"). Client01 never
recovers from this.
2009-01-15 19:04:56 E [ha-helpers.c:266:_ha_next_active_child_for_ctx] export-ha: none of the children are connected other than export01
2009-01-15 19:04:56 E [ha.c:2515:ha_flush_cbk] export-ha: no active subvolume
2009-01-15 19:04:56 E [fuse-bridge.c:911:fuse_err_cbk] glusterfs-fuse: 3058: FLUSH() ERR => -1 (Transport endpoint is not connected)
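When comparing runs it can help to tally these errors by translator. A small sketch, assuming the log format seen in the excerpts here (date, time, one-letter level, `[file:line:function]`, volume name, message) with one entry per line; the pattern is inferred from these lines rather than taken from the GlusterFS source, and the helper names are hypothetical.

```python
import re
from collections import Counter

# Inferred from the excerpts, e.g.
# 2009-01-15 19:04:56 E [ha.c:2515:ha_flush_cbk] export-ha: no active subvolume
LOG_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]) \[(?P<file>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\] "
    r"(?P<volume>[^:]+): (?P<message>.*)$"
)


def parse_line(line):
    """Return the fields of one log entry as a dict, or None if it doesn't match."""
    m = LOG_RE.match(line.strip())
    return m.groupdict() if m else None


def tally_by_volume(lines):
    """Count parsed entries per volume (translator instance) name."""
    counts = Counter()
    for line in lines:
        rec = parse_line(line)
        if rec:
            counts[rec["volume"]] += 1
    return counts
```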
I strongly suspect this is not the expected behaviour of the High
Availability translator. :)
The servers are running FC9 (i386); the client is running FC10 (i386).
# glusterfs --version
glusterfs 2.0.0rc1 built on Jan 14 2009 13:19:06
Repository revision: glusterfs--mainline--3.0--patch-844
# rpm -qa | grep fuse
fuse-2.7.3glfs10-1.i386
fuse-devel-2.7.3glfs10-1.i386
fuse-libs-2.7.3glfs10-1.i386
Server config:
# cat /etc/glusterfs/glusterfs-server.vol
# dataspace
volume test-ds
type storage/posix
option directory /opt/datadir
end-volume
# posix locks for test-ds
volume test-ds-locks
type features/locks
option mandatory-locks on
subvolumes test-ds
end-volume
# dataspace of test-ds on Server01
volume test-01-ds
type protocol/client
option transport-type tcp/client
option remote-host 192.168.0.183
option remote-subvolume test-ds-locks
option transport-timeout 10
end-volume
# automatic file replication translator for test dataspace
volume test-ds-afr
type cluster/afr
subvolumes test-ds-locks test-01-ds
end-volume
# the actual export
volume export
type performance/io-threads
option thread-count 8
subvolumes test-ds-afr
end-volume
# server declaration
volume server
type protocol/server
option transport-type tcp/server
subvolumes export
option auth.addr.export.allow 192.168.0.73,192.168.0.183,192.168.0.166,127.0.0.1
option auth.addr.test-ds-locks.allow 192.168.0.73,192.168.0.183,192.168.0.166,127.0.0.1
end-volume
Client config:
# cat /etc/glusterfs/glusterfs-client.vol
# export on Server01
volume export01
type protocol/client
option transport-type tcp/client
option remote-host 192.168.0.183
option remote-subvolume export # exported volume
end-volume
# export on Server02
volume export02
type protocol/client
option transport-type tcp/client
option remote-host 192.168.0.166
option remote-subvolume export # exported volume
end-volume
# exports clustered via HA
volume export-ha
type cluster/ha
subvolumes export01 export02
end-volume
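Both volfiles use the same block syntax (`volume` ... `end-volume`, with `type`, `option` and `subvolumes` statements), and the server side stacks storage/posix, features/locks, cluster/afr, performance/io-threads and protocol/server. A toy parser (hypothetical helper names, not GlusterFS's own parser) can check that every `subvolumes` entry refers to a defined volume, and it rejects an option value that has been wrapped onto its own line, as mail clients tend to do.

```python
def parse_volfile(text):
    """Parse volume ... end-volume blocks into {name: {...}} (toy parser)."""
    volumes = {}
    current = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop full-line and trailing comments
        if not line:
            continue
        parts = line.split(None, 1)
        keyword, rest = parts[0], (parts[1] if len(parts) > 1 else "")
        if keyword == "volume":
            if current is not None:
                raise ValueError(f"volume {current['name']!r} has no end-volume")
            current = {"name": rest, "type": None, "options": {}, "subvolumes": []}
        elif current is None:
            raise ValueError(f"statement outside a volume block: {line!r}")
        elif keyword == "end-volume":
            volumes[current["name"]] = current
            current = None
        elif keyword == "type":
            current["type"] = rest
        elif keyword == "option":
            kv = rest.split(None, 1)
            current["options"][kv[0]] = kv[1] if len(kv) > 1 else ""
        elif keyword == "subvolumes":
            current["subvolumes"] = rest.split()
        else:
            # e.g. an option value that a mail client wrapped onto its own line
            raise ValueError(f"unrecognised statement in {current['name']!r}: {line!r}")
    if current is not None:
        raise ValueError(f"volume {current['name']!r} has no end-volume")
    return volumes


def undefined_subvolumes(volumes):
    """Return (volume, subvolume) pairs that reference an undefined volume."""
    return [
        (name, sub)
        for name, vol in volumes.items()
        for sub in vol["subvolumes"]
        if sub not in volumes
    ]
```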
--
Daniel Maher <dma+gluster AT witbe DOT net>