[Gluster-users] HA translator failure

Krishna Srinivas krishna at zresearch.com
Tue Jan 27 19:39:20 UTC 2009


Daniel,
There were fixes that went into HA recently. Can you check if the bug
is still there?
Krishna

On Wed, Jan 14, 2009 at 11:22 PM, Daniel Maher <dma+gluster at witbe.net> wrote:
> Hi all,
>
> In testing the HA translator under 2.0.0rc1, I've managed to create a
> simple and reproducible scenario in which Gluster fails to maintain
> communication between the client and the server(s).
>
> Server01 and Server02 are AFR'ing each other, with Client01 connected
> via the HA translator.  As a simple test, I launch a script that echoes
> an increasing counter to a text file in the Gluster mount on Client01.
> Client01 is communicating with Server01 in this instance.
>
> I cleanly stop glusterfsd on Server01, and after a momentary hiccup
> (noted in the log excerpt below), things continue to function as
> expected - Client01 commences communication with Server02.  So far so good.
>
> 2009-01-15 15:54:19 E [socket.c:708:socket_connect_finish] export01: connection failed (Connection refused)
>
> I restart glusterfsd on Server01, then cleanly stop glusterfsd on
> Server02 (which, of course, Client01 is now communicating with).
> Client01 freaks out (see log excerpt below), does /not/ attempt to
> contact Server01 again, and leaves me with the dreaded "transport
> endpoint not connected" situation.
>
> 2009-01-15 16:06:02 E [ha-helpers.c:266:_ha_next_active_child_for_ctx] export-ha: none of the children are connected other than export02
> 2009-01-15 16:06:02 E [ha.c:2715:ha_fstat_cbk] export-ha: no active subvolume
> 2009-01-15 16:06:02 E [fuse-bridge.c:533:fuse_attr_cbk] glusterfs-fuse: 2932: FSTAT() /counter.txt => -1 (Transport endpoint is not connected)
>
> Client01 sometimes recovers from this, and sometimes it does not.  When
> it does not recover from this situation, the only solution is manual
> intervention (unmount / remount).  That's not the worst of it, though:
> when it /does/ recover, re-starting glusterfsd on Server02 (!) causes
> even more of the errors (see below), and /always/ results in a total
> failure on Client01 within a second or two (transport endpoint not
> connected).  Client01 never recovers from this.
>
> 2009-01-15 19:04:56 E [ha-helpers.c:266:_ha_next_active_child_for_ctx] export-ha: none of the children are connected other than export01
> 2009-01-15 19:04:56 E [ha.c:2515:ha_flush_cbk] export-ha: no active subvolume
> 2009-01-15 19:04:56 E [fuse-bridge.c:911:fuse_err_cbk] glusterfs-fuse: 3058: FLUSH() ERR => -1 (Transport endpoint is not connected)
>
>
> I strongly suspect this is not the expected behaviour of the High
> Availability translator. :)
>
>
> Servers are running FC9 i386, Client is FC10 i386.
>
> # glusterfs --version
> glusterfs 2.0.0rc1 built on Jan 14 2009 13:19:06
> Repository revision: glusterfs--mainline--3.0--patch-844
>
> # rpm -qa | grep fuse
> fuse-2.7.3glfs10-1.i386
> fuse-devel-2.7.3glfs10-1.i386
> fuse-libs-2.7.3glfs10-1.i386
>
>
> Server config :
>
> # cat /etc/glusterfs/glusterfs-server.vol
> # dataspace
> volume test-ds
>   type storage/posix
>   option directory /opt/datadir
> end-volume
>
> # posix locks for test-ds
> volume test-ds-locks
>   type features/locks
>   option mandatory-locks on
>   subvolumes test-ds
> end-volume
>
> # dataspace of test-ds on Server01
> volume test-01-ds
>   type protocol/client
>   option transport-type tcp/client
>   option remote-host 192.168.0.183
>   option remote-subvolume test-ds-locks
>   option transport-timeout 10
> end-volume
>
> # automatic file replication translator for test dataspace
> volume test-ds-afr
>   type cluster/afr
>   subvolumes test-ds-locks test-01-ds
> end-volume
>
> # the actual export
> volume export
>   type performance/io-threads
>   option thread-count 8
>   subvolumes test-ds-afr
> end-volume
>
> # server declaration
> volume server
>   type protocol/server
>   option transport-type tcp/server
>   subvolumes export
>   option auth.addr.export.allow 192.168.0.73,192.168.0.183,192.168.0.166,127.0.0.1
>   option auth.addr.test-ds-locks.allow 192.168.0.73,192.168.0.183,192.168.0.166,127.0.0.1
> end-volume
>
>
>
> Client config :
> # cat /etc/glusterfs/glusterfs-client.vol
>
> # export on Server01
> volume export01
>   type protocol/client
>   option transport-type tcp/client
>   option remote-host 192.168.0.183
>   option remote-subvolume export      # exported volume
> end-volume
>
> # export on Server02
> volume export02
>   type protocol/client
>   option transport-type tcp/client
>   option remote-host 192.168.0.166
>   option remote-subvolume export      # exported volume
> end-volume
>
> # exports clustered via HA
> volume export-ha
>   type cluster/ha
>   subvolumes export01 export02
> end-volume
>
>
>
> --
> Daniel Maher <dma+gluster AT witbe DOT net>
>
>
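
For anyone re-running the test against a newer build: the counter script
described in the report was not posted, so the following is only a minimal
sketch of what such a script might look like. The function name
write_counter, its arguments, and the choice to append (rather than
overwrite) are assumptions for illustration, not Daniel's actual script.
It logs failed writes with a timestamp instead of exiting, so both the
outage and any later recovery show up in the output.

```shell
#!/bin/sh
# Hypothetical helper (not from the original report): append an
# increasing counter to a file, one value per line.
#   $1 = target file (e.g. a file on the Gluster mount)
#   $2 = number of writes to attempt
#   $3 = delay between writes in seconds (default 1)
write_counter() {
    file=$1
    count=$2
    delay=${3:-1}
    i=1
    failures=0
    while [ "$i" -le "$count" ]; do
        # Keep going after a failed write and note when it happened,
        # so the log can be lined up with the glusterfs client log.
        if ! echo "$i" >> "$file"; then
            failures=$((failures + 1))
            echo "$(date '+%Y-%m-%d %H:%M:%S') write $i failed" >&2
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    # Exit status is non-zero if any write failed.
    [ "$failures" -eq 0 ]
}
```

It could be invoked on Client01 as, for example,
"write_counter /mnt/glusterfs/counter.txt 600 1" (the mount point is an
assumption), while glusterfsd is stopped and restarted on each server in
the order given in the report.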



