[Gluster-devel] High availability question...

Krishna Srinivas krishna at zresearch.com
Tue May 27 05:20:56 UTC 2008


Victor,

Here are your steps:
1. open file
2. read
3. bring first child down
4. read continues with a seamless failover
5. bring first child up, and bring second child down
6. read fails

Now retrying from the first server would be difficult, as all the "state"
associated with that server was lost when it went down.
It could have been a temporary network disconnect, or the server could
have been rebooted (in which case it would have lost the open file
descriptor).

The file has to be reopened in this case.
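Since the descriptor on the mount point becomes useless once the brick-side
state is gone, one application-level workaround is to handle the failed read
yourself: close the stale descriptor, reopen the file, seek back to the last
good offset, and continue. Below is a minimal sketch, assuming a Python
reader, a hypothetical path on the mount, and that the failure surfaces as
ENOTCONN or EIO (errno 107 in the log below is ENOTCONN); this is not part of
GlusterFS itself, just one way an application could cope.

    import errno
    import os

    # Hypothetical path on the GlusterFS mount point.
    PATH = "/mnt/glusterfs/dc4.avi"

    def read_with_reopen(path, chunk=4096):
        """Read a whole file, reopening and seeking back to the last good
        offset whenever a read fails because the transport that served the
        original open() went away (ENOTCONN / EIO)."""
        fd = os.open(path, os.O_RDONLY)
        offset = 0
        try:
            while True:
                try:
                    data = os.read(fd, chunk)
                except OSError as e:
                    if e.errno not in (errno.ENOTCONN, errno.EIO):
                        raise
                    # The old fd is bound to state lost with the dead server:
                    # close it, reopen the file, and resume where we stopped.
                    # If the reopen fails too, the error simply propagates.
                    os.close(fd)
                    fd = os.open(path, os.O_RDONLY)
                    os.lseek(fd, offset, os.SEEK_SET)
                    continue
                if not data:
                    break
                offset += len(data)
                # ... hand `data` to the consumer here ...
        finally:
            os.close(fd)

A media player will not do this by itself, of course, which is why closing
and reopening the file by hand works for you.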

Regards
Krishna

On Mon, May 26, 2008 at 5:20 PM, Victor San Pedro <vsanpedro at bioalma.com>
wrote:

> Hello. My name is Víctor and I would like to ask about some tests I have
> been doing with GlusterFS.
> We are a bio-search company, and we are thinking of using GlusterFS to
> develop one of our projects.
>
> I am doing the tests with two servers and a client, using an AFR cluster
> setup configured on the client side.
> I have read in the GlusterFS documentation that this sort of
> setup should provide high availability in case one
> server goes down.
>
> My files are:
>
> *Client configuration file (CENTRAL). CLIENT*
>
> volume sargasv0
>   type protocol/client
>   option transport-type tcp/client
>   option remote-host 192.168.1.60
>   option remote-port 6996
>   option remote-subvolume v0
> end-volume
>
> volume shedirv4
>   type protocol/client
>   option transport-type tcp/client
>   option remote-host 192.168.1.61
>   option remote-port 6996
>   option remote-subvolume v4
> end-volume
>
> volume mirror0
>   type cluster/afr
>   subvolumes sargasv0 shedirv4
> end-volume
>
>
>
> *Server configuration file (SARGAS). SERVER1*
>
> volume v0
>   type storage/posix
>   option directory /tmp/export0
> end-volume
>
> volume server
>   type protocol/server
>   option transport-type tcp/server
>   option listen-port 6996
>   option auth.ip.v0.allow *
>   subvolumes v0
> end-volume
>
>
>
> *Server configuration file (SHEDIR). SERVER2*
>
> volume v4
>   type storage/posix
>   option directory /tmp/export4
> end-volume
>
> volume server
>   type protocol/server
>   option transport-type tcp/server
>   option listen-port 6996
>   option auth.ip.v4.allow *
>   subvolumes v4
> end-volume
>
>
>
>
> Well, I have done some tests playing a movie (*.avi) file stored on my
> GlusterFS-mounted directory with Totem Movie Player. Further tests
> were done with VLC media player, with identical results.
> I am running Ubuntu 7.10.
>
> Once the GlusterFS infrastructure was running with the two servers and
> the client, I copied the avi file from my home directory to the
> GlusterFS mount on the client. The file was copied correctly and the
> replication to the servers was OK.
> I began the test in debug mode. When I unplug one of the servers I
> can keep watching the video, after a failover to the remaining active
> server of about 20/30 seconds.
> Well, this is high availability, but when I plug back in the server I
> had previously detached and unplug the other one, I get the
> following error: "could not read from resource", and the following lines
> in the debug log.
>
> 2008-05-23 12:45:31 D [client-protocol.c:4750:client_protocol_reconnect] sargasv0: attempting reconnect
>
> 2008-05-23 12:45:31 D [tcp-client.c:77:tcp_connect] sargasv0: socket fd = 6
>
> 2008-05-23 12:45:31 D [tcp-client.c:107:tcp_connect] sargasv0: finalized on port `1023'
>
> 2008-05-23 12:45:31 D [common-utils.c:179:gf_resolve_ip] resolver: DNS cache not present, freshly probing hostname: 192.168.1.60
>
> 2008-05-23 12:45:31 D [common-utils.c:204:gf_resolve_ip] resolver: returning IP:192.168.1.60[0] for hostname: 192.168.1.60
>
> 2008-05-23 12:45:31 D [common-utils.c:212:gf_resolve_ip] resolver: flushing DNS cache
>
> 2008-05-23 12:45:31 D [tcp-client.c:161:tcp_connect] sargasv0: connect on 6 in progress (non-blocking)
>
> 2008-05-23 12:45:31 D [tcp-client.c:198:tcp_connect] sargasv0: connection on 6 still in progress - try later
>
> 2008-05-23 12:45:35 W [client-protocol.c:205:call_bail] shedirv4: activating bail-out. pending frames = 1. last sent = 2008-05-23 12:44:52. last received = 2008-05-23 12:44:52 transport-timeout = 42
>
> 2008-05-23 12:45:35 C [client-protocol.c:212:call_bail] shedirv4: bailing transport
>
> 2008-05-23 12:45:35 D [tcp.c:137:cont_hand] tcp: forcing poll/read/write to break on blocked socket (if any)
>
> 2008-05-23 12:45:35 W [client-protocol.c:4777:client_protocol_cleanup] shedirv4: cleaning up state in transport object 0x808bd90
>
> 2008-05-23 12:45:35 E [client-protocol.c:4827:client_protocol_cleanup] shedirv4: forced unwinding frame type(1) op(13) reply=@0xb6a00468
>
> 2008-05-23 12:45:35 E [client-protocol.c:3193:client_readv_cbk] shedirv4: no proper reply from server, returning ENOTCONN
>
> 2008-05-23 12:45:35 D [afr.c:2248:afr_readv_cbk] mirror0: reading from child 2
>
> 2008-05-23 12:45:35 E [afr.c:2262:afr_readv_cbk] mirror0: (path=/dc4.avi child=shedirv4) op_ret=-1 op_errno=107
>
> 2008-05-23 12:45:35 E [fuse-bridge.c:1551:fuse_readv_cbk] glusterfs-fuse: 182438: READ => -1 (107)
>
> 2008-05-23 12:45:35 D [tcp.c:87:tcp_disconnect] shedirv4: connection disconnected
>
> 2008-05-23 12:45:35 D [afr.c:5939:notify] mirror0: GF_EVENT_CHILD_DOWN from shedirv4
>
> 2008-05-23 12:45:35 D [fuse-bridge.c:1577:fuse_readv] glusterfs-fuse: 182439: READ (0xb6c01420, size=4096, offset=172892160)
>
> 2008-05-23 12:45:35 E [fuse-bridge.c:1551:fuse_readv_cbk] glusterfs-fuse: 182439: READ => -1 (107)
>
> 2008-05-23 12:45:35 D [fuse-bridge.c:1577:fuse_readv] glusterfs-fuse: 182440: READ (0xb6c01420, size=4096, offset=172892160)
>
> 2008-05-23 12:45:35 E [fuse-bridge.c:1551:fuse_readv_cbk] glusterfs-fuse: 182440: READ => -1 (107)
>
> ...
>
> In this case, I had to close the file and play it again. Then GlusterFS
> found the file on the active server and played it without problems.
> But if you do the test again, plugging in the server that was previously
> unplugged and unplugging the one that was active, the same error comes
> out and the film stops again.
>
> Therefore, the very first time one server goes down it is possible to
> keep the file open and continue watching the video, but the second and
> following attempts end in a read error and it is necessary to
> reopen the file...
>
> Is there a way of avoiding this read error, so that I can keep the file
> open and continue watching the movie after the failover to the
> active server has happened a second time?
>
> Thank you for your help.
>
>
>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at nongnu.org
> http://lists.nongnu.org/mailman/listinfo/gluster-devel
>


