[Gluster-devel] Spurious disconnect in 3.4.0alpha
krish
kparthas at redhat.com
Fri Mar 1 04:55:20 UTC 2013
Hi Emmanuel,
On 03/01/2013 07:55 AM, Emmanuel Dreyfus wrote:
> Hi
>
> The spurious disconnect I encountered in the 3.4 branch still happens
> in 3.4.0alpha, but glusterfs recovers much better now. However, when
> running a huge tar -xzf I still hit operation failures, after which
> everything returns to a normal state.
>
> Here is the client log, in which the issue is hit at 18:06:36
> http://ftp.espci.fr/shadow/manu/client.log
>
> The relevant part is below. I understand glusterfs is able to restore
> its connections and everything works fine, except when it happens on all
> volumes simultaneously.
>
> [2013-02-28 18:06:36.105271] W
> [socket.c:1962:__socket_proto_state_machine] 0-gfs33-client-3: reading
> from socket failed. Error (No message available), peer
> (192.0.2.98:49153)
> [2013-02-28 18:06:36.105340] E [rpc-clnt.c:368:saved_frames_unwind]
> 0-gfs33-client-3: forced unwinding frame type(GlusterFS 3.3)
> op(LOOKUP(27)) called at 2013-02-28 18:06:36.104358 (xid=0x3728220x)
> [2013-02-28 18:06:36.105454] W
> [client-rpc-fops.c:2624:client3_3_lookup_cbk] 0-gfs33-client-3: remote
> operation failed: Socket is not connected. Path:
> /manu/netbsd/usr/src/external (6fb65713-062a-464d-a9d4-e97dab3c298b)
> [2013-02-28 18:06:36.105514] E [rpc-clnt.c:368:saved_frames_unwind]
> 0-gfs33-client-3: forced unwinding frame type(GlusterFS 3.3)
> op(RELEASE(41)) called at 2013-02-28 18:06:36.104843 (xid=0x3728221x)
> [2013-02-28 18:06:36.105537] I [client.c:2097:client_rpc_notify]
> 0-gfs33-client-3: disconnected
> [2013-02-28 18:06:36.105571] E [afr-common.c:3761:afr_notify]
> 0-gfs33-replicate-1: All subvolumes are down. Going offline until
> atleast one of them comes back up.
> [2013-02-28 18:06:36.112037] I
> [afr-common.c:3882:afr_local_init] 0-gfs33-replicate-1: no subvolumes up
I see that the 0-gfs33-client-2 xlator is unable to connect to glusterd,
which should be running on hotstuff:24007. The client xlator retries the
connection 3 seconds after each failed attempt, which is why the client
disconnection messages repeat in the log.

Could you check whether glusterd was running on the host "hotstuff" when
the client experienced the spurious disconnects?

To confirm this, the next time you notice the 'spurious' disconnects, try:
# telnet hotstuff 24007
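
If telnet is not installed on the client, the same check can be scripted.
The following is only a rough sketch: the host name "hotstuff" and port
24007 are the ones mentioned above, while the function name port_is_open
and the 3 second timeout are my own choices for illustration. It only
tells you whether a TCP connection to the port can be opened, not whether
glusterd itself is healthy.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened.

    This mirrors what "telnet host port" tells you: whether something
    is accepting connections on that port. The default timeout of 3
    seconds is an arbitrary choice for this sketch.
    """
    try:
        # create_connection resolves the name and connects; the
        # "with" block closes the socket again straight away.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and name resolution
        # failures alike.
        return False


# Example call, using the host and port from this thread:
#   port_is_open("hotstuff", 24007)
```

Running the example call at the moment the disconnects show up in the
client log should tell you whether hotstuff:24007 was accepting
connections at that time.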
thanks,
krish