[Gluster-devel] Spurious disconnect in 3.4.0alpha
kparthas at redhat.com
Fri Mar 1 05:38:16 UTC 2013
On 03/01/2013 10:34 AM, Joe Julian wrote:
> 0-gfs33-client-2 would be the third brick in the gfs33 volume, so
> should be glusterfsd rather than glusterd, so not port 24007.
1) Client xlators first connect to glusterd on the remote-host supplied
in their options.
2) They query glusterd for the brick process's port (the brick is
identified by its path).
3) They reconfigure the rpc object to connect to the brick process on the
remote-host using the port received.
Only at step 3 does the client xlator connect to glusterfsd (the
brick process) on the remote-host.
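The three steps above can be sketched as a small simulation. This is
only an illustration of the sequence, not GlusterFS code: the class and
method names, the brick path, and the pairing of "hotstuff" with port
49153 (a port that appears as a peer port in the quoted log) are all
assumptions made for the example.

```python
# Illustrative simulation only: none of these names are GlusterFS APIs.
GLUSTERD_PORT = 24007  # glusterd port mentioned in this thread


class FakeGlusterd:
    """Hypothetical stand-in for glusterd's brick-path -> port lookup."""

    def __init__(self, brick_ports):
        self._brick_ports = dict(brick_ports)

    def query_brick_port(self, brick_path):
        # Step 2: the brick is identified by its path.
        return self._brick_ports[brick_path]


class FakeClientXlator:
    """Hypothetical stand-in for a client xlator and its rpc endpoint."""

    def __init__(self, remote_host, brick_path):
        self.remote_host = remote_host
        self.brick_path = brick_path
        # Step 1: the rpc object initially targets glusterd.
        self.endpoint = (remote_host, GLUSTERD_PORT)

    def connect(self, glusterd):
        port = glusterd.query_brick_port(self.brick_path)
        # Step 3: re-point the rpc object at the brick process.
        self.endpoint = (self.remote_host, port)
        return self.endpoint


# "/export/brick2" is a made-up brick path for the example.
glusterd = FakeGlusterd({"/export/brick2": 49153})
client = FakeClientXlator("hotstuff", "/export/brick2")
print(client.endpoint)           # ('hotstuff', 24007)
print(client.connect(glusterd))  # ('hotstuff', 49153)
```

The point of the sketch is that the same client xlator name can show up
in the log against port 24007 and against a brick port, depending on
which step it has reached.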
> krish <kparthas at redhat.com> wrote:
> Hi Emmanuel,
> On 03/01/2013 07:55 AM, Emmanuel Dreyfus wrote:
> Hi,
> The spurious disconnect I encountered in the 3.4 branch still
> happens in 3.4.0alpha, but glusterfs recovers much better now.
> However, when running a huge tar -xzf I still hit operation
> failures, after which everything is restored to a normal state.
> Here is the client log, in which the issue is hit at 18:06:36:
> http://ftp.espci.fr/shadow/manu/client.log
> The relevant part is below. I understand glusterfs is able to
> restore its connections and everything works fine, except when it
> happens on all volumes simultaneously.
>
> [2013-02-28 18:06:36.105271] W [socket.c:1962:__socket_proto_state_machine] 0-gfs33-client-3: reading from socket failed. Error (No message available), peer (192.0.2.98:49153)
> [2013-02-28 18:06:36.105340] E [rpc-clnt.c:368:saved_frames_unwind] 0-gfs33-client-3: forced unwinding frame type(GlusterFS 3.3) op(LOOKUP(27)) called at 2013-02-28 18:06:36.104358 (xid=0x3728220x)
> [2013-02-28 18:06:36.105454] W 0-gfs33-client-3: remote operation failed: Socket is not connected. Path: /manu/netbsd/usr/src/external (6fb65713-062a-464d-a9d4-e97dab3c298b)
> [2013-02-28 18:06:36.105514] E [rpc-clnt.c:368:saved_frames_unwind] 0-gfs33-client-3: forced unwinding frame type(GlusterFS 3.3) op(RELEASE(41)) called at 2013-02-28 18:06:36.104843 (xid=0x3728221x)
> [2013-02-28 18:06:36.105537] I [client.c:2097:client_rpc_notify] 0-gfs33-client-3: disconnected
> [2013-02-28 18:06:36.105571] E [afr-common.c:3761:afr_notify] 0-gfs33-replicate-1: All subvolumes are down. Going offline until atleast one of them comes back up.
> [2013-02-28 18:06:36.112037] I [afr-common.c:3882:afr_local_init] 0-gfs33-replicate-1: no subvolumes up
>
> I see that the 0-gfs33-client-2 xlator is unable to connect to glusterd
> (that should be) running on hotstuff:24007. The client xlator attempts
> to reconnect every 3s after its last attempt, which is why the
> client disconnection logs repeat.
> Could you check whether glusterd was running on the host "hotstuff" when
> you experienced the spurious disconnects?
> To confirm this when you notice the 'spurious' disconnects, try
> # telnet hotstuff 24007
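
For a scripted version of the telnet check above, a minimal sketch in
Python follows. It only tests whether a TCP connection to the port can
be opened, which is roughly what a successful telnet connect shows; it
says nothing about whether glusterd is answering RPC requests. The
function name port_open and the 3-second default timeout are choices
made for this example.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution errors.
        return False


# Example call, matching the telnet command above:
#   port_open("hotstuff", 24007)
```

Running this in a loop while the tar extraction is in progress would
show whether port 24007 becomes unreachable at the moment the
disconnects are logged.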