[Gluster-devel] Spurious disconnect in 3.4.0alpha

krish kparthas at redhat.com
Fri Mar 1 05:38:16 UTC 2013


On 03/01/2013 10:34 AM, Joe Julian wrote:
> 0-gfs33-client-2 would be the third brick in the gfs33 volume, so it 
> should be talking to glusterfsd rather than glusterd, and therefore not 
> to port 24007.
1) Client xlators first connect to glusterd on the remote-host supplied 
in their options.
2) They then query glusterd for the brick process' port (the brick is 
identified by its path).
3) They reconfigure the rpc object to connect to the brick process on the 
remote-host using the port received.
     This is the point at which the client xlator connects to glusterfsd 
(the brick process) on the remote-host. A minimal sketch of the sequence 
is below.
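
To make the sequence concrete, here is a minimal plain-TCP sketch of the
same two-phase connect. It is not GlusterFS source code: query_brick_port()
is a hypothetical stand-in for the portmap request the real client xlator
sends over the glusterd connection, and the hostname, brick path and port
numbers are only the values mentioned in this thread.

/*
 * Minimal sketch (not GlusterFS source) of the two-step connect described
 * above:
 *   1. TCP connect to glusterd on the management port (24007).
 *   2. Ask which port the brick process listens on.  In GlusterFS this is
 *      a portmap request keyed by the brick path; query_brick_port() below
 *      is only a hypothetical stand-in for that request.
 *   3. Drop the management connection and connect to the brick port.
 */
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

static int tcp_connect(const char *host, const char *port)
{
    struct addrinfo hints, *res, *rp;
    int fd = -1;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;

    if (getaddrinfo(host, port, &hints, &res) != 0)
        return -1;

    for (rp = res; rp != NULL; rp = rp->ai_next) {
        fd = socket(rp->ai_family, rp->ai_socktype, rp->ai_protocol);
        if (fd < 0)
            continue;
        if (connect(fd, rp->ai_addr, rp->ai_addrlen) == 0)
            break;
        close(fd);
        fd = -1;
    }
    freeaddrinfo(res);
    return fd;
}

/* Hypothetical placeholder: the real client xlator sends a portmap request
 * for the brick path over the glusterd connection and parses the port from
 * the reply.  Here we just return the brick port seen in the log above. */
static int query_brick_port(int mgmt_fd, const char *brick_path)
{
    (void)mgmt_fd;
    (void)brick_path;
    return 49153;
}

int main(void)
{
    const char *host = "hotstuff";        /* the remote-host option */
    const char *brick = "/export/brick";  /* hypothetical brick path */
    char portstr[16];

    /* Step 1: connect to glusterd on the management port. */
    int mgmt = tcp_connect(host, "24007");
    if (mgmt < 0) {
        perror("connect to glusterd");
        return 1;
    }

    /* Step 2: query the brick process' port, identified by brick path. */
    int brick_port = query_brick_port(mgmt, brick);
    close(mgmt);

    /* Step 3: reconnect, this time to the brick process (glusterfsd). */
    snprintf(portstr, sizeof(portstr), "%d", brick_port);
    int data = tcp_connect(host, portstr);
    if (data < 0) {
        perror("connect to brick");
        return 1;
    }
    printf("connected to brick on port %d\n", brick_port);
    close(data);
    return 0;
}

The point is simply that 24007 is only used for the port lookup; the actual
I/O connection goes to whatever port glusterd reports for the brick.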


thanks,
krish
>
> krish <kparthas at redhat.com> wrote:
>
>     Hi Emmanuel,
>
>     On 03/01/2013 07:55 AM, Emmanuel Dreyfus wrote:
>
>         Hi
>
>         The spurious disconnects I encountered in the 3.4 branch still
>         happen in 3.4.0alpha, but glusterfs recovers much better now.
>         However, when running a huge tar -xzf I still hit operation
>         failures, after which everything is restored to a normal state.
>
>         Here is the client log, in which the issue is hit at 18:06:36:
>         http://ftp.espci.fr/shadow/manu/client.log
>
>         The relevant part is below. I understand glusterfs is able to
>         restore its connections and everything works fine, except when
>         it happens on all volumes simultaneously.
>
>         [2013-02-28 18:06:36.105271] W [socket.c:1962:__socket_proto_state_machine] 0-gfs33-client-3: reading from socket failed. Error (No message available), peer (192.0.2.98:49153)
>         [2013-02-28 18:06:36.105340] E [rpc-clnt.c:368:saved_frames_unwind] 0-gfs33-client-3: forced unwinding frame type(GlusterFS 3.3) op(LOOKUP(27)) called at 2013-02-28 18:06:36.104358 (xid=0x3728220x)
>         [2013-02-28 18:06:36.105454] W [client-rpc-fops.c:2624:client3_3_lookup_cbk] 0-gfs33-client-3: remote operation failed: Socket is not connected. Path: /manu/netbsd/usr/src/external (6fb65713-062a-464d-a9d4-e97dab3c298b)
>         [2013-02-28 18:06:36.105514] E [rpc-clnt.c:368:saved_frames_unwind] 0-gfs33-client-3: forced unwinding frame type(GlusterFS 3.3) op(RELEASE(41)) called at 2013-02-28 18:06:36.104843 (xid=0x3728221x)
>         [2013-02-28 18:06:36.105537] I [client.c:2097:client_rpc_notify] 0-gfs33-client-3: disconnected
>         [2013-02-28 18:06:36.105571] E [afr-common.c:3761:afr_notify] 0-gfs33-replicate-1: All subvolumes are down. Going offline until atleast one of them comes back up.
>         [2013-02-28 18:06:36.112037] I [afr-common.c:3882:afr_local_init] 0-gfs33-replicate-1: no subvolumes up
>
>     I see that the 0-gfs33-client-2 xlator is unable to connect to glusterd,
>     which should be running on hotstuff:24007. The client xlator attempts to
>     reconnect every 3 seconds after its last attempt, which is why the
>     client-disconnection messages repeat in the log.
>
>     Could you check whether glusterd was running on the host "hotstuff" when
>     the client experienced the spurious disconnects?
>     To confirm this, the next time you notice a 'spurious' disconnect, try
>     # telnet hotstuff 24007
>
>     thanks,
>     krish
>
>
