[Gluster-devel] Spurious disconnect in 3.4.0alpha

Joe Julian joe at julianfamily.org
Fri Mar 1 05:04:15 UTC 2013


0-gfs33-client-2 would be the third brick in the gfs33 volume, so it should be talking to glusterfsd rather than glusterd, and therefore not to port 24007.
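
A minimal sketch of how one might check that from the client side, assuming Python is available there. The helper names brick_ordinal and can_connect are hypothetical, not part of GlusterFS, and the sketch assumes the trailing number in a client xlator name is a 0-based brick index, as described above. It only tests whether a TCP connection can be opened; it does not speak the GlusterFS RPC protocol.

```python
import re
import socket

# Management daemon (glusterd) port mentioned in this thread.
GLUSTERD_PORT = 24007


def brick_ordinal(xlator_name):
    """Return the 1-based brick position for a name like '0-gfs33-client-2'.

    Assumption: the trailing number is a 0-based brick index.
    """
    match = re.search(r"-client-(\d+)$", xlator_name)
    if match is None:
        raise ValueError("not a client xlator name: %r" % xlator_name)
    return int(match.group(1)) + 1


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port opens within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

For example, can_connect("hotstuff", GLUSTERD_PORT) probes glusterd, while can_connect("192.0.2.98", 49153) probes the brick peer logged for client-3 below. The port of the brick behind client-2 does not appear in the quoted log, so it would have to be looked up separately.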

krish <kparthas at redhat.com> wrote:

>Hi Emmanuel,
>
>On 03/01/2013 07:55 AM, Emmanuel Dreyfus wrote:
>> Hi
>>
>> The spurious disconnects I encountered in the 3.4 branch still happen in
>> 3.4.0alpha, but glusterfs recovers much better now. However, when
>> running a huge tar -xzf I still hit operation failures, after which
>> everything is restored to a normal state.
>>
>> Here is the client log, in which the issue is hit at 18:06:36
>> http://ftp.espci.fr/shadow/manu/client.log
>>
>> The relevant part is below. I understand glusterfs is able to restore
>> its connections and everything works fine, except when it happens on
>> all volumes simultaneously.
>>
>> [2013-02-28 18:06:36.105271] W
>> [socket.c:1962:__socket_proto_state_machine] 0-gfs33-client-3: reading
>> from socket failed. Error (No message available), peer
>> (192.0.2.98:49153)
>> [2013-02-28 18:06:36.105340] E [rpc-clnt.c:368:saved_frames_unwind]
>> 0-gfs33-client-3: forced unwinding frame type(GlusterFS 3.3)
>> op(LOOKUP(27)) called at 2013-02-28 18:06:36.104358 (xid=0x3728220x)
>> [2013-02-28 18:06:36.105454] W
>> [client-rpc-fops.c:2624:client3_3_lookup_cbk] 0-gfs33-client-3: remote
>> operation failed: Socket is not connected. Path:
>> /manu/netbsd/usr/src/external (6fb65713-062a-464d-a9d4-e97dab3c298b)
>> [2013-02-28 18:06:36.105514] E [rpc-clnt.c:368:saved_frames_unwind]
>> 0-gfs33-client-3: forced unwinding frame type(GlusterFS 3.3)
>> op(RELEASE(41)) called at 2013-02-28 18:06:36.104843 (xid=0x3728221x)
>> [2013-02-28 18:06:36.105537] I [client.c:2097:client_rpc_notify]
>> 0-gfs33-client-3: disconnected
>> [2013-02-28 18:06:36.105571] E [afr-common.c:3761:afr_notify]
>> 0-gfs33-replicate-1: All subvolumes are down. Going offline until
>> atleast one of them comes back up.
>> [2013-02-28 18:06:36.112037] I
>> [afr-common.c:3882:afr_local_init] 0-gfs33-replicate-1: no subvolumes up
>
>I see that the 0-gfs33-client-2 xlator is unable to connect to glusterd
>(that should be) running on hotstuff:24007. The client xlator attempts
>to reconnect every 3 seconds, counted from its last attempt.
>This is why we see the logs about client disconnection repeat.
>
>Could you check whether glusterd was running on the host "hotstuff" when
>the client experienced the spurious disconnects?
>To confirm this, when you notice the 'spurious' disconnects, try
># telnet hotstuff 24007
>
>thanks,
>krish