[Gluster-devel] Spurious disconnect in 3.4.0alpha

Emmanuel Dreyfus manu at netbsd.org
Fri Mar 1 02:25:13 UTC 2013


The spurious disconnect I encountered in the 3.4 branch still happens in
3.4.0alpha, but glusterfs recovers much better now. However, when
running a huge tar -xzf I still hit operation failures, after which
everything returns to a normal state.

Here is the client log, in which the issue is hit at 18:06:36

The relevant part is below. I understand glusterfs is able to restore
its connections and everything works fine, except when the disconnect
happens on all volumes simultaneously.

[2013-02-28 18:06:36.105271] W
[socket.c:1962:__socket_proto_state_machine] 0-gfs33-client-3: reading
from socket failed. Error (No message available), peer
[2013-02-28 18:06:36.105340] E [rpc-clnt.c:368:saved_frames_unwind]
0-gfs33-client-3: forced unwinding frame type(GlusterFS 3.3)
op(LOOKUP(27)) called at 2013-02-28 18:06:36.104358 (xid=0x3728220x)
[2013-02-28 18:06:36.105454] W
[client-rpc-fops.c:2624:client3_3_lookup_cbk] 0-gfs33-client-3: remote
operation failed: Socket is not connected. Path:
/manu/netbsd/usr/src/external (6fb65713-062a-464d-a9d4-e97dab3c298b)
[2013-02-28 18:06:36.105514] E [rpc-clnt.c:368:saved_frames_unwind]
0-gfs33-client-3: forced unwinding frame type(GlusterFS 3.3)
op(RELEASE(41)) called at 2013-02-28 18:06:36.104843 (xid=0x3728221x)
[2013-02-28 18:06:36.105537] I [client.c:2097:client_rpc_notify]
0-gfs33-client-3: disconnected
[2013-02-28 18:06:36.105571] E [afr-common.c:3761:afr_notify]
0-gfs33-replicate-1: All subvolumes are down. Going offline until
atleast one of them comes back up.
[2013-02-28 18:06:36.112037] I
[afr-common.c:3882:afr_local_init] 0-gfs33-replicate-1: no subvolumes up
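
To check how often a replicate translator goes fully offline, a log like
the one above can be scanned mechanically. The following is only a
minimal sketch: the entry layout ("[timestamp] LEVEL [file:line:func]
N-name: message") is inferred from this excerpt alone, not from any
glusterfs specification, and the helper names split_entries and
find_offline_events are hypothetical.

```python
import re

# Start of a log entry as seen in the excerpt: "[YYYY-MM-DD HH:MM:SS.usec] L "
# (assumed layout, inferred from the sample only).
ENTRY_START = re.compile(
    r"\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\] ([A-Z])\s"
)
# Translator name following the "[file:line:func]" part, e.g. "0-gfs33-replicate-1".
XLATOR_NAME = re.compile(r"\]\s+(\d+-[\w.-]+):")


def split_entries(log_text):
    """Rejoin wrapped lines and return (timestamp, level, message) tuples."""
    # Collapse all whitespace so entries wrapped by the mailer become one line.
    flat = " ".join(log_text.split())
    matches = list(ENTRY_START.finditer(flat))
    entries = []
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(flat)
        entries.append((m.group(1), m.group(2), flat[m.end():end].strip()))
    return entries


def find_offline_events(log_text):
    """Return (timestamp, translator) for each 'All subvolumes are down' error."""
    events = []
    for timestamp, level, message in split_entries(log_text):
        if level == "E" and "All subvolumes are down" in message:
            name = XLATOR_NAME.search(message)
            events.append((timestamp, name.group(1) if name else None))
    return events
```

Calling find_offline_events() on the text of a client log returns one
tuple per moment a replicate translator lost all its subvolumes, which
makes it easier to see whether several volumes went down at the same
timestamp.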

Emmanuel Dreyfus
manu at netbsd.org

