[Gluster-users] node crashing on 4 replicated-distributed cluster

Giovanni Toraldo gt at libersoft.it
Fri Dec 24 15:44:50 UTC 2010


Hi,

I'm running into trouble after a few minutes of GlusterFS operation.

I set up a 4-node volume with replica 4 and 2 bricks on every server (8 bricks in total, so files are distributed across 2 replica sets of 4 bricks each):
# gluster volume create vms replica 4 transport tcp \
    192.168.7.1:/srv/vol1 192.168.7.2:/srv/vol1 192.168.7.3:/srv/vol1 \
    192.168.7.4:/srv/vol1 192.168.7.1:/srv/vol2 192.168.7.2:/srv/vol2 \
    192.168.7.3:/srv/vol2 192.168.7.4:/srv/vol2
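To make the brick layout explicit, here is a minimal sketch (the helper name `replica_sets` is hypothetical, not part of GlusterFS) of how a "replica N" volume groups its bricks: each consecutive run of N bricks, in the order given on the command line, forms one replica set, and files are then distributed across those sets.

```python
# Hypothetical helper illustrating GlusterFS replica-set grouping:
# with "replica N", each consecutive run of N bricks (command-line order)
# becomes one replica set; files are distributed across the sets.
def replica_sets(bricks, replica):
    """Split an ordered brick list into replica sets of size `replica`."""
    if replica <= 0 or len(bricks) % replica != 0:
        raise ValueError("brick count must be a positive multiple of the replica count")
    return [bricks[i:i + replica] for i in range(0, len(bricks), replica)]


# Brick list copied from the "gluster volume create" command above.
BRICKS = [
    "192.168.7.1:/srv/vol1", "192.168.7.2:/srv/vol1",
    "192.168.7.3:/srv/vol1", "192.168.7.4:/srv/vol1",
    "192.168.7.1:/srv/vol2", "192.168.7.2:/srv/vol2",
    "192.168.7.3:/srv/vol2", "192.168.7.4:/srv/vol2",
]

SETS = replica_sets(BRICKS, 4)

if __name__ == "__main__":
    for n, members in enumerate(SETS):
        # Every member of a set holds a copy of each file placed on that set.
        print(f"replica set {n}: {' '.join(members)}")
```

With this ordering each replica set spans all four servers, so with GlusterFS's client-side replication every write from a client (here the NFS server on node1) goes out to all four nodes.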

I started copying files with rsync from node1, and after a few minutes the
network traffic stalled.

Inspecting the brick logs on node4, I found many entries like these:

[2010-12-24 15:58:50.247688] C [rpcsvc.c:1118:rpcsvc_notify] rpcsvc: got MAP_XID event, which should have not come
[2010-12-24 15:58:50.264731] E [rpcsvc.c:874:rpcsvc_request_create] rpc-service: RPC call decoding failed
[2010-12-24 15:58:50.264835] I [server.c:428:server_rpc_notify] vms-server: disconnected connection from 192.168.7.1:1001
[2010-12-24 15:58:50.279233] I [server-handshake.c:535:server_setvolume] vms-server: accepted client from 192.168.7.1:1018
[2010-12-24 15:59:02.100081] E [rpcsvc.c:874:rpcsvc_request_create] rpc-service: RPC call decoding failed
[2010-12-24 15:59:02.100160] I [server.c:428:server_rpc_notify] vms-server: disconnected connection from 192.168.7.1:1018
[2010-12-24 15:59:02.181278] I [server-handshake.c:535:server_setvolume] vms-server: accepted client from 192.168.7.1:1018
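To check whether every decode failure is followed by a disconnect/reconnect cycle across the whole brick log, a minimal sketch like the one below could tally the entries. The entry layout in the regex is an assumption inferred from the excerpt above, not taken from GlusterFS documentation, and `summarize` is a hypothetical helper name.

```python
import re
from collections import Counter

# Assumed entry layout, inferred from the brick log excerpt above:
#   [YYYY-MM-DD HH:MM:SS.micro] LEVEL [file.c:line:function] component: message
LOG_RE = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} [\d:.]+)\] (?P<level>[A-Z]) "
    r"\[(?P<src>[^\]]+)\] (?P<component>[^:]+): (?P<msg>.*)$"
)


def summarize(lines):
    """Count entries per severity letter and tally connection-churn events."""
    levels, events = Counter(), Counter()
    for line in lines:
        m = LOG_RE.match(line.strip())
        if m is None:
            continue  # skip lines that do not look like a log entry
        levels[m["level"]] += 1
        msg = m["msg"]
        if msg == "RPC call decoding failed":
            events["decode_failure"] += 1
        elif msg.startswith("disconnected connection from"):
            events["disconnect"] += 1
        elif msg.startswith("accepted client from"):
            events["accept"] += 1
    return levels, events


# The seven brick log entries quoted above, one per line.
SAMPLE = [
    "[2010-12-24 15:58:50.247688] C [rpcsvc.c:1118:rpcsvc_notify] rpcsvc: got MAP_XID event, which should have not come",
    "[2010-12-24 15:58:50.264731] E [rpcsvc.c:874:rpcsvc_request_create] rpc-service: RPC call decoding failed",
    "[2010-12-24 15:58:50.264835] I [server.c:428:server_rpc_notify] vms-server: disconnected connection from 192.168.7.1:1001",
    "[2010-12-24 15:58:50.279233] I [server-handshake.c:535:server_setvolume] vms-server: accepted client from 192.168.7.1:1018",
    "[2010-12-24 15:59:02.100081] E [rpcsvc.c:874:rpcsvc_request_create] rpc-service: RPC call decoding failed",
    "[2010-12-24 15:59:02.100160] I [server.c:428:server_rpc_notify] vms-server: disconnected connection from 192.168.7.1:1018",
    "[2010-12-24 15:59:02.181278] I [server-handshake.c:535:server_setvolume] vms-server: accepted client from 192.168.7.1:1018",
]

LEVELS, EVENTS = summarize(SAMPLE)

if __name__ == "__main__":
    print(dict(LEVELS), dict(EVENTS))
```

To run it on the full log, pass an open file object instead of `SAMPLE`; the brick log location depends on the installation, so no path is assumed here.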

In nfs.log on node1 there are many entries like this one (the operation varies):
[2010-12-24 15:58:49.263361] E [rpc-clnt.c:338:saved_frames_unwind] (-->/usr/lib/libgfrpc.so.0(rpc_clnt_notify+0x77) [0x7fabdcf5bd17] (-->/usr/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7e) [0x7fabdcf5b4ae] (-->/usr/lib/libgfrpc.so.0(saved_frames_destroy+0xe) [0x7fabdcf5b40e]))) rpc-clnt: forced unwinding frame type(GlusterFS 3.1) op(WRITE(13)) called at 2010-12-24 15:58:49.150707

Does anyone have an idea what could be causing this?

Thanks.

-- 
Giovanni Toraldo
http://www.libersoft.it/
