[Gluster-users] fractured/split glusterfs - 2 up, 2 down for an hour

Vijay Bellur vbellur at redhat.com
Sat Jan 4 17:15:29 UTC 2014


On 01/04/2014 07:21 AM, harry mangalam wrote:
> This is a distributed-only glusterfs on 4 servers with 2 bricks each on
> an IPoIB network.
>
> Thanks to a misconfigured autoupdate script, when 3.4.2 was released
> today, my gluster servers tried to update themselves. 2 succeeded but
> then failed to restart; the other 2 failed to update and kept running.
>
> Not realizing the sequence of events, I restarted the 2 that failed to
> restart, which gave my fs 2 servers running 3.4.1 and 2 running 3.4.2.
>
> When I realized this after about 30m, I shut everything down, updated
> the 2 remaining servers to 3.4.2, and restarted. Now I'm getting lots of
> file errors of the type 'Transport endpoint is not connected' and the like:
>
> [2014-01-04 01:31:18.593547] W
> [client-rpc-fops.c:2624:client3_3_lookup_cbk] 0-gl-client-2: remote
> operation failed: Transport endpoint is not connected. Path:
> /bio/fishm/test_cuffdiff.sh (00000000-0000-0000-0000-000000000000)
>
> [2014-01-04 01:31:18.594928] W
> [client-rpc-fops.c:2624:client3_3_lookup_cbk] 0-gl-client-2: remote
> operation failed: Transport endpoint is not connected. Path:
> /bio/fishm/test_cuffdiff.sh (00000000-0000-0000-0000-000000000000)
>
> [2014-01-04 01:31:18.595818] W
> [client-rpc-fops.c:2624:client3_3_lookup_cbk] 0-gl-client-2: remote
> operation failed: Transport endpoint is not connected. Path:
> /bio/fishm/.#test_cuffdiff.sh (14c3b612-e952-4aec-ae18-7f3dbb422dcc)
>
> [2014-01-04 01:31:18.597381] W
> [client-rpc-fops.c:2624:client3_3_lookup_cbk] 0-gl-client-2: remote
> operation failed: Transport endpoint is not connected. Path:
> /bio/fishm/test_cuffdiff.sh (00000000-0000-0000-0000-000000000000)
>
> [2014-01-04 01:31:18.598212] W
> [client-rpc-fops.c:814:client3_3_statfs_cbk] 0-gl-client-2: remote
> operation failed: Transport endpoint is not connected
>
> [2014-01-04 01:31:18.598236] W [dht-diskusage.c:45:dht_du_info_cbk]
> 0-gl-dht: failed to get disk info from gl-client-2
>
> [2014-01-04 01:31:19.912210] W [socket.c:514:__socket_rwv]
> 0-gl-client-2: readv failed (No data available)
>
> [2014-01-04 01:31:22.912717] W [socket.c:514:__socket_rwv]
> 0-gl-client-2: readv failed (No data available)
>
> [2014-01-04 01:31:25.913208] W [socket.c:514:__socket_rwv]
> 0-gl-client-2: readv failed (No data available)
>
> The servers at the same time provided the following error 'E' messages:
>
> Fri Jan 03 17:46:42 [0.20 0.12 0.13] root@biostor1:~
> 1008 $ grep ' E ' /var/log/glusterfs/bricks/raid1.log | grep '2014-01-03'
>
> [2014-01-03 06:11:36.251786] E [server-helpers.c:751:server_alloc_frame]
> (-->/usr/lib64/libgfrpc.so.0(rpcsvc_notify+0x103) [0x3161e090d3]
> (-->/usr/lib64/libgfrpc.so.0(rpcsvc_handle_rpc_call+0x245)
> [0x3161e08f85]
> (-->/usr/lib64/glusterfs/3.4.1/xlator/protocol/server.so(server3_3_lookup+0xa0)
> [0x7fa60e577170]))) 0-server: invalid argument: conn
>
> [2014-01-03 06:11:36.251813] E
> [rpcsvc.c:450:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed
> to complete successfully
>
> [2014-01-03 17:48:44.236127] E [rpc-transport.c:253:rpc_transport_load]
> 0-rpc-transport: /usr/lib64/glusterfs/3.4.1/rpc-transport/rdma.so:
> cannot open shared object file: No such file or directory
>
> [2014-01-03 19:15:26.643378] E [rpc-transport.c:253:rpc_transport_load]
> 0-rpc-transport: /usr/lib64/glusterfs/3.4.2/rpc-transport/rdma.so:
> cannot open shared object file: No such file or directory
>

rdma.so seems to be missing here. Is the glusterfs-rdma-3.4.2-1 rpm
installed on the servers?
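
One way to check this on each server is sketched below. It is only a rough sketch, not a definitive procedure: the library path is copied from the rpc_transport_load errors in the brick log, the helper names rdma_so_path and check_rdma_so are made up for illustration, and the version string 3.4.2 is assumed to match what is installed.

```shell
#!/bin/sh
# Sketch: check whether the rdma transport library that the brick log
# complains about is actually present on this server.

# Build the library path for a given glusterfs version. The path layout is
# taken from the log errors; the function name is hypothetical.
rdma_so_path() {
    # $1: glusterfs version string, e.g. 3.4.2
    printf '/usr/lib64/glusterfs/%s/rpc-transport/rdma.so\n' "$1"
}

# Report whether rdma.so exists for that version; returns 1 if it is missing.
check_rdma_so() {
    path=$(rdma_so_path "$1")
    if [ -e "$path" ]; then
        echo "present: $path"
    else
        echo "missing: $path"
        return 1
    fi
}

# Ask the package database about the rdma subpackage, but only on hosts
# that have rpm. A non-zero exit just means the package is not installed.
if command -v rpm >/dev/null 2>&1; then
    rpm -q glusterfs-rdma || true
fi

# Assumed version: adjust to whatever is installed on the server.
check_rdma_so 3.4.2 || true
```

If rpm reports that the package is not installed, or the file is reported missing, that would be consistent with the "cannot open shared object file" errors above.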

-Vijay



