[Gluster-users] fractured/split glusterfs - 2 up, 2 down for an hour
harry mangalam
harry.mangalam at uci.edu
Sat Jan 4 01:51:10 UTC 2014
This is a distributed-only glusterfs on 4 servers with 2 bricks each on an
IPoIB network.
Thanks to a misconfigured autoupdate script, when 3.4.2 was released today, my
gluster servers tried to update themselves. 2 succeeded but then failed to
restart; the other 2 failed to update and kept running.
Not realizing the sequence of events, I restarted the 2 that failed to
restart, which gave my fs 2 servers running 3.4.1 and 2 running 3.4.2.
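A mixed-version state like this can be caught with a quick comparison across servers. The following is a minimal sketch, not something from my actual setup: versions_consistent is a hypothetical helper that reads "host version" pairs on stdin, and every hostname other than biostor1 in the commented usage is assumed for illustration. It also assumes that the first line of `glusterfs --version` has the form "glusterfs 3.4.2 built on ...".

```shell
# Hypothetical helper: reads "host version" pairs on stdin and returns 0
# when at most one distinct version is present, 1 when versions are mixed.
versions_consistent() {
    awk '!seen[$2]++ { n++ } END { exit (n > 1) }'
}

# Possible usage (hostnames other than biostor1 are assumptions):
#   for h in biostor1 biostor2 biostor3 biostor4; do
#       printf '%s %s\n' "$h" "$(ssh "$h" glusterfs --version | head -n1 | awk '{print $2}')"
#   done | versions_consistent || echo "mixed gluster versions"
```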
When I realized this after about 30 minutes, I shut everything down, updated the
remaining 2 to 3.4.2, and restarted. Now I'm getting lots of reports of file
errors of the type 'Transport endpoint is not connected' and the like:
[2014-01-04 01:31:18.593547] W [client-rpc-fops.c:2624:client3_3_lookup_cbk] 0-gl-client-2: remote operation failed: Transport endpoint is not connected. Path: /bio/fishm/test_cuffdiff.sh (00000000-0000-0000-0000-000000000000)
[2014-01-04 01:31:18.594928] W [client-rpc-fops.c:2624:client3_3_lookup_cbk] 0-gl-client-2: remote operation failed: Transport endpoint is not connected. Path: /bio/fishm/test_cuffdiff.sh (00000000-0000-0000-0000-000000000000)
[2014-01-04 01:31:18.595818] W [client-rpc-fops.c:2624:client3_3_lookup_cbk] 0-gl-client-2: remote operation failed: Transport endpoint is not connected. Path: /bio/fishm/.#test_cuffdiff.sh (14c3b612-e952-4aec-ae18-7f3dbb422dcc)
[2014-01-04 01:31:18.597381] W [client-rpc-fops.c:2624:client3_3_lookup_cbk] 0-gl-client-2: remote operation failed: Transport endpoint is not connected. Path: /bio/fishm/test_cuffdiff.sh (00000000-0000-0000-0000-000000000000)
[2014-01-04 01:31:18.598212] W [client-rpc-fops.c:814:client3_3_statfs_cbk] 0-gl-client-2: remote operation failed: Transport endpoint is not connected
[2014-01-04 01:31:18.598236] W [dht-diskusage.c:45:dht_du_info_cbk] 0-gl-dht: failed to get disk info from gl-client-2
[2014-01-04 01:31:19.912210] W [socket.c:514:__socket_rwv] 0-gl-client-2: readv failed (No data available)
[2014-01-04 01:31:22.912717] W [socket.c:514:__socket_rwv] 0-gl-client-2: readv failed (No data available)
[2014-01-04 01:31:25.913208] W [socket.c:514:__socket_rwv] 0-gl-client-2: readv failed (No data available)
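To see which client translators are affected, warnings like the ones above can be tallied per translator name. This is only a sketch: count_disconnects is a hypothetical helper, and it assumes the log format shown above, where each entry is a single line and the translator name (for example 0-gl-client-2) is the fifth whitespace-separated field.

```shell
# Hypothetical helper: reads a glusterfs client log on stdin and counts
# "Transport endpoint is not connected" warnings per client translator.
count_disconnects() {
    awk '/Transport endpoint is not connected/ {
             sub(/:$/, "", $5)      # strip the trailing colon from the name
             n[$5]++
         }
         END { for (k in n) print n[k], k }' | sort -rn
}

# Possible usage (the log path is an assumption; it depends on the mount point):
#   count_disconnects < /var/log/glusterfs/<mountpoint>.log
```

If every warning names the same translator, as in the excerpt above, the problem is probably confined to the brick behind that one client connection rather than spread across the whole volume.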
At the same time, the servers logged the following error ('E') messages:
Fri Jan 03 17:46:42 [0.20 0.12 0.13] root@biostor1:~
1008 $ grep ' E ' /var/log/glusterfs/bricks/raid1.log |grep '2014-01-03'
[2014-01-03 06:11:36.251786] E [server-helpers.c:751:server_alloc_frame] (-->/usr/lib64/libgfrpc.so.0(rpcsvc_notify+0x103) [0x3161e090d3] (-->/usr/lib64/libgfrpc.so.0(rpcsvc_handle_rpc_call+0x245) [0x3161e08f85] (-->/usr/lib64/glusterfs/3.4.1/xlator/protocol/server.so(server3_3_lookup+0xa0) [0x7fa60e577170]))) 0-server: invalid argument: conn
[2014-01-03 06:11:36.251813] E [rpcsvc.c:450:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully
[2014-01-03 17:48:44.236127] E [rpc-transport.c:253:rpc_transport_load] 0-rpc-transport: /usr/lib64/glusterfs/3.4.1/rpc-transport/rdma.so: cannot open shared object file: No such file or directory
[2014-01-03 19:15:26.643378] E [rpc-transport.c:253:rpc_transport_load] 0-rpc-transport: /usr/lib64/glusterfs/3.4.2/rpc-transport/rdma.so: cannot open shared object file: No such file or directory
The missing/misbehaving files /are/ accessible on the individual bricks but
not thru gluster.
This is a distributed-only setup, not replicated, so it seems like the
gluster volume heal <volume>
is appropriate.
Do the gluster wizards agree?
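Before running a heal, it may be worth confirming what type the volume reports, since as far as I know the heal command targets replicated volumes. A minimal sketch, under stated assumptions: volume_type is a hypothetical helper, the volume name gl is inferred from the 0-gl-client-2 log prefix, and the output of `gluster volume info` is assumed to contain a line of the form "Type: Distribute".

```shell
# Hypothetical helper: reads `gluster volume info <volume>` output on stdin
# and prints the value of the "Type:" line.
volume_type() {
    awk -F': *' '$1 == "Type" { print $2; exit }'
}

# Possible usage (volume name "gl" is an assumption):
#   gluster volume info gl | volume_type
#   gluster volume status gl     # shows whether each brick is online
#   gluster peer status          # shows whether all servers see each other
```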
---
Harry Mangalam - Research Computing, OIT, Rm 225 MSTB, UC Irvine
[m/c 2225] / 92697 Google Voice Multiplexer: (949) 478-4487
415 South Circle View Dr, Irvine, CA, 92697 [shipping]
MSTB Lat/Long: (33.642025,-117.844414) (paste into Google Maps)
---