[Gluster-devel] brick half offline
Emmanuel Dreyfus
manu at netbsd.org
Sun Jul 29 04:37:48 UTC 2012
Hi
I hit another rare problem, which seems reproducible within 4 hours of usage:
one brick goes down, but not completely. It will not create a file, for
instance, but it will participate in file locking and cause it to fail,
because it did not create the file.
Here is the final symptom (this creates the file, then locks it):
client# echo "xxx"|cat -l > /gfs/foo
cat: stdout: No such file or directory
brick1# ls -l /export/gfs1/foo
-rw-r--r-- 2 root wheel 0 Jul 29 06:18 /export/gfs1/foo
brick2# ls -l /export/gfs1/foo
ls: /export/gfs1/foo: No such file or directory
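For anyone who wants to try this without NetBSD's cat -l, here is a minimal Python sketch of the same create-then-lock sequence. It assumes that cat -l takes a blocking exclusive advisory lock on stdout through fcntl(2); the function name create_and_lock is only illustrative, and this is not a tested reproducer for the bug itself.

```python
import fcntl
import os


def create_and_lock(path):
    """Create (or truncate) path, lock it exclusively, write, then unlock.

    Returns None on success, or the OSError raised by the failing step.
    """
    try:
        # Same effect as the shell redirection "> /gfs/foo".
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    except OSError as err:
        return err
    try:
        # Blocking exclusive advisory lock on the whole file
        # (assumed to match what cat -l does on its stdout).
        fcntl.lockf(fd, fcntl.LOCK_EX)
        os.write(fd, b"xxx\n")
        fcntl.lockf(fd, fcntl.LOCK_UN)
    except OSError as err:
        return err
    finally:
        os.close(fd)
    return None
```

Calling create_and_lock("/gfs/foo") on the client mount should return None on a healthy volume. In the state described here I would expect it to return an OSError carrying ENOENT from the lock step, since the file gets created on brick1 only.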
client log for this operation:
[2012-07-29 06:18:10.430637] W [client3_1-fops.c:2186:client3_1_lk_cbk]
0-gfs-client-1: remote operation failed: No such file or directory
[2012-07-29 06:18:10.431628] W [fuse-bridge.c:3196:fuse_setlk_cbk]
0-glusterfs-fuse: 11781877: ERR => -1 (No such file or directory)
[2012-07-29 06:18:10.434844] W [client3_1-fops.c:2186:client3_1_lk_cbk]
0-gfs-client-1: remote operation failed: No such file or directory
[2012-07-29 06:18:10.435939] W [fuse-bridge.c:3196:fuse_setlk_cbk]
0-glusterfs-fuse: 11781880: ERR => -1 (No such file or directory)
brick1 logs nothing.
brick2 log for this operation:
[2012-07-29 06:18:10.430151] I [server3_1-fops.c:203:server_lk_cbk]
0-gfs-server: 2017229: LK -2 (--) ==> -1 (No such file or directory)
[2012-07-29 06:18:10.434281] I [server3_1-fops.c:203:server_lk_cbk]
0-gfs-server: 2017231: LK -2 (--) ==> -1 (No such file or directory)
But this is only the consequence of an earlier problem, where brick2
went half-offline: enough to refuse creating files, but not enough to
be excluded from locking operations. Here is how it happened:
brick2 log
[2012-07-28 22:30:08.024578] E [event.c:346:event_dispatch_poll_handler]
0-poll: index not found for fd=15 (idx_hint=6)
[2012-07-28 22:30:18.418768] I [server-handshake.c:571:server_setvolume]
0-gfs-server: accepted client from
client-18310-2012/07/27-03:03:28:140183437669610-gfs-client-1-0
(version: 3.3git)
client log
[2012-07-28 22:30:08.026975] W [socket.c:1512:__socket_proto_state_machine]
0-gfs-client-1: reading from socket failed. Error (Socket is not
connected), peer (192.0.2.98:24010)
[2012-07-28 22:30:08.027050] E [rpc-clnt.c:373:saved_frames_unwind]
0-gfs-client-1: forced unwinding frame type(GlusterFS 3.1)
op(WRITE(13)) called at 2012-07-28 22:30:08.026783 (xid=0x1990324x)
[2012-07-28 22:30:08.027224] W [client3_1-fops.c:821:client3_1_writev_cbk]
0-gfs-client-1: remote operation failed: Socket is not connected
[2012-07-28 22:30:08.027396] I [client.c:2090:client_rpc_notify]
0-gfs-client-1: disconnected
[2012-07-28 22:30:08.027553] W [client3_1-fops.c:4929:client3_1_fxattrop]
0-gfs-client-1: (366a4c92-d167-48e7-844a-9dc43602ecc5) remote_fd is -1.
EBADFD
[2012-07-28 22:30:08.030501] W [client3_1-fops.c:5306:client3_1_finodelk]
0-gfs-client-1: (366a4c92-d167-48e7-844a-9dc43602ecc5) remote_fd is -1.
EBADFD
[2012-07-28 22:30:18.419800] I
[client-handshake.c:1636:select_server_supported_programs] 0-gfs-client-1:
Using Program GlusterFS 3.3git, Num (1298437), Version (330)
[2012-07-28 22:30:18.420716] I [client-handshake.c:1433:client_setvolume_cbk]
0-gfs-client-1: Connected to 192.0.2.98:24010, attached to remote volume
'/export/gfs1'.
[2012-07-28 22:30:18.420768] I [client-handshake.c:1454:client_setvolume_cbk]
0-gfs-client-1: Server and Client lk-version numbers are same, no need
to reopen the fds
Both client and server report that the reconnection went through without
a hitch, but it seems glusterfs did not really recover.
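To line up the two logs around the disconnect, a small script along these lines can interleave entries by timestamp. It is only a sketch: it assumes each entry starts with "[YYYY-MM-DD HH:MM:SS.ffffff] LEVEL" as in the excerpts above, that wrapped lines belong to the preceding entry, and that the client and brick clocks are reasonably in sync. The names parse and merge are made up for the example.

```python
import re
from datetime import datetime

# Start of an entry: "[timestamp] LEVEL", optionally followed by the message.
ENTRY_RE = re.compile(
    r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\] ([A-Z])(?: (.*))?$"
)


def parse(source, text):
    """Return a list of (timestamp, source, level, message) tuples."""
    entries = []
    for line in text.splitlines():
        m = ENTRY_RE.match(line)
        if m:
            ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f")
            entries.append((ts, source, m.group(2), m.group(3) or ""))
        elif entries and line.strip():
            # Wrapped line: append it to the previous entry's message.
            ts, src, level, msg = entries[-1]
            entries[-1] = (ts, src, level, (msg + " " + line.strip()).strip())
    return entries


def merge(*logs):
    """Interleave several parsed logs in timestamp order."""
    return sorted((e for log in logs for e in log), key=lambda e: e[0])
```

Feeding it the two excerpts above, for example merge(parse("brick2", brick_text), parse("client", client_text)), puts the brick2 poll error at 22:30:08.024578 just ahead of the client's first socket failure at 22:30:08.026975.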
--
Emmanuel Dreyfus
manu at netbsd.org