[Gluster-devel] brick half offline

Emmanuel Dreyfus manu at netbsd.org
Sun Jul 29 04:37:48 UTC 2012


Hi

I hit another rare problem, which seems reproducible within 4 hours of usage:
one brick goes down, but not completely. It will not create a file, for
instance, but it will still participate in file locking and cause the lock
to fail, because it did not create the file.

Here is the final symptom (this creates the file, then locks it):
client# echo "xxx"|cat -l > /gfs/foo
cat: stdout: No such file or directory

brick1# ls -l /export/gfs1/foo
-rw-r--r--  2 root  wheel  0 Jul 29 06:18 /export/gfs1/foo

brick2# ls -l /export/gfs1/foo                                                 
ls: /export/gfs1/foo: No such file or directory
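
For anyone without NetBSD's cat -l at hand, the same create-then-lock
sequence can be approximated with the minimal sketch below. It assumes that
cat -l takes an exclusive fcntl(2) lock on stdout after the shell has created
the file. The function name create_and_lock is made up for illustration, and
the path is whatever file on the mounted volume you want to test.

```python
import fcntl
import os


def create_and_lock(path, data=b"xxx\n"):
    """Create path, take an exclusive fcntl lock on it, then write data.

    Raises OSError if any step fails, e.g. ENOENT coming back from the lock
    request when one brick never created the file.
    """
    # Same as the shell redirection "> path": create or truncate the file.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        # Blocking exclusive lock over the whole file (F_SETLKW underneath).
        fcntl.lockf(fd, fcntl.LOCK_EX)
        os.write(fd, data)
        fcntl.lockf(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)
```

Calling create_and_lock("/gfs/foo") on the client should raise an OSError
at the lock step if the volume is in the state described here, and succeed
otherwise.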

client log for this operation:
[2012-07-29 06:18:10.430637] W [client3_1-fops.c:2186:client3_1_lk_cbk] 
   0-gfs-client-1: remote operation failed: No such file or directory
[2012-07-29 06:18:10.431628] W [fuse-bridge.c:3196:fuse_setlk_cbk] 
   0-glusterfs-fuse: 11781877: ERR => -1 (No such file or directory)
[2012-07-29 06:18:10.434844] W [client3_1-fops.c:2186:client3_1_lk_cbk] 
   0-gfs-client-1: remote operation failed: No such file or directory
[2012-07-29 06:18:10.435939] W [fuse-bridge.c:3196:fuse_setlk_cbk] 
   0-glusterfs-fuse: 11781880: ERR => -1 (No such file or directory)

brick1 logs nothing.

brick2 log for this operation:
[2012-07-29 06:18:10.430151] I [server3_1-fops.c:203:server_lk_cbk] 
   0-gfs-server: 2017229: LK -2 (--) ==> -1 (No such file or directory)
[2012-07-29 06:18:10.434281] I [server3_1-fops.c:203:server_lk_cbk] 
   0-gfs-server: 2017231: LK -2 (--) ==> -1 (No such file or directory)

But this is only the consequence of an earlier problem, where brick2
went half-offline: enough to refuse creating files, but not enough to
be excluded from locking operations. Here is how it happened:

brick2 log
[2012-07-28 22:30:08.024578] E [event.c:346:event_dispatch_poll_handler] 
    0-poll: index not found for fd=15 (idx_hint=6)
[2012-07-28 22:30:18.418768] I [server-handshake.c:571:server_setvolume] 
    0-gfs-server: accepted client from 
    client-18310-2012/07/27-03:03:28:140183437669610-gfs-client-1-0 
    (version: 3.3git)

client log
[2012-07-28 22:30:08.026975] W [socket.c:1512:__socket_proto_state_machine] 
    0-gfs-client-1: reading from socket failed. Error (Socket is not 
    connected), peer (192.0.2.98:24010)
[2012-07-28 22:30:08.027050] E [rpc-clnt.c:373:saved_frames_unwind]  
    0-gfs-client-1: forced unwinding frame type(GlusterFS 3.1) 
    op(WRITE(13)) called at 2012-07-28 22:30:08.026783 (xid=0x1990324x)
[2012-07-28 22:30:08.027224] W [client3_1-fops.c:821:client3_1_writev_cbk] 
    0-gfs-client-1: remote operation failed: Socket is not connected
[2012-07-28 22:30:08.027396] I [client.c:2090:client_rpc_notify] 
    0-gfs-client-1: disconnected
[2012-07-28 22:30:08.027553] W [client3_1-fops.c:4929:client3_1_fxattrop]
    0-gfs-client-1:  (366a4c92-d167-48e7-844a-9dc43602ecc5) remote_fd is -1.
   EBADFD
[2012-07-28 22:30:08.030501] W [client3_1-fops.c:5306:client3_1_finodelk] 
    0-gfs-client-1:  (366a4c92-d167-48e7-844a-9dc43602ecc5) remote_fd is -1. 
    EBADFD
[2012-07-28 22:30:18.419800] I 
    [client-handshake.c:1636:select_server_supported_programs] 0-gfs-client-1: 
    Using Program GlusterFS 3.3git, Num (1298437), Version (330)
[2012-07-28 22:30:18.420716] I [client-handshake.c:1433:client_setvolume_cbk] 
    0-gfs-client-1: Connected to 192.0.2.98:24010, attached to remote volume 
    '/export/gfs1'.
[2012-07-28 22:30:18.420768] I [client-handshake.c:1454:client_setvolume_cbk] 
    0-gfs-client-1: Server and Client lk-version numbers are same, no need 
    to reopen the fds

Both client and server report that the reconnection went without
a hitch, but it seems glusterfs did not really recover.
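
To locate such disconnect/reconnect windows in a long client log, something
like the sketch below may help. It assumes the record layout seen in the
excerpts above ([timestamp] LEVEL [file:line:function] message, possibly
wrapped onto indented continuation lines) and keys on the client_rpc_notify
"disconnected" and client_setvolume_cbk "Connected to" messages. The helper
names records and reconnect_gaps are invented for this example.

```python
import re
from datetime import datetime

# A record starts with a bracketed timestamp at column 0.
RECORD_START = re.compile(r"^\[\d{4}-\d{2}-\d{2} ")
RECORD = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z])\s+"
    r"\[(?P<src>[^:\]]+):(?P<lineno>\d+):(?P<func>[^\]]+)\]\s*(?P<msg>.*)$"
)


def _parse(text):
    """Yield (timestamp, function, message) if text is a well-formed record."""
    m = RECORD.match(text)
    if m:
        ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S.%f")
        yield ts, m.group("func"), m.group("msg")


def records(lines):
    """Join wrapped continuation lines and yield parsed log records."""
    buf = []
    for line in lines:
        if RECORD_START.match(line) and buf:
            yield from _parse(" ".join(buf))
            buf = []
        if line.strip():
            buf.append(line.strip())
    if buf:
        yield from _parse(" ".join(buf))


def reconnect_gaps(lines):
    """Yield (disconnect_time, reconnect_time) pairs found in a client log."""
    down = None
    for ts, func, msg in records(lines):
        if func == "client_rpc_notify" and "disconnected" in msg:
            down = ts
        elif (func == "client_setvolume_cbk" and "Connected to" in msg
              and down is not None):
            yield down, ts
            down = None
```

Feeding it the client log lines quoted above gives one window, from the
22:30:08 disconnect to the 22:30:18 reconnect. Files created or locked
after such a window are the ones worth comparing across bricks.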

-- 
Emmanuel Dreyfus
manu at netbsd.org



