[Gluster-devel] Latest unstable (1.4) branch checkout seems... unstable

Brent A Nelson brent at phys.ufl.edu
Tue Jul 29 18:35:13 UTC 2008


I did a few tla replay --reverse operations and found that patch level 258 
works fine (except for the previously reported fchmod and ACL issues). 
Replaying to 259 breaks it as shown below, so the posix cleanup patch is 
what breaks my setup.
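
For anyone repeating this kind of search over a longer range of patch 
levels, it can be sketched as a bisection. This is only a rough sketch in 
Python, not something taken from the GlusterFS tree: `first_bad_patch` and 
`is_good` are hypothetical names, and `is_good` stands in for whatever 
manual step replays the tree to a given level, rebuilds, and checks that the 
mount behaves. It assumes that every level after the first bad one also 
fails. The bounds 250 and 265 in the usage lines are made up for 
illustration; only 258 (good) and 259 (bad) come from the result above.

```python
from typing import Callable


def first_bad_patch(good: int, bad: int, is_good: Callable[[int], bool]) -> int:
    """Return the lowest failing patch level between a known-good and a known-bad level.

    Assumes `good` works, `bad` fails, and every level after the first
    failing one also fails.
    """
    if good >= bad:
        raise ValueError("good must be lower than bad")
    # Halve the interval until good and bad are adjacent levels.
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_good(mid):
            good = mid
        else:
            bad = mid
    return bad


if __name__ == "__main__":
    # Hypothetical stand-in for a real build-and-mount test:
    # pretend everything up to patch level 258 works.
    print(first_bad_patch(250, 265, lambda level: level <= 258))  # prints 259
```

With a real `is_good`, each call costs one replay, rebuild, and mount test, 
so the number of test cycles grows with the logarithm of the range rather 
than with its length.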

Thanks,

Brent

On Tue, 29 Jul 2008, Brent A Nelson wrote:

> I had to make the ip->addr change a number of checkouts ago.  I hadn't yet 
> switched from tcp/client and tcp/server to socket, as backwards compatibility 
> seemed to work fine.  I just made the change, but as expected (since the 
> client is obviously communicating with all the servers; for example, df 
> information is correct), it didn't help.
>
> Other than this common complaint:
> 2008-07-29 12:04:18 C [dict.c:1141:data_to_str] dict: @data=(nil)
>
> I have nothing in the server logs.  However, I'm not sure how useful the 
> server logs are, as I run 4-5 server processes per machine, and they all use 
> the same log location.
>
> This setup (which is a set of four machines, 4 exports per machine, 2 
> machines offering namespace, clientside AFR+unify) was working fine with a 
> checkout that was probably about a week old.
>
> It's possible that it's due to some changes I made to the kernel of my build 
> machine to try to get shared writable mmap support into my fuse, but those 
> patches were pretty specific, and I wouldn't expect it to cause this kind of 
> behavior.
>
> I'll try to figure out how to get tla to roll back to a particular patchset 
> and see if I can identify which patch causes the breakage.
>
> Thanks,
>
> Brent
>
> On Tue, 29 Jul 2008, Raghavendra G wrote:
>
>> Hi Brent,
>> 
>> There are a couple of changes in 1.4. The authentication module "ip" has been
>> renamed to "addr", so the server-volume-spec file should have:
>> 
>> auth.addr.<brick-name>.allow <list-of-addresses>
>> 
>> list-of-addresses depends on the address-family specified in the
>> transport/socket. It can be:
>> ip-address for inet/inet6/inet-sdp
>> path for unix
>> 
>> Do the server-side logs say that "no authentication module is interested in
>> authenticating client xxxx"? If that's the case, the above fix works. If not,
>> can you send the server-side logs?
>> 
>> regards,
>> On Mon, Jul 28, 2008 at 11:23 PM, Brent A Nelson <brent at phys.ufl.edu> 
>> wrote:
>> 
>>> The latest checkout seems to have a major defect in my setup.  On the
>>> bright side, the fchmod bug seems like it might be fixed (although it could
>>> be that the filesystem isn't working well enough to tell)...
>>> 
>>> ls -al /beast
>>> ls: cannot access /beast/vz: No such file or directory
>>> ls: cannot access /beast/openvz: No such file or directory
>>> ls: cannot access /beast/usr0: No such file or directory
>>> ls: cannot access /beast/lost+found: No such file or directory
>>> ls: reading directory /beast: File descriptor in bad state
>>> total 128
>>> drwxr-xr-x  6 root root 20480 2008-07-28 15:02 .
>>> drwxrwxrwx 28 4791 kmem  4096 2008-07-18 20:32 ..
>>> d?????????  ? ?    ?        ?                ? lost+found
>>> -rwxr-xr-x  1 root root 92376 2008-04-04 02:42 ls
>>> d?????????  ? ?    ?        ?                ? openvz
>>> d?????????  ? ?    ?        ?                ? usr0
>>> d?????????  ? ?    ?        ?                ? vz
>>> 
>>> Associated glusterfs.log:
>>> 
>>> 2008-07-28 15:08:10 E [socket.c:1186:socket_submit] share4-0: transport not connected to submit (priv->connected = 0)
>>> 2008-07-28 15:08:10 E [afr.c:3428:afr_statfs_cbk] mirror4: (child=share4-0) op_ret=-1 op_errno=107(Transport endpoint is not connected)
>>> 2008-07-28 15:08:59 C [client-protocol.c:223:call_bail] ns0-0: bailing transport
>>> 2008-07-28 15:08:59 C [client-protocol.c:223:call_bail] ns0-1: bailing transport
>>> 2008-07-28 15:08:59 E [client-protocol.c:4122:protocol_client_cleanup] ns0-0: forced unwinding frame type(1) op(34) reply=@0xb4b02790
>>> 2008-07-28 15:08:59 E [client-protocol.c:4122:protocol_client_cleanup] ns0-1: forced unwinding frame type(1) op(34) reply=@0xb4b025e0
>>> 2008-07-28 15:08:59 E [socket.c:1186:socket_submit] ns0-0: transport not connected to submit (priv->connected = 0)
>>> 2008-07-28 15:08:59 E [socket.c:1186:socket_submit] ns0-1: transport not connected to submit (priv->connected = 0)
>>> 2008-07-28 15:08:59 E [fuse-bridge.c:452:fuse_entry_cbk] glusterfs-fuse: 18: (op_num=34) / => -1 (Transport endpoint is not connected)
>>> 2008-07-28 15:09:49 C [client-protocol.c:223:call_bail] ns0-0: bailing transport
>>> 2008-07-28 15:09:49 C [client-protocol.c:223:call_bail] ns0-1: bailing transport
>>> 2008-07-28 15:09:49 E [client-protocol.c:4122:protocol_client_cleanup] ns0-0: forced unwinding frame type(2) op(0) reply=@0xb4b010d0
>>> 2008-07-28 15:09:49 E [dict.c:648:dict_unserialize] dict: sscanf on buf failed
>>> 2008-07-28 15:09:49 E [client-protocol.c:3980:client_setvolume_cbk] ns0-0: SETVOLUME on remote-host failed: ret=-2 error=Unknown Error
>>> 2008-07-28 15:09:49 E [client-protocol.c:4122:protocol_client_cleanup] ns0-0: forced unwinding frame type(1) op(34) reply=@0xb4b010d0
>>> 2008-07-28 15:09:49 E [client-protocol.c:4122:protocol_client_cleanup] ns0-1: forced unwinding frame type(2) op(0) reply=@0xb4b010d0
>>> 2008-07-28 15:09:49 E [dict.c:648:dict_unserialize] dict: sscanf on buf failed
>>> 2008-07-28 15:09:49 E [client-protocol.c:3980:client_setvolume_cbk] ns0-1: SETVOLUME on remote-host failed: ret=-2 error=Unknown Error
>>> 2008-07-28 15:09:49 E [client-protocol.c:4122:protocol_client_cleanup] ns0-1: forced unwinding frame type(1) op(34) reply=@0xb4b010d0
>>> 2008-07-28 15:09:49 E [fuse-bridge.c:452:fuse_entry_cbk] glusterfs-fuse: 18: (op_num=34) / => -1 (No such file or directory)
>>> 2008-07-28 15:09:49 E [socket.c:1186:socket_submit] ns0-0: transport not connected to submit (priv->connected = 0)
>>> 2008-07-28 15:09:49 E [socket.c:1186:socket_submit] ns0-1: transport not connected to submit (priv->connected = 0)
>>> 2008-07-28 15:09:49 E [fuse-bridge.c:452:fuse_entry_cbk] glusterfs-fuse: 19: (op_num=34) / => -1 (Transport endpoint is not connected)
>>> 2008-07-28 15:09:49 E [fuse-bridge.c:452:fuse_entry_cbk] glusterfs-fuse: 19: (op_num=34) / => -1 (No such file or directory)
>>> 2008-07-28 15:09:49 E [fuse-bridge.c:452:fuse_entry_cbk] glusterfs-fuse: 20: (op_num=34) / => -1 (Transport endpoint is not connected)
>>> 2008-07-28 15:09:49 E [fuse-bridge.c:452:fuse_entry_cbk] glusterfs-fuse: 20: (op_num=34) / => -1 (No such file or directory)
>>> 2008-07-28 15:09:49 E [afr.c:4180:afr_readdir_cbk] ns0: (child=ns0-1) op_ret=-1 op_errno=77(File descriptor in bad state)
>>> 2008-07-28 15:09:49 E [fuse-bridge.c:1947:fuse_readdir_cbk] glusterfs-fuse: 21: READDIR => -1 (File descriptor in bad state)
>>> 2008-07-28 15:09:49 E [afr.c:5641:afr_closedir] ns0: child_errno[] not 0, returning ENOTCONN
>>> 2008-07-28 15:09:49 E [fuse-bridge.c:940:fuse_err_cbk] glusterfs-fuse: 22: (op_num=24) ERR => -1 (Transport endpoint is not connected)
>>> 2008-07-28 15:10:42 C [client-protocol.c:223:call_bail] ns0-0: bailing transport
>>> 2008-07-28 15:10:42 E [client-protocol.c:4122:protocol_client_cleanup] ns0-0: forced unwinding frame type(2) op(0) reply=@0x80a9cc8
>>> 2008-07-28 15:10:42 E [dict.c:648:dict_unserialize] dict: sscanf on buf failed
>>> 2008-07-28 15:10:42 E [client-protocol.c:3980:client_setvolume_cbk] ns0-0: SETVOLUME on remote-host failed: ret=-2 error=Unknown Error
>>> 2008-07-28 15:10:42 E [client-protocol.c:4122:protocol_client_cleanup] ns0-0: forced unwinding frame type(1) op(34) reply=@0x80a9cc8
>>> 2008-07-28 15:10:42 C [client-protocol.c:223:call_bail] ns0-1: bailing transport
>>> 2008-07-28 15:10:42 E [client-protocol.c:4122:protocol_client_cleanup] ns0-1: forced unwinding frame type(2) op(0) reply=@0x80a5b48
>>> 2008-07-28 15:10:42 E [dict.c:648:dict_unserialize] dict: sscanf on buf failed
>>> 2008-07-28 15:10:42 E [client-protocol.c:3980:client_setvolume_cbk] ns0-1: SETVOLUME on remote-host failed: ret=-2 error=Unknown Error
>>> 2008-07-28 15:10:42 E [client-protocol.c:4122:protocol_client_cleanup] ns0-1: forced unwinding frame type(1) op(34) reply=@0x80a5b48
>>> 2008-07-28 15:10:42 E [fuse-bridge.c:452:fuse_entry_cbk] glusterfs-fuse: 23: (op_num=34) / => -1 (Transport endpoint is not connected)
>>> 2008-07-28 15:10:42 E [socket.c:1186:socket_submit] ns0-0: transport not connected to submit (priv->connected = 0)
>>> 2008-07-28 15:10:42 E [socket.c:1186:socket_submit] ns0-1: transport not connected to submit (priv->connected = 0)
>>> 2008-07-28 15:10:42 E [fuse-bridge.c:452:fuse_entry_cbk] glusterfs-fuse: 23: (op_num=34) / => -1 (No such file or directory)
>>> 2008-07-28 15:11:32 C [client-protocol.c:223:call_bail] ns0-1: bailing transport
>>> 2008-07-28 15:11:32 E [client-protocol.c:4122:protocol_client_cleanup] ns0-1: forced unwinding frame type(2) op(0) reply=@0x80ab308
>>> 2008-07-28 15:11:32 E [dict.c:648:dict_unserialize] dict: sscanf on buf failed
>>> 2008-07-28 15:11:32 E [client-protocol.c:3980:client_setvolume_cbk] ns0-1: SETVOLUME on remote-host failed: ret=-2 error=Unknown Error
>>> 2008-07-28 15:11:32 E [client-protocol.c:4122:protocol_client_cleanup] ns0-1: forced unwinding frame type(1) op(34) reply=@0x80ab308
>>> 2008-07-28 15:11:32 E [fuse-bridge.c:452:fuse_entry_cbk] glusterfs-fuse: 24: (op_num=34) / => -1 (Transport endpoint is not connected)
>>> 2008-07-28 15:11:32 E [socket.c:1186:socket_submit] ns0-1: transport not connected to submit (priv->connected = 0)
>>> 2008-07-28 15:11:35 C [client-protocol.c:223:call_bail] ns0-0: bailing transport
>>> 2008-07-28 15:11:35 E [client-protocol.c:4122:protocol_client_cleanup] ns0-0: forced unwinding frame type(2) op(0) reply=@0x80a9cc8
>>> 2008-07-28 15:11:35 E [dict.c:648:dict_unserialize] dict: sscanf on buf failed
>>> 2008-07-28 15:11:35 E [client-protocol.c:3980:client_setvolume_cbk] ns0-0: SETVOLUME on remote-host failed: ret=-2 error=Unknown Error
>>> 2008-07-28 15:11:35 E [client-protocol.c:4122:protocol_client_cleanup] ns0-0: forced unwinding frame type(1) op(34) reply=@0x80a9cc8
>>> 2008-07-28 15:11:35 E [fuse-bridge.c:452:fuse_entry_cbk] glusterfs-fuse: 24: (op_num=34) / => -1 (No such file or directory)
>>> 
>>> Also, when I tried to shut down after this test, the filesystem unmounted
>>> fine and most of the share glusterfsd processes were killed normally, but I
>>> had to kill -9 the namespace glusterfsd processes.
>>> 
>>> Thanks,
>>> 
>>> Brent
>>> 
>>> 
>>> 
>> 
>> 
>> 
>> -- 
>> Raghavendra G
>> 
>> A centipede was happy quite, until a toad in fun,
>> Said, "Pray, which leg comes after which?",
>> This raised his doubts to such a pitch,
>> He fell flat into the ditch,
>> Not knowing how to run.
>> -Anonymous
>> 
>




