[Gluster-devel] READDIR bug in NFS server (was: mount.t oddity)
Emmanuel Dreyfus
manu at netbsd.org
Fri Aug 15 13:11:17 UTC 2014
> Sorry for the missing subject, here it is.
>
> On Thu, Aug 14, 2014 at 02:10:16PM +0000, Emmanuel Dreyfus wrote:
> > I observe a strange thing with tests/basic/mount.t on NetBSD.
> > It hangs on
> > TEST 23 (line 66): ! rm /mnt/glusterfs/1/newfile
I have come to the conclusion that this is a bug in the GlusterFS NFS server
component. Here is the IP packet for the READDIR reply sent by the GlusterFS
NFS server when the only entry in the directory is a file called AAA (along
with dot and dotdot):
0x0000: 4500 0110 2227 4000 4006 0000 17fd ac40 E..."'@.@......@
0x0010: 17fd ac40 0801 03f4 1cf8 339a 1b0e 84e7 ...@......3.....
0x0020: 8018 0100 897d 0000 0101 080a 0000 0002 .....}..........
0x0030: 0000 0001 8000 00d8 746a 6647 0000 0001 ........tjfG....
0x0040: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0050: 0000 0000 0000 0001 0000 0002 0000 01ed ................
0x0060: 0000 0003 0000 0000 0000 0064 0000 0000 ...........d....
0x0070: 0000 0400 0000 0000 0000 0800 0000 0000 ................
0x0080: 0000 0000 de01 7120 5cb3 7985 0000 0000 ......q.\.y.....
0x0090: 0000 0001 53ed 8b4d 0c78 b966 53ed 82e9 ....S..M.x.fS...
0x00a0: 29af 5df5 53ed 82e9 29af 5df5 f8e1 23bb ).].S...).]...#.
0x00b0: 0000 0000 0000 0001 0000 0000 0000 0001 ................
0x00c0: 0000 0001 2e00 0000 7fff ffff ee68 2415 .............h$.
0x00d0: 0000 0001 0000 0000 0000 0001 0000 0002 ................
0x00e0: 2e2e 0000 7fff ffff ee68 2419 0000 0001 .........h$.....
0x00f0: 8aab 132e eed7 a537 0000 0003 4141 4100 .......7....AAA.
0x0100: 7fff ffff ee68 2421 0000 0000 0000 0000 .....h$!........
Note the trailing four zero bytes. They are the eof boolean flag (a 32-bit
XDR value), and it should be set to 1. For some reason the Linux NFS client
can cope with this error (I guess it uses the packet length?), but the NetBSD
NFS client keeps looping on the last entry.
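
To make the layout of the reply easier to check, here is a minimal Python sketch that walks the RPC record from the dump above and reads the eof flag. It assumes the record starts at offset 0x34 (after the 20-byte IP header and the 32-byte TCP header) and that the reply follows the READDIR3 layout of RFC 1813; the function name parse_readdir3_reply is made up for this illustration and is not part of GlusterFS.

```python
import struct

# RPC record copied from the dump, starting at packet offset 0x34
# (assumption: 20-byte IP header followed by a 32-byte TCP header).
RECORD = bytes.fromhex("""
    800000d8 746a6647 00000001
    00000000 00000000 00000000 00000000
    00000000 00000001 00000002 000001ed
    00000003 00000000 00000064 00000000
    00000400 00000000 00000800 00000000
    00000000 de017120 5cb37985 00000000
    00000001 53ed8b4d 0c78b966 53ed82e9
    29af5df5 53ed82e9 29af5df5 f8e123bb
    00000000 00000001 00000000 00000001
    00000001 2e000000 7fffffff ee682415
    00000001 00000000 00000001 00000002
    2e2e0000 7fffffff ee682419 00000001
    8aab132e eed7a537 00000003 41414100
    7fffffff ee682421 00000000 00000000
""")

FATTR3_SIZE = 84  # fixed XDR size of fattr3 in NFSv3


def parse_readdir3_reply(record):
    """Return (names, eof, bytes_consumed) for a successful READDIR3 reply."""
    pos = 0

    def u32():
        nonlocal pos
        (value,) = struct.unpack_from(">I", record, pos)
        pos += 4
        return value

    def u64():
        nonlocal pos
        (value,) = struct.unpack_from(">Q", record, pos)
        pos += 8
        return value

    marker = u32()                    # record marker: last-fragment bit + length
    if (marker & 0x7FFFFFFF) != len(record) - 4:
        raise ValueError("fragment length does not match the data")
    u32()                             # xid
    if u32() != 1:                    # msg_type, 1 = REPLY
        raise ValueError("not an RPC reply")
    if u32() != 0:                    # reply_stat, 0 = MSG_ACCEPTED
        raise ValueError("reply was denied")
    u32()                             # verifier flavor
    verf_len = u32()
    pos += (verf_len + 3) & ~3        # verifier body, padded to 4 bytes
    if u32() != 0:                    # accept_stat, 0 = SUCCESS
        raise ValueError("RPC call was not successful")
    if u32() != 0:                    # nfsstat3, 0 = NFS3_OK
        raise ValueError("READDIR3 failed")
    if u32():                         # dir_attributes present?
        pos += FATTR3_SIZE
    pos += 8                          # cookieverf
    names = []
    while u32():                      # value_follows: another entry3 comes next
        u64()                         # fileid
        name_len = u32()
        names.append(record[pos:pos + name_len].decode("ascii"))
        pos += (name_len + 3) & ~3    # name is padded to 4 bytes
        u64()                         # cookie
    eof = u32()                       # final word of the reply
    return names, eof, pos


names, eof, consumed = parse_readdir3_reply(RECORD)
print(names, "eof =", eof, "consumed", consumed, "of", len(RECORD))
```

On this dump the parser finds the entries ".", ".." and "AAA", consumes the whole record, and decodes eof as 0, which matches the observation above.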
Fixing this is not straightforward. The eof field is set in the NFS reply
frame by nfs3_fill_readdir3res() when op_errno is ENOENT. Below is the kind
of backtrace to nfs3_fill_readdir3res() that I get when mounting the NFS
filesystem. Further debugging shows op_errno is always 0. Obviously there must
be an op_errno = ENOENT missing somewhere in the caller functions, but I have
trouble telling where. I do not see anything going to the posix xlator as I
would have expected.
0xb9ac364a <nfs3_fill_readdir3res+266> at
/autobuild/install/lib/glusterfs/3.7dev/xlator/nfs/server.so
0xb9abc528 <nfs3_readdir_reply+155> at
/autobuild/install/lib/glusterfs/3.7dev/xlator/nfs/server.so
0xb9abc758 <nfs3svc_readdir_fstat_cbk+505> at
/autobuild/install/lib/glusterfs/3.7dev/xlator/nfs/server.so
0xb9a9ccb5 <nfs_fop_fstat_cbk+176> at
/autobuild/install/lib/glusterfs/3.7dev/xlator/nfs/server.so
0xbb30e98e <io_stats_fstat_cbk+563> at
/autobuild/install/lib/glusterfs/3.7dev/xlator/debug/io-stats.so
0xbb7708ac <default_fstat_cbk+314> at /autobuild/install/lib/libglusterfs.so.0
0xb9b2be69 <dht_attr_cbk+986> at
/autobuild/install/lib/glusterfs/3.7dev/xlator/cluster/distribute.so
0xb9b57b52 <stripe_fstat_cbk+997> at
/autobuild/install/lib/glusterfs/3.7dev/xlator/cluster/stripe.so
0xb9b8119e <afr_fstat_cbk+443> at
/autobuild/install/lib/glusterfs/3.7dev/xlator/cluster/replicate.so
0xb9be00bf <client3_3_fstat_cbk+974> at
/autobuild/install/lib/glusterfs/3.7dev/xlator/protocol/client.so
0xbb73d45f <rpc_clnt_handle_reply+452> at /autobuild/install/lib/libgfrpc.so.0
0xbb73d757 <rpc_clnt_notify+560> at /autobuild/install/lib/libgfrpc.so.0
0xbb739c95 <rpc_transport_notify+153> at /autobuild/install/lib/libgfrpc.so.0
0xbb38c81b <_init+28395> at
/autobuild/install/lib/glusterfs/3.7dev/rpc-transport/socket.so
0xbb38ccd5 <_init+29605> at
/autobuild/install/lib/glusterfs/3.7dev/rpc-transport/socket.so
0xbb7c5cac <gf_client_dump_inodes+3832> at
/autobuild/install/lib/libglusterfs.so.0
0xbb7c5f00 <gf_client_dump_inodes+4428> at
/autobuild/install/lib/libglusterfs.so.0
0xbb798c5e <event_dispatch+121> at /autobuild/install/lib/libglusterfs.so.0
0x80515a8 <main+791> at /autobuild/install/sbin/glusterfs
0x804c505 <__start+309> at /autobuild/install/sbin/glusterfs
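
The dependency described above can be restated as a toy model. This is only a sketch of the behaviour reported in this message, not the GlusterFS code: fill_readdir_reply is a hypothetical stand-in for the reply-filling step, and the only assumption it encodes is that eof is derived from op_errno being ENOENT.

```python
import errno


def fill_readdir_reply(names, op_errno):
    """Hypothetical stand-in: eof is true only when op_errno is ENOENT."""
    return {"entries": list(names), "eof": op_errno == errno.ENOENT}


# op_errno stuck at 0, as observed: the reply never signals end of directory.
print(fill_readdir_reply([".", "..", "AAA"], 0))

# op_errno set to ENOENT by some caller: the reply would carry eof = true.
print(fill_readdir_reply([".", "..", "AAA"], errno.ENOENT))
```

With op_errno always 0, no caller can ever produce a reply with eof set, whatever the directory contains.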
--
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
manu at netbsd.org