[Gluster-devel] READDIR bug in NFS server (was: mount.t oddity)
Niels de Vos
ndevos at redhat.com
Fri Aug 15 15:57:32 UTC 2014
On Fri, Aug 15, 2014 at 04:32:56PM +0200, Emmanuel Dreyfus wrote:
> Emmanuel Dreyfus <manu at netbsd.org> wrote:
>
> > Fixing this is not straightforward. The eof field is set in the NFS reply
> > frame by nfs3_fill_readdir3res() when op_errno is ENOENT. Below is the
> > kind of backtrace to nfs3_fill_readdir3res() I get when mounting the NFS
> > filesystem. Further debugging shows op_errno is always 0. Obviously an
> > op_errno = ENOENT assignment must be missing somewhere in the caller
> > functions, but I have trouble telling where. I do not see anything going
> > to the posix xlator as I would have expected.
>
> But I was a bit confused: the request must go to the bricks from the NFS
> server, which acts as a gluster client. On the bricks the posix xlator is
> involved. It indeed sets errno = ENOENT in posix_fill_readdir() when it
> reaches the end of the directory.
>
> The backtrace leading to posix_fill_readdir() is below. The next question is:
> once errno is set within an IO thread, how is it transmitted to the glusterfs
> server side so that it has a chance to be seen by the client?
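To answer the propagation question first: op_errno is not returned
through the C call stack at all. It travels as an argument of the
callback chain: when the fop completes in the io-thread, the frame is
unwound (STACK_UNWIND) and each xlator's callback receives op_ret and
op_errno, until the protocol layer puts them into the RPC reply that
goes back to the client (here: the gluster NFS server). A toy model of
that pattern, in plain C with made-up names, not actual GlusterFS code:

    /* Toy model: an "errno" set deep in a worker reaches the top of the
     * stack as a callback argument, with no shared state involved. */
    #include <errno.h>
    #include <stdio.h>

    typedef void (*readdir_cbk_t) (int op_ret, int op_errno);

    /* Stands in for posix_fill_readdir(): end-of-directory is signalled
     * by setting op_errno to ENOENT. */
    static void
    fake_posix_readdir (readdir_cbk_t cbk)
    {
            int op_errno = ENOENT;   /* end of directory stream reached */
            cbk (0, op_errno);       /* "unwind" to the layer above */
    }

    /* Stands in for the NFS server's callback, cf.
     * nfs3svc_readdir_fstat_cbk(): it sees op_errno only as an argument. */
    static void
    nfs_readdir_cbk (int op_ret, int op_errno)
    {
            int is_eof = (op_errno == ENOENT);
            printf ("op_ret=%d op_errno=%d is_eof=%d\n",
                    op_ret, op_errno, is_eof);
    }

    int
    main (void)
    {
            fake_posix_readdir (nfs_readdir_cbk);
            return 0;
    }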
I've just checked xlators/nfs/server/src/nfs3.c a little, and it seems
that at least nfs3svc_readdir_fstat_cbk() tries to handle it:
4093         /* Check whether we encountered a end of directory stream while
4094          * readdir'ing.
4095          */
4096         if (cs->operrno == ENOENT) {
4097                 gf_log (GF_NFS3, GF_LOG_TRACE, "Reached end-of-directory");
4098                 is_eof = 1;
4099         }
is_eof is then passed to nfs3_readdir_reply() or nfs3_readdirp_reply():
4111                 nfs3_readdir_reply (cs->req, stat, &cs->parent,
4112                                     (uintptr_t)cs->fd, buf, &cs->entries,
4113                                     cs->dircount, is_eof);
....
4118                 nfs3_readdirp_reply (cs->req, stat, &cs->parent,
4119                                      (uintptr_t)cs->fd, buf,
4120                                      &cs->entries, cs->dircount,
4121                                      cs->maxcount, is_eof);
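For reference, is_eof ends up as the eof boolean in the dirlist of the
READDIR3 reply (RFC 1813), which is what tells the NFS client that it
can stop issuing further READDIR calls. Simplified XDR from the RFC
(not the GlusterFS structures):

    /* RFC 1813, simplified: the eof flag filled in by
     * nfs3_fill_readdir3res() lives in the dirlist3 of a successful
     * READDIR3 reply. */
    struct dirlist3 {
            entry3 *entries;   /* linked list of directory entries */
            bool    eof;       /* TRUE: no entries remain */
    };

    struct READDIR3resok {
            post_op_attr dir_attributes;
            cookieverf3  cookieverf;
            dirlist3     reply;        /* carries the eof flag */
    };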
There are other callers of nfs3_readdir{,p}_reply() that do not pass a
conditional is_eof. Fixing those callers looks like a good place to
start. I don't have time to look into this today or over the weekend,
but I plan to check it next week.
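A possible shape for the fix, only as a sketch (the helper below is
hypothetical, it does not exist in the tree): derive is_eof from the
operation's errno in one place and let every caller use it.

    /* Hypothetical helper: make the ENOENT -> eof mapping explicit so
     * that all callers of nfs3_readdir{,p}_reply() treat
     * end-of-directory the same way. */
    static int
    nfs3_readdir_is_eof (int operrno)
    {
            /* The posix xlator reports end-of-directory as ENOENT. */
            return (operrno == ENOENT) ? 1 : 0;
    }

Callers would then pass nfs3_readdir_is_eof (cs->operrno) instead of a
hard-coded value.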
In any case, do file a bug for it (and add me on CC) so that I won't
forget to follow up.
Thanks,
Niels
>
> 0xb9bef7ad <posix_fill_readdir+1369> at
> /autobuild/install/lib/glusterfs/3.7dev/xlator/storage/posix.so
> 0xb9befe9d <posix_do_readdir+703> at
> /autobuild/install/lib/glusterfs/3.7dev/xlator/storage/posix.so
> 0xb9bf0300 <posix_readdirp+618> at
> /autobuild/install/lib/glusterfs/3.7dev/xlator/storage/posix.so
> 0xbb779b90 <default_readdirp+147> at /autobuild/install/lib/libglusterfs.so.0
> 0xbb30dc96 <posix_acl_readdirp+814> at
> /autobuild/install/lib/glusterfs/3.7dev/xlator/features/access-control.so
> 0xb9bc7ca3 <pl_readdirp+815> at
> /autobuild/install/lib/glusterfs/3.7dev/xlator/features/locks.so
> 0xbb77742a <default_readdirp_resume+518> at
> /autobuild/install/lib/libglusterfs.so.0
> 0xbb78f043 <fop_zerofill_stub+4271> at
> /autobuild/install/lib/libglusterfs.so.0
> 0xbb795cac <call_resume+175> at /autobuild/install/lib/libglusterfs.so.0
> 0xb9bb2e05 <iot_worker+554> at
> /autobuild/install/lib/glusterfs/3.7dev/xlator/performance/io-threads.so
> 0xbb705783 <pthread_setcancelstate+372> at /usr/lib/libpthread.so.1
> 0xbb491ee0 <_lwp_exit+0> at /usr/lib/libc.so.12
>
> --
> Emmanuel Dreyfus
> http://hcpnet.free.fr/pubz
> manu at netbsd.org