[Gluster-devel] Stale NFS file handle, then EINVAL

Emmanuel Dreyfus manu at netbsd.org
Fri Jul 22 01:55:05 UTC 2011


Pavan T C <tcp at gluster.com> wrote:

> When lookup is sent, the details of the ondisk file (particularly gfid) 
> is fetched. In the lookup callback, a comparison with this gfid is made 
> with one in the local inode cache. If this inode is found in the local 
> cache and the local gfid differs from the one obtained from the disk, 
> ESTALE is returned to FUSE. FUSE will consider this to be a revalidate 
> case, and should send a revalidate lookup to update its cache, and 
> should *not* pass this error back to VFS. If the error is passed back to 
> VFS, the application can report "Stale NFS file handle".

The problem here is not with LOOKUP, but with READDIR. I thought there was
some race condition and the offending files were changing, but this is not the
case: it always happens on the same files, and I see no bug if they are found
by LOOKUP. Only READDIR can trigger the problem.

Discovering Makefile.am by READDIR gets ESTALE:

client# umount /gfs && mount /gfs
client# cd /gfs/usr/src/gnu/dist/binutils/bfd/ && ls -l
ls: ChangeLog-9193: Stale NFS file handle
ls: ChangeLog-9697: Stale NFS file handle
ls: Makefile.am: Stale NFS file handle
(...)
-rw-r--r--  1 root  wheel   18009 Nov 26  2003 COPYING
drwxr-xr-x  2 root  wheel    1024 Nov  6  2010 CVS
-rw-r--r--  1 root  wheel  256777 Feb  2  2006 ChangeLog
-rw-r--r--  1 root  wheel  350400 Nov 26  2003 ChangeLog-0001
-rw-r--r--  1 root  wheel  442601 Dec  8  2004 ChangeLog-0203
-rw-r--r--  1 root  wheel  411353 Nov 26  2003 ChangeLog-9495
-rw-r--r--  1 root  wheel  206494 Nov 26  2003 ChangeLog-9899
(...)

Discovering Makefile.am by LOOKUP is fine:

client# umount /gfs && mount /gfs
client# cd /gfs/usr/src/gnu/dist/binutils/bfd/ && ls -l Makefile.am
-rw-r--r--  1 root  wheel  66102 Feb  2  2006 Makefile.am


On the backend, I can tell the difference between files that trigger the bug
and the others: the bad ones have a gfid out of sync with their linkto
counterpart on the other replica. The difference never heals. Here is a file
that triggers an ESTALE when I discover it through READDIR:

server# ls -l /export/*/usr/src/gnu/dist/binutils/bfd/Makefile.am
---------T  1 root  wheel      0 Jul 22 03:33
     /export/wd1a/usr/src/gnu/dist/binutils/bfd/Makefile.am
-rw-r--r--  1 root  wheel  66102 Feb  2  2006
     /export/wd3a/usr/src/gnu/dist/binutils/bfd/Makefile.am

server# getextattr -x trusted.gfid  \
    /export/*/usr/src/gnu/dist/binutils/bfd/Makefile.am
/export/wd1a/usr/src/gnu/dist/binutils/bfd/Makefile.am  
   000   3a 4c 41 78 86 59 4f ab 97 65 d5 a6 f8 c5 5f 4d    :LAx.YO..e...._M
/export/wd3a/usr/src/gnu/dist/binutils/bfd/Makefile.am  
   000   61 37 c5 59 90 83 42 88 a7 b8 ee 86 58 66 1f 13    a7.Y..B.....Xf..

And here is a file that has no problem:

server# ls -l /export/*/usr/src/gnu/dist/binutils/bfd/COPYING
---------T  1 root  wheel  0 Jul 19 09:45
    /export/wd1a/usr/src/gnu/dist/binutils/bfd/COPYING
-rw-r--r--  1 root  wheel  18009 Nov 26  2003 
    /export/wd3a/usr/src/gnu/dist/binutils/bfd/COPYING

server# getextattr -x trusted.gfid  \
    /export/*/usr/src/gnu/dist/binutils/bfd/COPYING
/export/wd1a/usr/src/gnu/dist/binutils/bfd/COPYING
   000   0b fb b5 6a 7d c0 4d 22 98 f8 9d 6c 64 5b ab b7    ...j}.M"...ld[..
/export/wd3a/usr/src/gnu/dist/binutils/bfd/COPYING      
   000   0b fb b5 6a 7d c0 4d 22 98 f8 9d 6c 64 5b ab b7    ...j}.M"...ld[..
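
To make the comparison above less error-prone than reading hex by eye, here
is a minimal sketch that parses the getextattr -x dump lines and checks
whether the linkto file and the data file carry the same gfid. The helper
names (gfid_from_hexdump, gfids_in_sync) are mine and purely illustrative;
the dump lines are copied from the output above.

```python
import uuid

def gfid_from_hexdump(line: str) -> uuid.UUID:
    """Parse one 16-byte dump line as printed by `getextattr -x trusted.gfid`."""
    fields = line.split()
    # fields[0] is the offset ("000"); the next 16 fields are the gfid bytes.
    # Anything after that is the ASCII column and is ignored.
    raw = bytes(int(b, 16) for b in fields[1:17])
    return uuid.UUID(bytes=raw)

def gfids_in_sync(linkto_line: str, data_line: str) -> bool:
    """True when the linkto file and the data file have the same gfid."""
    return gfid_from_hexdump(linkto_line) == gfid_from_hexdump(data_line)

# Dump lines taken from the output above.
MAKEFILE_WD1A = '000   3a 4c 41 78 86 59 4f ab 97 65 d5 a6 f8 c5 5f 4d    :LAx.YO..e...._M'
MAKEFILE_WD3A = '000   61 37 c5 59 90 83 42 88 a7 b8 ee 86 58 66 1f 13    a7.Y..B.....Xf..'
COPYING_WD1A = '000   0b fb b5 6a 7d c0 4d 22 98 f8 9d 6c 64 5b ab b7    ...j}.M"...ld[..'
COPYING_WD3A = '000   0b fb b5 6a 7d c0 4d 22 98 f8 9d 6c 64 5b ab b7    ...j}.M"...ld[..'

if __name__ == "__main__":
    print("Makefile.am in sync:", gfids_in_sync(MAKEFILE_WD1A, MAKEFILE_WD3A))
    print("COPYING in sync:    ", gfids_in_sync(COPYING_WD1A, COPYING_WD3A))
```

Run over a whole directory, something like this would list exactly the files
that fail under READDIR, if my reading of the problem is right.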


As I understand it, on READDIR the glusterfs server reports the gfid of the
shadow linkto file to the client, while subsequent use of the file reports the
correct gfid, leading to the mismatch.
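
Here is a toy model of that hypothesis, only to spell out the sequence of
events. It is not glusterfs code: the Client class, the two dictionaries and
the idea that READDIR fills the inode cache from the linkto brick are all my
assumptions. The gfids are abbreviated to the first four bytes of the dumps
above.

```python
import errno

# Assumed layout: wd1a holds the linkto files, wd3a holds the real data.
LINKTO_BRICK = {"Makefile.am": "3a4c4178", "COPYING": "0bfbb56a"}
DATA_BRICK = {"Makefile.am": "6137c559", "COPYING": "0bfbb56a"}

class Client:
    """Hypothetical client with a name -> gfid inode cache."""

    def __init__(self):
        self.inode_cache = {}

    def readdir(self):
        # Assumption under test: READDIR hands back the linkto file's gfid.
        for name, gfid in LINKTO_BRICK.items():
            self.inode_cache[name] = gfid
        return sorted(self.inode_cache)

    def lookup(self, name):
        # LOOKUP resolves to the data file and compares with the cache,
        # as described in the quoted explanation.
        ondisk = DATA_BRICK[name]
        cached = self.inode_cache.get(name)
        if cached is not None and cached != ondisk:
            return errno.ESTALE
        self.inode_cache[name] = ondisk
        return 0

if __name__ == "__main__":
    after_readdir = Client()
    after_readdir.readdir()
    print("READDIR then LOOKUP:", after_readdir.lookup("Makefile.am"))
    print("LOOKUP only:        ", Client().lookup("Makefile.am"))
```

In this model only files whose two gfids differ fail, and only when READDIR
populated the cache first, which matches what I observe.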

You suggest that the FUSE implementation should filter out ESTALE by issuing
another LOOKUP? I can implement this, but I have trouble understanding why you
have a bug report on this problem if Linux FUSE does that.
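
For reference, this is roughly what I understand the suggestion to be, as a
sketch and not as a description of what Linux FUSE actually does. The
function lookup_with_revalidate and the demo callbacks are hypothetical; the
only behaviour assumed is "on ESTALE, drop the cached entry and look up
again".

```python
import errno

def lookup_with_revalidate(lookup, invalidate, name, max_retries=1):
    """Call lookup(name); on ESTALE, invalidate the cached entry and retry."""
    err = lookup(name)
    retries = 0
    while err == errno.ESTALE and retries < max_retries:
        invalidate(name)      # forget the stale gfid
        err = lookup(name)    # revalidate with a fresh lookup
        retries += 1
    return err

# Tiny stand-in for a filesystem with a stale cache entry.
cache = {"Makefile.am": "stale-gfid"}
ondisk = {"Makefile.am": "fresh-gfid"}

def demo_lookup(name):
    cached = cache.get(name)
    if cached is not None and cached != ondisk[name]:
        return errno.ESTALE
    cache[name] = ondisk[name]
    return 0

def demo_invalidate(name):
    cache.pop(name, None)

if __name__ == "__main__":
    print(lookup_with_revalidate(demo_lookup, demo_invalidate, "Makefile.am"))
```

The retry count is bounded so that a gfid that keeps changing still ends up
reported to the caller instead of looping forever.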

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
manu at netbsd.org



