[Bugs] [Bug 1529084] New: fstat returns ENOENT/ESTALE

bugzilla at redhat.com bugzilla at redhat.com
Tue Dec 26 09:54:06 UTC 2017


https://bugzilla.redhat.com/show_bug.cgi?id=1529084

            Bug ID: 1529084
           Summary: fstat returns ENOENT/ESTALE
           Product: GlusterFS
           Version: 3.13
         Component: fuse
          Assignee: bugs at gluster.org
          Reporter: rgowdapp at redhat.com
                CC: bugs at gluster.org
        Depends On: 1510401
            Blocks: 1492591



+++ This bug was initially created as a clone of Bug #1510401 +++

Description of problem:
In a multithreaded application, repeating the following test can eventually
produce an fstat that fails with ENOENT/ESTALE:

* Turn off performance.open-behind.
* Thread t1 opens an fd on file 
* file is unlinked or overwritten by a rename - say rename (newfile, file)
* Thread t2 does fstat (fd)

Version-Release number of selected component (if applicable):
Day 0 bug, present in all versions.

How reproducible:
Race condition, but reproducible fairly consistently.

Steps to Reproduce:
1. Turn off performance.open-behind
2. Thread t1 opens an fd on file
3. file is unlinked or overwritten by a rename - say rename (newfile, file)
4. Thread t2 does fstat (fd)
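
The steps above can be sketched as a small script. This is only an
illustration, not the reporter's test program: the function name and the
directory argument are hypothetical, and the script does not turn off
performance.open-behind itself. On a local filesystem the fstat is expected to
succeed; hitting the race would require pointing it at a GlusterFS FUSE mount
of an affected version and repeating the run many times.

```python
import os
import tempfile
import threading


def fstat_after_rename(directory):
    """Open "file" in thread t1, overwrite it by rename, fstat the fd in t2.

    Returns None if fstat succeeds, otherwise the errno of the failure.
    """
    target = os.path.join(directory, "file")
    replacement = os.path.join(directory, "newfile")
    for path in (target, replacement):
        with open(path, "w") as handle:
            handle.write("data")

    state = {"fd": None, "errno": None}

    def opener():
        # Thread t1: open an fd on "file".
        state["fd"] = os.open(target, os.O_RDONLY)

    def statter():
        # Thread t2: overwrite "file" via rename, then fstat the fd from t1.
        os.rename(replacement, target)
        try:
            os.fstat(state["fd"])
        except OSError as exc:
            state["errno"] = exc.errno

    t1 = threading.Thread(target=opener)
    t1.start()
    t1.join()

    t2 = threading.Thread(target=statter)
    t2.start()
    t2.join()

    os.close(state["fd"])
    return state["errno"]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        print(fstat_after_rename(scratch))
```

A return value of errno.ENOENT or errno.ESTALE would correspond to the failure
described in this report.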

Actual results:
fstat sometimes fails with ENOENT/ESTALE.

Expected results:
fstat should never fail as there is an fd opened on the file.

Additional info:

--- Additional comment from Raghavendra G on 2017-11-07 05:33:28 EST ---

[raghu at unused tmp]$ grep GETATTR estale-ops.txt 
GETATTR {Len:56 Opcode:3 Unique:9895 Nodeid:140295787370744 Uid:601 Gid:601
Pid:18382 Padding:0} {GetattrFlags:0 Dummy:0 Fh:0} 


[raghu at unused tmp]$ grep Pid:18382 ./fuse_30.txt | grep Nodeid:140295787370744
GETATTR {Len:56 Opcode:3 Unique:9895 Nodeid:140295787370744 Uid:601 Gid:601
Pid:18382 Padding:0} {GetattrFlags:0 Dummy:0 Fh:0} 
FLUSH {Len:64 Opcode:25 Unique:9908 Nodeid:140295787370744 Uid:601 Gid:601
Pid:18382 Padding:0} {Fh:140296113765492 Unused:0 Padding:0
LockOwner:15059145825253368282} 
GETATTR {Len:56 Opcode:3 Unique:9910 Nodeid:140295787370744 Uid:601 Gid:601
Pid:18382 Padding:0} {GetattrFlags:0 Dummy:0 Fh:0} 
FLUSH {Len:64 Opcode:25 Unique:9917 Nodeid:140295787370744 Uid:601 Gid:601
Pid:18382 Padding:0} {Fh:140296113764232 Unused:0 Padding:0
LockOwner:15059145825253368282} 
OPEN {Len:48 Opcode:14 Unique:13150 Nodeid:140295787370744 Uid:601 Gid:601
Pid:18382 Padding:0} {Flags:32768 Unused:0} 
FLUSH {Len:64 Opcode:25 Unique:13156 Nodeid:140295787370744 Uid:601 Gid:601
Pid:18382 Padding:0} {Fh:140296113765912 Unused:0 Padding:0
LockOwner:15059145825253368282} 
OPEN {Len:48 Opcode:14 Unique:13193 Nodeid:140295787370744 Uid:601 Gid:601
Pid:18382 Padding:0} {Flags:32768 Unused:0} 
FLUSH {Len:64 Opcode:25 Unique:13202 Nodeid:140295787370744 Uid:601 Gid:601
Pid:18382 Padding:0} {Fh:140296113762832 Unused:0 Padding:0
LockOwner:15059145825253368282}

As the trace above shows, before the GETATTR failure (Unique:9895) no open was
ever done on the same file (Nodeid:140295787370744) by the same application
thread (Pid:18382). Since that thread had no fd open, glusterfs fell back to
STAT instead of FSTAT, and STAT can fail with ESTALE when a rename overwrites
the file. However, I do see that other threads had fds open on the same file
at the time of the GETATTR (Unique:9895). So I am wondering whether the
application is written in such a way that fds are opened and consumed across
different threads. If so, this could be the cause of the fstat failures, as
glusterfs uses an fd during GETATTR only if it was opened by the same thread.
Otherwise, it uses STAT instead of FSTAT.
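
The check done with grep above can also be scripted. The following is a rough
sketch under two assumptions that the trace itself does not confirm: each
record sits on a single line in the dump (the records quoted in this mail are
wrapped), and the file is in chronological order. The function and field names
are hypothetical.

```python
import re

# Matches records such as:
# GETATTR {Len:56 Opcode:3 Unique:9895 Nodeid:140295787370744 ... Pid:18382 ...}
RECORD = re.compile(
    r"^(?P<op>[A-Z]+) \{.*?Unique:(?P<unique>\d+) Nodeid:(?P<nodeid>\d+)"
    r".*?Pid:(?P<pid>\d+)"
)


def opens_before(lines, unique):
    """Return OPEN records that precede request `unique` in the dump and
    share its Nodeid and Pid. An empty list means the requesting thread
    had not opened the file before issuing that request."""
    records = []
    for line in lines:
        match = RECORD.match(line)
        if match:
            records.append({
                "op": match.group("op"),
                "unique": int(match.group("unique")),
                "nodeid": int(match.group("nodeid")),
                "pid": int(match.group("pid")),
            })

    for index, record in enumerate(records):
        if record["unique"] == unique:
            return [
                earlier for earlier in records[:index]
                if earlier["op"] == "OPEN"
                and earlier["nodeid"] == record["nodeid"]
                and earlier["pid"] == record["pid"]
            ]
    return []
```

For the records quoted above, opens_before(lines, 9895) would come back empty,
which matches the observation that Pid:18382 had no fd open at that point.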

--- Additional comment from Worker Ant on 2017-11-07 05:59:44 EST ---

REVIEW: https://review.gluster.org/18681 (mount/fuse: use fstat in getattr
implementation if any opened fd is available) posted (#1) for review on master
by Raghavendra G

--- Additional comment from Worker Ant on 2017-11-09 12:26:57 EST ---

COMMIT: https://review.gluster.org/18681 committed in master by  

------------- mount/fuse: use fstat in getattr implementation if any opened fd
is available

The restriction of using only fds opened by the same Pid means fds cannot
be shared across threads of a multithreaded application. Note that fops
from the kernel carry a different Pid for each thread. Imagine the
following sequence of operations:

* Turn off performance.open-behind
* Thread t1 opens an fd - fd1 - on file "file". Let's assume the nodeid of
  "file" is "nodeid-file".
* Thread t2 does RENAME ("newfile", "file"). Let's assume the nodeid of
  "newfile" is "nodeid-newfile".
* t2 proceeds to do fstat (fd1)

The above set of operations can sometimes result in ESTALE/ENOENT
errors. RENAME overwrites "file" with "newfile", changing its nodeid
from "nodeid-file" to "nodeid-newfile", and after the RENAME "nodeid-file"
is removed from the backend. The fstat can still carry nodeid-file as its
argument if lookup has not yet refreshed the nodeid of "file". In that
case, since t2 doesn't have an fd opened, fuse_getattr_resume uses STAT,
which fails because "nodeid-file" no longer exists.

Since the above set of operations and sharing of fds across
multiple threads are valid, this is a bug.

The fix is to use any fd opened on the inode. In this specific example
fuse_getattr_resume will find fd1 and wind the call down as fstat
(fd1), which won't fail.
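
The difference between the old and new behaviour can be modelled in a few
lines. This is an illustrative model only, not the code in
fuse_getattr_resume: OpenFd, choose_getattr_op and the any_fd flag are
hypothetical names introduced for the sketch.

```python
from collections import namedtuple

# One fd opened on an inode, tagged with the Pid that opened it.
OpenFd = namedtuple("OpenFd", ["name", "pid"])


def choose_getattr_op(open_fds, caller_pid, any_fd=True):
    """Pick the operation used to serve GETATTR on an inode.

    any_fd=False models the old rule: only an fd opened by the calling
    Pid is usable. any_fd=True models the fix: any fd opened on the
    inode is usable. With no usable fd the call falls back to STAT.
    """
    for fd in open_fds:
        if any_fd or fd.pid == caller_pid:
            return ("FSTAT", fd)
    return ("STAT", None)


# fd1 was opened by thread t1 (Pid 100); thread t2 (Pid 200) does the fstat.
fds_on_inode = [OpenFd("fd1", 100)]
old_choice = choose_getattr_op(fds_on_inode, caller_pid=200, any_fd=False)
new_choice = choose_getattr_op(fds_on_inode, caller_pid=200, any_fd=True)
```

In this model old_choice is the STAT fallback, which is the path that can fail
once "nodeid-file" is gone, while new_choice is an FSTAT on fd1.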

Cross-checked with "Miklos Szeredi" <mszeredi.at.redhat.dot.com> for
any security issues with this solution and he approves the solution.

Thanks to "Miklos Szeredi" <mszeredi.at.redhat.dot.com> for all the
pointers and discussions.

Change-Id: I88dd29b3607cd2594eee9d72a1637b5346c8d49c
BUG: 1510401
Signed-off-by: Raghavendra G <rgowdapp at redhat.com>


Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1492591
[Bug 1492591] [GSS] Error No such file or directory for new file writes
https://bugzilla.redhat.com/show_bug.cgi?id=1510401
[Bug 1510401] fstat returns ENOENT/ESTALE