[Gluster-devel] Report ESTALE as ENOENT

Thu Feb 22 11:32:08 UTC 2018

On Thu, Feb 22, 2018 at 1:17 PM, Raghavendra G <raghavendra at gluster.com>
wrote:

>
>
> On Wed, Oct 11, 2017 at 7:32 PM, J. Bruce Fields <bfields at fieldses.org>
> wrote:
>
>> On Wed, Oct 11, 2017 at 04:11:51PM +0530, Raghavendra G wrote:
>> > On Thu, Mar 31, 2016 at 1:22 AM, J. Bruce Fields <bfields at fieldses.org>
>> > wrote:
>> >
>> > > On Mon, Mar 28, 2016 at 04:21:00PM -0400, Vijay Bellur wrote:
>> > > > I would prefer to:
>> > > >
>> > > > 1. Return ENOENT for all system calls that operate on a path.
>> > > >
>> > > > 2. ESTALE might be ok for file descriptor based operations.
>> > >
>> > > Note that operations which operate on paths can fail with ESTALE when
>> > > they attempt to look up a component within a directory that no longer
>> > > exists.
>> > >
>> >
>> > But, "man 2 rmdir"  or "man 2 unlink" doesn't list ESTALE as a valid
>> error.
>>
>> In fact, almost no man pages list ESTALE as a valid error:
>>
>>         [bfields at patate man-pages]$ git grep ESTALE
>>         Changes.old:        Change description for ESTALE
>>         man2/open_by_handle_at.2:.B ESTALE
>>         man2/open_by_handle_at.2:.B ESTALE
>>         man3/errno.3:.B ESTALE
>>
>> Cc'ing Michael Kerrisk for advice.  Is there some reason for that, or
>> can we fix those man pages?
>>
>> > Also rm doesn't seem to handle ESTALE too [3]
>> >
>> > [4] https://github.com/coreutils/coreutils/blob/master/src/remov
>> e.c#L305
>>
>> I *think* that code is just deciding whether a given error should be
>> silently ignored in the rm -f case.  I don't think -ESTALE (indicating
>> the directory is bad) is such an error, so I think this code is correct.
>> But my understanding may be wrong.
>>
>
> For a local filesystem, we may not end up in ESTALE errors. But, when
> rmdir is executed from multiple clients of a network fs (like NFS,
> Glusterfs), unlink or rmdir can easily fail with ESTALE as the other rm
> invocation could've deleted it. I think this is what has happened in bugs
> like:
> https://bugzilla.redhat.com/show_bug.cgi?id=1546717
> https://bugzilla.redhat.com/show_bug.cgi?id=1245065
>
> This in fact was the earlier motivation to convert ESTALE into ENOENT, so
> that rm would ignore it. Now that I reverted the fix, looks like the bug
> has promptly resurfaced :)
>
> There is one glitch though. Bug 1245065 mentions that some parts of
> directory structure remain undeleted. From my understanding, atleast one
> instance of rm (which is racing ahead of all others causing others to
> fail), should've delted the directory structure completely. Though, I need
> to understand the directory traversal done by rm to find whether there are
> cyclic dependency between two rms causing both of them to fail.
>

Also note that VFS retries unlink and rmdir if there is an estale:

https://github.com/torvalds/linux/blob/master/fs/namei.c#L4056
https://github.com/torvalds/linux/blob/master/fs/namei.c#L3927

So, underlying fs like Glusterfs cannot mask ESTALE as it'll break some
functionality.

In this scenario what are the ways to fix a failing rm when run from
multiple mount points (Bugs I pointed out above)? I really can't think of a
way other than fixing rm to ignore ESTALE error.

>
>> > > Maybe non-creating open("./foo") returning ENOENT would be reasonable
>> in
>> > > this case since that's what you'd get in the local filesystem case,
>> but
>> > > creat("./foo") returning ENOENT, for example, isn't something
>> > > applications will be written to handle.
>> > >
>> > > The Linux VFS will retry ESTALE on path-based systemcalls *one* time,
>> to
>> > > reduce the chance of ESTALE in those cases.
>> >
>> >
>> > I should've anticipated bug [2] due to this comment. My mistake. Bug
>> [2] is
>> > indeed due to kernel not retrying open on receiving an ENOENT error.
>> > Glusterfs sent ENOENT because file's inode-number/nodeid changed but
>> same
>> > path exists. The correct error would've been ESTALE, but due to our
>> > conversion of ESTALE to ENOENT, the latter was sent back to kernel.
>> >
>> > Looking through kernel VFS code, only open *seems* to retry
>> > (do_filep_open). I couldn't find similar logic to other path based
>> syscalls
>> > like rmdir, unlink, stat, chmod etc
>>
>> I believe there is a retry in those cases, but I'm not sure exactly
>> where it is.  Looking around.... See the retry_estale() checks sprinkled
>> around namei.c, which were added by Jeff Layton a few years ago.
>>
>> --b.
>> _______________________________________________
>> Gluster-devel mailing list
>> Gluster-devel at gluster.org
>> http://lists.gluster.org/mailman/listinfo/gluster-devel
>>
>
>
>
> --
> Raghavendra G
>

-- 
Raghavendra G
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.gluster.org/pipermail/gluster-devel/attachments/20180222/2f603dff/attachment.html>