[Gluster-devel] Report ESTALE as ENOENT

Wed Oct 18 06:39:37 UTC 2017

Adding in Eric and Steve.

Ric

On Oct 18, 2017 9:35 AM, "Raghavendra G" <raghavendra at gluster.com> wrote:

+Brian Foster

On Wed, Oct 11, 2017 at 4:11 PM, Raghavendra G <raghavendra at gluster.com>
wrote:

> We ran into a regression [2][3]. Hence reviving this thread.
>
> [2] https://bugzilla.redhat.com/show_bug.cgi?id=1500269
> [3] https://review.gluster.org/18463
>
> On Thu, Mar 31, 2016 at 1:22 AM, J. Bruce Fields <bfields at fieldses.org>
> wrote:
>
>> On Mon, Mar 28, 2016 at 04:21:00PM -0400, Vijay Bellur wrote:
>> > On 03/28/2016 09:34 AM, FNU Raghavendra Manjunath wrote:
>> > >
>> > >I can understand the concern. But I think instead of generally
>> > >converting all the ESTALE errors ENOENT, probably we should try to
>> > >analyze the errors that are generated by lower layers (like posix).
>> > >
>> > >Even fuse kernel module some times returns ESTALE. (Well, I can see it
>> > >returning ESTALE errors in some cases in the code. Someone please
>> > >correct me if thats not the case).  And aso I am not sure if converting
>> > >all the ESTALE errors to ENOENT is ok or not.
>> >
>> > ESTALE in fuse is returned only for export_operations. fuse
>> > implements this for providing support to export fuse mounts as nfs
>> > exports. A cursory reading of the source seems to indicate that fuse
>> > returns ESTALE only in cases where filehandle resolution fails.
>> >
>> > >
>> > >For fd based operations, I am not sure if ENOENT can be sent or not (as
>> > >though the file is unlinked, it can be accessed if there were open fds
>> > >on it before unlinking the file).
>> >
>> > ESTALE should be fine for fd based operations. It would be analogous
>> > to a filehandle resolution failing and should not be a common
>> > occurrence.
>> >
>> > >
>> > >I feel, we have to look into some parts to check if they generating
>> > >ESTALE is a proper error or not. Also, if there is any bug in below
>> > >layers fixing which can avoid ESTALE errors, then I feel that would be
>> > >the better option.
>> > >
>> >
>> > I would prefer to:
>> >
>> > 1. Return ENOENT for all system calls that operate on a path.
>> >
>> > 2. ESTALE might be ok for file descriptor based operations.
>>
>> Note that operations which operate on paths can fail with ESTALE when
>> they attempt to look up a component within a directory that no longer
>> exists.
>>
>
> But, "man 2 rmdir"  or "man 2 unlink" doesn't list ESTALE as a valid
> error. Also rm doesn't seem to handle ESTALE too [3]
>
> [4] https://github.com/coreutils/coreutils/blob/master/src/remove.c#L305
>
>
>> Maybe non-creating open("./foo") returning ENOENT would be reasonable in
>> this case since that's what you'd get in the local filesystem case, but
>> creat("./foo") returning ENOENT, for example, isn't something
>> applications will be written to handle.
>>
>> The Linux VFS will retry ESTALE on path-based systemcalls *one* time, to
>> reduce the chance of ESTALE in those cases.
>
>
> I should've anticipated bug [2] due to this comment. My mistake. Bug [2]
> is indeed due to kernel not retrying open on receiving an ENOENT error.
> Glusterfs sent ENOENT because file's inode-number/nodeid changed but same
> path exists. The correct error would've been ESTALE, but due to our
> conversion of ESTALE to ENOENT, the latter was sent back to kernel.
>

We've an application which does very frequent renames (around 10-15 per
second). So, single retry by kernel of an open failed with ESTALE is not
helping us.

@Bruce/Brian,

Do you know why VFS chose an approach of retrying instead of a stricter
synchronization mechanism using locking? For eg., rename and open could've
been synchronized using a lock.

For eg., a rough psuedocode for open and rename could've been (please
ignore ordering src,dst locks in rename)

sys_open ()
{
       LOCK (dentry->lock);
       {
            lookup path;
            open (inode)
       }
       UNLOCK (dentry->lock;
}

sys_rename ()
{
         LOCK (dst-dentry->lock);
         {
                LOCK (src-dentry->lock);
                {
                     rename (src, dst);
                }
                UNLOCK (src-dentry->lock);
        }
        UNLOCK (dst-dentry->lock);
}

@Bruce,

With the current retry model, any suggestions on how to handle applications
that do frequent renames?

> Looking through kernel VFS code, only open *seems* to retry
> (do_filep_open). I couldn't find similar logic to other path based syscalls
> like rmdir, unlink, stat, chmod etc
>
> The bugzilla entry that
>> tracked those patches might be interesting:
>>
>>         https://bugzilla.redhat.com/show_bug.cgi?id=678544
>>
>> > NFS recommends that applications add special code for handling
>> > ESTALE [1]. Unfortunately changing application code is not easy and
>> > hence it does not come as a surprise that coreutils also does not
>> > accommodate ESTALE.
>>
>> We also need to consider whether the application's handling of the
>> ENOENT case could be incorrect for the ESTALE case, with consequences
>> possibly as bad as or worse than consequences of seeing an unexpected
>> error.
>>
>> My first intuition is that translating ESTALE to ENOENT is less safe
>> than not doing so, because:
>>
>>         - once an ESTALE-unaware application his the ESTALE case, we
>>           risk a bug regardless of which we return, but if we return
>>           ESTALE at least the problem should be more obvious to the
>>           person debugging.
>>         - for ESTALE-aware applications, the ESTALE/ENOENT distinction
>>           is useful.
>>
>
> Another place to not convert is for  those cases where kernel retries the
> operation on seeing an ESTALE.
>
> I guess we need to think through each operation and we cannot ESTALE to
> ENOENT always.
>
>
>> But I haven't really thought through examples.
>>
>> --b.
>> _______________________________________________
>> Gluster-devel mailing list
>> Gluster-devel at gluster.org
>> http://www.gluster.org/mailman/listinfo/gluster-devel
>>
>
>
>
> --
> Raghavendra G
>

regards,
-- 
Raghavendra G

_______________________________________________
Gluster-devel mailing list
Gluster-devel at gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.gluster.org/pipermail/gluster-devel/attachments/20171018/579c7781/attachment-0001.html>