<div dir="auto"><div>Adding in Eric and Steve.</div><div dir="auto"><br></div><div dir="auto">Ric</div><div dir="auto"><br><div class="gmail_extra" dir="auto"><br><div class="gmail_quote">On Oct 18, 2017 9:35 AM, "Raghavendra G" <<a href="mailto:raghavendra@gluster.com">raghavendra@gluster.com</a>> wrote:<br type="attribution"><blockquote class="quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">+Brian Foster<br><div class="gmail_extra"><br><div class="gmail_quote"><div class="elided-text">On Wed, Oct 11, 2017 at 4:11 PM, Raghavendra G <span dir="ltr"><<a href="mailto:raghavendra@gluster.com" target="_blank">raghavendra@gluster.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">We ran into a regression [2][3]. Hence reviving this thread.<br><br>[2] <a href="https://bugzilla.redhat.com/show_bug.cgi?id=1500269" target="_blank">https://bugzilla.redhat.com/sh<wbr>ow_bug.cgi?id=1500269</a><br>[3] <a href="https://review.gluster.org/18463" target="_blank">https://review.gluster.org/184<wbr>63</a><br><div><div class="gmail_extra"><br><div class="gmail_quote"><div><div class="m_-7260625155758624070h5">On Thu, Mar 31, 2016 at 1:22 AM, J. Bruce Fields <span dir="ltr"><<a href="mailto:bfields@fieldses.org" target="_blank">bfields@fieldses.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class="m_-7260625155758624070m_8986995149772584679gmail-HOEnZb"><div class="m_-7260625155758624070m_8986995149772584679gmail-h5">On Mon, Mar 28, 2016 at 04:21:00PM -0400, Vijay Bellur wrote:<br>
> On 03/28/2016 09:34 AM, FNU Raghavendra Manjunath wrote:<br>
> ><br>
> >I can understand the concern. But I think instead of generally<br>
> >converting all ESTALE errors to ENOENT, we should probably try to<br>
> >analyze the errors that are generated by lower layers (like posix).<br>
> ><br>
> >Even the fuse kernel module sometimes returns ESTALE. (Well, I can see it<br>
> >returning ESTALE errors in some cases in the code. Someone please<br>
> >correct me if that's not the case.) Also, I am not sure whether converting<br>
> >all ESTALE errors to ENOENT is OK.<br>
><br>
> ESTALE in fuse is returned only for export_operations. fuse<br>
> implements this for providing support to export fuse mounts as nfs<br>
> exports. A cursory reading of the source seems to indicate that fuse<br>
> returns ESTALE only in cases where filehandle resolution fails.<br>
><br>
> ><br>
> >For fd-based operations, I am not sure whether ENOENT can be sent (even<br>
> >though the file is unlinked, it can still be accessed through fds that<br>
> >were opened on it before it was unlinked).<br>
><br>
> ESTALE should be fine for fd based operations. It would be analogous<br>
> to a filehandle resolution failing and should not be a common<br>
> occurrence.<br>
><br>
> ><br>
> >I feel we have to look into some parts to check whether the ESTALE they<br>
> >generate is a proper error. Also, if there is a bug in the lower<br>
> >layers whose fix would avoid the ESTALE errors, I feel that would be<br>
> >the better option.<br>
> ><br>
><br>
> I would prefer to:<br>
><br>
> 1. Return ENOENT for all system calls that operate on a path.<br>
><br>
> 2. ESTALE might be ok for file descriptor based operations.<br>
<br>
</div></div>Note that operations which operate on paths can fail with ESTALE when<br>
they attempt to look up a component within a directory that no longer<br>
exists.<br></blockquote><div><br></div></div></div><div>But neither "man 2 rmdir" nor "man 2 unlink" lists ESTALE as a valid error. Also, rm doesn't seem to handle ESTALE either [4].</div><div><br></div><div>[4] <a href="https://github.com/coreutils/coreutils/blob/master/src/remove.c#L305" target="_blank">https://github.com/coreutils/c<wbr>oreutils/blob/master/src/remov<wbr>e.c#L305</a></div><span><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<br>
Maybe non-creating open("./foo") returning ENOENT would be reasonable in<br>
this case since that's what you'd get in the local filesystem case, but<br>
creat("./foo") returning ENOENT, for example, isn't something<br>
applications will be written to handle.<br>
<br>
The Linux VFS will retry ESTALE on path-based system calls *one* time, to<br>
reduce the chance of ESTALE in those cases. </blockquote><div><br></div></span><div>I should've anticipated bug [2] from this comment. My mistake. Bug [2] is indeed caused by the kernel not retrying open on receiving an ENOENT error. Glusterfs sent ENOENT because the file's inode number/nodeid had changed while the same path still existed. The correct error would have been ESTALE, but because we convert ESTALE to ENOENT, ENOENT was sent back to the kernel instead.<br></div></div></div></div></div></blockquote><div><br></div></div><div>We have an application that does very frequent renames (around 10-15 per second), so a single kernel retry of an open that failed with ESTALE is not enough for us.</div><div><br></div><div>@Bruce/Brian,</div><div><br></div><div>Do you know why the VFS chose to retry instead of using a stricter synchronization mechanism based on locking? For example, rename and open could have been synchronized using a lock.</div><div><br></div><div>For example, rough pseudocode for open and rename could have been (please ignore the ordering of the src and dst locks in rename):</div><div><br></div><div><pre>sys_open ()
{
    LOCK (dentry->lock);
    {
        lookup path;
        open (inode);
    }
    UNLOCK (dentry->lock);
}

sys_rename ()
{
    LOCK (dst_dentry->lock);
    {
        LOCK (src_dentry->lock);
        {
            rename (src, dst);
        }
        UNLOCK (src_dentry->lock);
    }
    UNLOCK (dst_dentry->lock);
}</pre></div><div><br> </div><div>@Bruce,</div><div><br></div><div>With the current retry model, do you have any suggestions on how to handle applications that do frequent renames?</div><div class="elided-text"><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><div class="gmail_extra"><div class="gmail_quote"><div></div><div><br> </div><div>Looking through the kernel VFS code, only open 
*seems* to retry (do_filp_open). I couldn't find similar logic in other path-based syscalls like rmdir, unlink, stat, chmod, etc.</div><span><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">The bugzilla entry that<br>
tracked those patches might be interesting:<br>
<br>
<a href="https://bugzilla.redhat.com/show_bug.cgi?id=678544" rel="noreferrer" target="_blank">https://bugzilla.redhat.com/sh<wbr>ow_bug.cgi?id=678544</a><br>
<span class="m_-7260625155758624070m_8986995149772584679gmail-"><br>
> NFS recommends that applications add special code for handling<br>
> ESTALE [1]. Unfortunately changing application code is not easy and<br>
> hence it does not come as a surprise that coreutils also does not<br>
> accommodate ESTALE.<br>
<br>
</span>We also need to consider whether the application's handling of the<br>
ENOENT case could be incorrect for the ESTALE case, with consequences<br>
possibly as bad as or worse than consequences of seeing an unexpected<br>
error.<br>
<br>
My first intuition is that translating ESTALE to ENOENT is less safe<br>
than not doing so, because:<br>
<br>
- once an ESTALE-unaware application hits the ESTALE case, we<br>
risk a bug regardless of which we return, but if we return<br>
ESTALE at least the problem should be more obvious to the<br>
person debugging.<br>
- for ESTALE-aware applications, the ESTALE/ENOENT distinction<br>
is useful.<br></blockquote><div><br></div></span><div>Another case where we should not convert is when the kernel retries the operation on seeing ESTALE.</div><div><br></div><div>I guess we need to think through each operation; we cannot always convert ESTALE to ENOENT.<br></div><span><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<br>
But I haven't really thought through examples.<br>
<span class="m_-7260625155758624070m_8986995149772584679gmail-HOEnZb"><font color="#888888"><br>
--b.<br>
</font></span><div class="m_-7260625155758624070m_8986995149772584679gmail-HOEnZb"><div class="m_-7260625155758624070m_8986995149772584679gmail-h5">______________________________<wbr>_________________<br>
Gluster-devel mailing list<br>
<a href="mailto:Gluster-devel@gluster.org" target="_blank">Gluster-devel@gluster.org</a><br>
<a href="http://www.gluster.org/mailman/listinfo/gluster-devel" rel="noreferrer" target="_blank">http://www.gluster.org/mailman<wbr>/listinfo/gluster-devel</a><br>
</div></div></blockquote></span></div><span class="m_-7260625155758624070HOEnZb"><font color="#888888"><br><br clear="all"><br>-- <br><div class="m_-7260625155758624070m_8986995149772584679gmail_signature">Raghavendra G<br></div>
</font></span></div></div></div>
</blockquote></div></div><br><br clear="all"></div><div class="gmail_extra">regards,<font color="#888888"><br></font></div><font color="#888888"><div class="gmail_extra">-- <br><div class="m_-7260625155758624070gmail_signature" data-smartmail="gmail_signature">Raghavendra G<br></div>
</div></font></div>
</blockquote></div><br></div></div></div>