[Gluster-devel] Fwd: Re: RFC: Using anonymous fds in quick-read

Mon Sep 3 13:10:17 UTC 2012

On Wed, Aug 29, 2012 at 10:57 AM, Anand Avati <avati at redhat.com> wrote:

> (CC'ing gluster-devel)
>
> On 08/28/2012 09:38 PM, Raghavendra Gowdappa wrote:
>
>> Avati,
>>
>> Following are the questions/thoughts related to anonymous fd framework
>> and their usage in quick read. Please answer or give your feedback.
>>
>> Questions related to anonymous fd framework:
>> ============================================
>> * Anonymous fds can work because open in itself doesn't do any primary
>> task the application is interested in - like read, write etc. (the
>> application does an open with the intent of doing something else). This
>> raises the question: why do we need open at all, can't we eliminate it
>> altogether? If we were to eliminate open, aren't we moving from a neater
>> to a messier design - each fop has to check, in every invocation, whether
>> the work associated with open (like storing contexts etc.) is done?
>>
>
> Some corrections to the above statement. There are two parts to the
> open() call
>
> 1) The effects of the call itself. Like
> a) Perform permission checks and establish a 'session' (with the fd) on
> the allowed permission [even if permission of the inode changes in the
> future while the fd is still open]
> b) Perform additional operations like file truncation when the O_TRUNC
> flag is specified
>
> 2) Side effects of the call, like
> a) Specify the cache effects on future syscalls with O_[RD]SYNC,
> O_DIRECT flags
> b) Offer immunity against future calls like rename() and unlink()
>
> These are the kind of things even Gluster (or any other FS) has to
> guarantee with its open() syscall.
>
> Anonymous fds exist because
> a) Protocols like NFS3 do not support the above semantics and they are
> implemented completely on the client side. But we require an fd_t
> parameter in the read/write fops, which do not require some of the
> above semantics (like read/write perm checks), and other semantics are
> guaranteed by anonymous fds already (like immunity against rename()).
> Note that immunity against unlink() does not currently exist in
> anonymous fds.
>
> b) Internal optimizations in perf xlators do not require all the above
> semantics sometimes.
>
> Whether we use anonymous FDs or not, we need to keep up all the above
> semantics. There are some issues with the semantics even in today's
> version of quick-read - we assume permission check has already happened
> (which is usually true as FUSE performs permission checks) - but that
> may not be the case always. That apart, the benefit of anonymous fds in
> quick-read can be in handling of fd based fops in the window of time
> between a short-circuited open() and its completion from the backend. They
> need not wait for the open() completion if they arrive early. Instead
> they can proceed with an anonymous fd -- which can significantly reduce
> code complexity.

> Again, this can be limited to O_RDONLY +
> ~(O_DIRECT|O_TRUNC) flagged open()s

Why this restriction? Can you elaborate on that?

> and thereby only be vulnerable to
> unlink()s happening in that window.
>
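Read literally, the restriction quoted above (read-only opens without
O_DIRECT or O_TRUNC) amounts to a simple flag check. A minimal sketch in
Python, assuming that reading of the flags; the function name is
hypothetical and this is not quick-read code:

```python
import os

# O_DIRECT is not defined on every platform; fall back to 0 (assumption
# made only so the sketch runs everywhere).
O_DIRECT = getattr(os, "O_DIRECT", 0)

def can_use_anonymous_fd(flags):
    """Hypothetical predicate: True for O_RDONLY opens that carry neither
    O_DIRECT nor O_TRUNC."""
    read_only = (flags & os.O_ACCMODE) == os.O_RDONLY
    excluded = flags & (O_DIRECT | os.O_TRUNC)
    return read_only and not excluded
```

With this check, an O_RDWR open or an O_RDONLY|O_TRUNC open would always
take the normal open() path.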

Irrespective of anonymous fds, quick-read would be vulnerable to unlinks in
the window between open returning to the application and open actually
happening on the backend. I don't see how anonymous fds alter this
situation. Can you please explain?
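
To make the window concrete: on a local POSIX filesystem, an fd that is
really open keeps the file readable after unlink(), while an open by path
that arrives after the unlink fails. A minimal sketch using only the Python
standard library (the file name is arbitrary; this shows local POSIX
behaviour, not GlusterFS internals):

```python
import os
import tempfile

def read_after_unlink():
    """Return (data read via a pre-opened fd, whether a late open by path fails)."""
    d = tempfile.mkdtemp()
    path = os.path.join(d, "victim")
    with open(path, "wb") as f:
        f.write(b"hello")

    fd = os.open(path, os.O_RDONLY)   # real open(): pins the inode
    os.unlink(path)                   # name is gone, inode lives while fd is open
    data = os.read(fd, 5)
    os.close(fd)

    try:
        late_fd = os.open(path, os.O_RDONLY)  # stands in for a deferred open
        os.close(late_fd)
        late_open_failed = False
    except FileNotFoundError:
        late_open_failed = True
    os.rmdir(d)
    return data, late_open_failed
```

The first read succeeds because the open happened before the unlink; the
second open is the case where the open reaches the backend too late.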

>> * How are ops like fsync handled with anonymous fds? How are we going to
>> identify the fd(s) on server on which writes are actually performed? The
>> problem is more acute if we happen to load write-behind on server side.
>>
>
> With the changes in http://review.gluster.com/712, an fsync() fop will
> be a barrier against all previous writes on the inode (no matter which
> fd). There is no problem if you load write-behind on the server side.
> fsync() is essentially an inode operation and must not discriminate
> writes based on the fd of origin.
>

Is this true even for fsync operation on backend filesystems? Does fsync
flush changes across all fds opened on a file?
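
For reference, the behaviour described in the quoted reply (fsync() as a
barrier against all previous writes on the inode, whatever the fd of
origin) can be modelled with a toy in-memory queue. The class and method
names are hypothetical; this is not the write-behind code from the review
above, and it says nothing about what a given backend filesystem does:

```python
class InodeWriteQueue:
    """Toy model: writes are buffered per inode, tagged with the fd of origin."""

    def __init__(self):
        self.pending = []   # (fd, offset, data) not yet on "disk"
        self.disk = {}      # offset -> data, stands in for stable storage

    def write(self, fd, offset, data):
        self.pending.append((fd, offset, data))

    def fsync(self, fd):
        # Barrier on the inode: flush every pending write in arrival order.
        # The fd argument is deliberately not used as a filter.
        for _fd, offset, data in self.pending:
            self.disk[offset] = data
        flushed = len(self.pending)
        self.pending = []
        return flushed
```

For example, a write issued through an anonymous fd and another through a
regular fd are both flushed by a single fsync() on either of them.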

>> * Though we are trying to decouple the path from addressing an inode in
>> glusterfs using nameless lookups, that decoupling is not complete. There
>> are translators which use naming patterns to assign priorities to files
>> (like io-cache, quick-read for the purposes of deciding whether to flush a
>> cache or not). To be honest, the problem is seen only in fd-migration where
>> we are using nameless lookups - for fresh lookups - in the new graph, after
>> a graph switch. Currently I am using nameless lookups with loc.path set,
>> which solves the problem. Ideally nameless lookups are not the ones to be
>> used during migration, since they are not meant to be used for fresh
>> lookups (at least till we get rid of dependencies on path based
>> addressability internally in glusterfs). However, they have huge
>> performance benefits.
>>
>
> Not sure what the above point is w.r.t anonymous fds,

Nothing related to anonymous fds themselves, but to their usage during fd
migration after a graph switch. After a graph switch, the first lookup in the
new graph is a fresh one, and translators like io-cache, quick-read and quota
that make use of path information for their internal workings will be in
trouble if we don't have the correct path in loc.path.

> but yes - nameless
> lookup takes away the sense of hierarchy (and "filename") and operations
> which depend on filename or hierarchy might not always work. But then
> this has been true even before we brought in nameless lookups as FUSE
> issues open() on an inode and therefore we are not guaranteed to perform
> open() on the right path when you have hardlinks.
>
>> Using anonymous fd framework in quick-read:
>> ============================================
>> * As far as quick-read goes, its task becomes very simple. Just convert
>> the fd to anonymous during open and return. It can eliminate all the
>> dependencies of fops having to wait till open is actually done. In fact the
>> fops it has to implement are: lookup, open and readv.
>>
>
> Look at my previous comments, it must perform a few more checks.
> quick-read cannot just "convert" an fd to anonymous fd. Anonymous fd has
> fd->pid == -1 (which a quick-unwound open() fd will not). There are also
> other semantics which need to be met (at least with best effort) while
> the actual fd is still unopened.
>
>
>> * Anonymous fd awareness should be brought into afr. It shouldn't try to
>> open the files in fops like writev if the fd happens to be anonymous.
>>
>
> I think that already is the case. Also, why do you specifically mention
> afr?
>

I was thinking in terms of using anonymous fds in quick-read, without
having to open the file explicitly at all by delegating that responsibility
(of open) to servers. Hence, I thought afr need not worry about opening the
files. However, this may not work as you've explained earlier and I need to
think over it.

> Thanks,
> Avati
>
> --
> Raghavendra G