[Gluster-devel] gfapi, readdirplus and forced lookup after inode_link
Raghavendra Gowdappa
rgowdapp at redhat.com
Wed May 11 12:42:34 UTC 2016
----- Original Message -----
> From: "Raghavendra Gowdappa" <rgowdapp at redhat.com>
> To: "Soumya Koduri" <skoduri at redhat.com>
> Cc: "Gluster Devel" <gluster-devel at gluster.org>
> Sent: Wednesday, May 11, 2016 4:28:28 PM
> Subject: Re: [Gluster-devel] gfapi, readdirplus and forced lookup after inode_link
>
>
>
> ----- Original Message -----
> > From: "Soumya Koduri" <skoduri at redhat.com>
> > To: "Mohammed Rafi K C" <rkavunga at redhat.com>, "Raghavendra Gowdappa"
> > <rgowdapp at redhat.com>, "Niels de Vos"
> > <ndevos at redhat.com>, "Raghavendra Talur" <rtalur at redhat.com>, "Poornima
> > Gurusiddaiah" <pgurusid at redhat.com>
> > Cc: "+rhs-zteam" <rhs-zteam at redhat.com>, "Rajesh Joseph"
> > <rjoseph at redhat.com>, "jtho >> Jiffin Thottan"
> > <jthottan at redhat.com>
> > Sent: Wednesday, May 11, 2016 3:55:05 PM
> > Subject: Re: gfapi, readdirplus and forced lookup after inode_link
> >
> >
> >
> > On 05/11/2016 12:41 PM, Mohammed Rafi K C wrote:
> > >
> > >
> > > On 05/11/2016 12:28 PM, Soumya Koduri wrote:
> > >> Hi Raghavendra,
> > >>
> > >>
> > >>
> > >> On 05/11/2016 12:01 PM, Raghavendra Gowdappa wrote:
> > >>> Hi all,
> > >>>
> > >>> There are certain code-paths where the layers managing inodes (gfapi,
> > >>> fuse, nfsv3 etc) need to do a lookup even though the inode is found
> > >>> in inode-table. readdirplus is one such codepath (but not only one).
> > >>> The reason for doing this is that
> > >>> 1. not all xlators have enough information in readdirp_cbk to make
> > >>> inode usable (for eg., dht cannot build layout for directory inodes).
> > >>> 2. There are operations (like dht directory self-healing) which are
> > >>> needed for maintaining internal consistency and these operations
> > >>> cannot be done in readdirp.
> > >>>
> > >>> This forcing of lookup on a linked inode is normally achieved in two
> > >>> ways:
> > >>> 1. lower layers (like dht) setting entry->inode to NULL (without
> > >>> entry->inode, interface layers cannot link the inode).
> > >>
> > >> Rafi (CC'ed) had made changes to fix readdirp specific issue (required
> > >> for tiered volumes) as part of http://review.gluster.org/#/c/14109/ to
> > >> do explicit lookup if either entry->inode is set to NULL or inode_ctx
> > >> is NULL in gfapi. And I think he had made similar changes for
> > >> gluster-NFS as well to provide support for tiered volumes. I am not
> > >> sure if it is handled in common resolver code-path. Have to look at
> > >> the code. Rafi shall be able to confirm it.
> > >
> > > The changes I made in the three access layers are for inodes which was
> > > linked from lower layers. Which means the inodes linked from lower layer
> > > won't have inode ctx set in upper xlators, ie, during resolving we will
> > > send explicit lookup.
> > >
> > > With this changes during resolve if inode_ctx is not set then it will
> > > send a lookup + if set_need_lookup flag is set in inode_ctx, then also
> > > we will send a lookup
That's correct. I think gfapi and fuse-bridge are handling this properly i.e., sending a lookup before resuming fop if:
1. No context of xlator (fuse/gfapi) is present in inode.
Or
2. Context is set and it says resolution is necessary.
Note that case 1 is necessary as inode_linking is done in dht during directory healing. So, other fops might encounter an inode on which resolution is still in progress and not complete yet. As inode-context is set in fuse-bridge/gfapi only after a successful lookup, absence of context can be used as a hint for resolution being in progress.
I am not sure NFSv3 server is doing this.
Also, on bricks too quota-enforcer and bit-rot does inode-linking. So, protocol/server also needs to do similar things.
> > >
> > > As Du mentioned, readdirp set need_lookup everytime for entries in
> > > readdirp, I saw that code in fuse, and gfapi. But I don't remember such
> > > code in gNFS.
> >
> > There are checks for "entry->inode == NULL" in gNFS case as well. Looks
> > like it was Jiffin who made those changes (again wrt to tiered volumes)
> > - http://review.gluster.org/#/c/12960/
> >
> > But all these checks seem to be in only readdirp_cbk codepath where
> > directory entries are filled. What are other fops which need such
> > special handling?
>
> There are some codepaths, where linking is done by xlators who don't do
> resolution. A rough search shows following components:
> 1. quota enforcer
> 2. bitrot
> 3. dht/tier (needed, but currently not doing).
> 4. trash (for .trash I suppose)
>
> However, none of these are explicitly setting need_lookup. So, there are
> windows of time where lookup is partially complete in an xlator graph, but
> other fops start using them. I am currently working on a fix to solve the
> issue for dht/tier on fuse. We have to do similar work on other
> xlators/interface layers too.
>
> >
> > Thanks,
> > Soumya
> >
> >
> > >
> > > Regards
> > > Rafi KC
> > >
> > >>
> > >>
> > >> Thanks,
> > >> Soumya
> > >>
> > >>> 2. interface layers (at least fuse) setting a flag in inode to let
> > >>> resolver know that a lookup is to be done before resuming the fop.
> > >>>
> > >>> I am sure that fuse-bridge does this correctly. Need inputs from you
> > >>> about the behavior of other interface layers like gfapi, nfsv3 etc.
> > >>>
> > >>> regards,
> > >>> Raghavendra
> > >>>
> > >
> >
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel
>
More information about the Gluster-devel
mailing list