[Gluster-devel] gfapi, readdirplus and forced lookup after inode_link

Raghavendra Gowdappa rgowdapp at redhat.com
Wed May 11 10:58:28 UTC 2016



----- Original Message -----
> From: "Soumya Koduri" <skoduri at redhat.com>
> To: "Mohammed Rafi K C" <rkavunga at redhat.com>, "Raghavendra Gowdappa" <rgowdapp at redhat.com>, "Niels de Vos"
> <ndevos at redhat.com>, "Raghavendra Talur" <rtalur at redhat.com>, "Poornima Gurusiddaiah" <pgurusid at redhat.com>
> Cc: "+rhs-zteam" <rhs-zteam at redhat.com>, "Rajesh Joseph" <rjoseph at redhat.com>, "jtho >> Jiffin Thottan"
> <jthottan at redhat.com>
> Sent: Wednesday, May 11, 2016 3:55:05 PM
> Subject: Re: gfapi, readdirplus and forced lookup after inode_link
> 
> 
> 
> On 05/11/2016 12:41 PM, Mohammed Rafi K C wrote:
> >
> >
> > On 05/11/2016 12:28 PM, Soumya Koduri wrote:
> >> Hi Raghavendra,
> >>
> >>
> >>
> >> On 05/11/2016 12:01 PM, Raghavendra Gowdappa wrote:
> >>> Hi all,
> >>>
> >>> There are certain code-paths where the layers managing inodes (gfapi,
> >>> fuse, nfsv3 etc) need to do a lookup even though the inode is found
> >>> in inode-table. readdirplus is one such codepath (but not the only one).
> >>> The reason for doing this is that
> >>> 1. not all xlators have enough information in readdirp_cbk to make
> >>> inode usable (e.g., dht cannot build layout for directory inodes).
> >>> 2. There are operations (like dht directory self-healing) which are
> >>> needed for maintaining internal consistency and these operations
> >>> cannot be done in readdirp.
> >>>
> >>> This forcing of lookup on a linked inode is normally achieved in two
> >>> ways:
> >>> 1. lower layers (like dht) setting entry->inode to NULL (without
> >>> entry->inode, interface layers cannot link the inode).
> >>
> >> Rafi (CC'ed) had made changes to fix a readdirp-specific issue (required
> >> for tiered volumes) as part of http://review.gluster.org/#/c/14109/ to
> >> do explicit lookup if either entry->inode is set to NULL or inode_ctx
> >> is NULL in gfapi. And I think he had made similar changes for
> >> gluster-NFS as well to provide support for tiered volumes.  I am not
> >> sure if it is handled in common resolver code-path. Have to look at
> >> the code. Rafi shall be able to confirm it.
> >
> > The changes I made in the three access layers are for inodes which were
> > linked from lower layers. This means the inodes linked from a lower layer
> > won't have inode ctx set in upper xlators, i.e., during resolution we will
> > send an explicit lookup.
> >
> > With these changes, during resolve, if inode_ctx is not set then it will
> > send a lookup; if the set_need_lookup flag is set in inode_ctx, then we
> > will also send a lookup.
> >
> > As Du mentioned, readdirp sets need_lookup every time for entries in
> > readdirp. I saw that code in fuse and gfapi, but I don't remember such
> > code in gNFS.
> 
> There are checks for "entry->inode == NULL" in gNFS case as well. Looks
> like it was Jiffin who made those changes (again wrt tiered volumes)
> 	- http://review.gluster.org/#/c/12960/
> 
> But all these checks seem to be in only readdirp_cbk codepath where
> directory entries are filled. What are other fops which need such
> special handling?

There are some codepaths where linking is done by xlators that don't do resolution. A rough search shows the following components:
1. quota enforcer
2. bitrot
3. dht/tier (needed, but currently not done).
4. trash (for .trash I suppose)

However, none of these explicitly sets need_lookup. So there are windows of time where a lookup is only partially complete in an xlator graph, but other fops start using the inode. I am currently working on a fix to solve the issue for dht/tier on fuse. We have to do similar work on other xlators/interface layers too.
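
To make the window concrete, here is a rough toy model of the check discussed in this thread: a linker that does not do resolution marks the inode, and the resolver forces a lookup when the ctx is missing or the flag is set. Every name below (toy_inode, toy_inode_ctx, toy_link_without_lookup and so on) is hypothetical and made up for illustration; this is not the glusterfs inode API and not the actual fix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-layer inode context (NOT glusterfs code). */
struct toy_inode_ctx {
    bool need_lookup; /* true until a full lookup has gone through the graph */
};

/* Hypothetical stand-in for an inode as seen by one interface layer. */
struct toy_inode {
    struct toy_inode_ctx *ctx; /* NULL if this layer never saw a lookup */
};

/* Linker side: an xlator that links an inode without doing resolution
 * marks it, so that nobody treats it as fully looked up. */
static void toy_link_without_lookup(struct toy_inode *inode,
                                    struct toy_inode_ctx *ctx)
{
    assert(inode != NULL && ctx != NULL);
    ctx->need_lookup = true;
    inode->ctx = ctx;
}

/* Resolver side: decide whether a lookup must be sent before resuming
 * the fop. Forced when the inode is unknown, when its ctx was never set
 * (linked from a lower layer), or when the flag is set. */
static bool toy_resolver_must_lookup(const struct toy_inode *inode)
{
    if (inode == NULL)
        return true;
    if (inode->ctx == NULL)
        return true;
    return inode->ctx->need_lookup;
}

/* Called once a full lookup has completed: closes the window. */
static void toy_lookup_done(struct toy_inode *inode)
{
    if (inode != NULL && inode->ctx != NULL)
        inode->ctx->need_lookup = false;
}
```

In this sketch, an inode passed through toy_link_without_lookup() keeps forcing a lookup until toy_lookup_done() runs on it. The problem described above corresponds to a linker that sets inode->ctx but leaves need_lookup false, so the resolver lets fops through too early.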

> 
> Thanks,
> Soumya
> 
> 
> >
> > Regards
> > Rafi KC
> >
> >>
> >>
> >> Thanks,
> >> Soumya
> >>
> >>> 2. interface layers (at least fuse) setting a flag in inode to let
> >>> resolver know that a lookup is to be done before resuming the fop.
> >>>
> >>> I am sure that fuse-bridge does this correctly. Need inputs from you
> >>> about the behavior of other interface layers like gfapi, nfsv3 etc.
> >>>
> >>> regards,
> >>> Raghavendra
> >>>
> >
> 