[Gluster-devel] gfapi, readdirplus and forced lookup after inode_link

Soumya Koduri skoduri at redhat.com
Wed May 11 16:47:19 UTC 2016



On 05/11/2016 06:12 PM, Raghavendra Gowdappa wrote:
>
>
> ----- Original Message -----
>> From: "Raghavendra Gowdappa" <rgowdapp at redhat.com>
>> To: "Soumya Koduri" <skoduri at redhat.com>
>> Cc: "Gluster Devel" <gluster-devel at gluster.org>
>> Sent: Wednesday, May 11, 2016 4:28:28 PM
>> Subject: Re: [Gluster-devel] gfapi,	readdirplus and forced lookup after inode_link
>>
>>
>>
>> ----- Original Message -----
>>> From: "Soumya Koduri" <skoduri at redhat.com>
>>> To: "Mohammed Rafi K C" <rkavunga at redhat.com>, "Raghavendra Gowdappa"
>>> <rgowdapp at redhat.com>, "Niels de Vos"
>>> <ndevos at redhat.com>, "Raghavendra Talur" <rtalur at redhat.com>, "Poornima
>>> Gurusiddaiah" <pgurusid at redhat.com>
>>> Cc: "+rhs-zteam" <rhs-zteam at redhat.com>, "Rajesh Joseph"
>>> <rjoseph at redhat.com>, "jtho >> Jiffin Thottan"
>>> <jthottan at redhat.com>
>>> Sent: Wednesday, May 11, 2016 3:55:05 PM
>>> Subject: Re: gfapi, readdirplus and forced lookup after inode_link
>>>
>>>
>>>
>>> On 05/11/2016 12:41 PM, Mohammed Rafi K C wrote:
>>>>
>>>>
>>>> On 05/11/2016 12:28 PM, Soumya Koduri wrote:
>>>>> Hi Raghavendra,
>>>>>
>>>>>
>>>>>
>>>>> On 05/11/2016 12:01 PM, Raghavendra Gowdappa wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> There are certain code-paths where the layers managing inodes (gfapi,
>>>>>> fuse, nfsv3 etc) need to do a lookup even though the inode is found
>>>>>> in inode-table. readdirplus is one such codepath (but not only one).
>>>>>> The reason for doing this is that
>>>>>> 1. not all xlators have enough information in readdirp_cbk to make
>>>>>> inode usable (for eg., dht cannot build layout for directory inodes).
>>>>>> 2. There are operations (like dht directory self-healing) which are
>>>>>> needed for maintaining internal consistency and these operations
>>>>>> cannot be done in readdirp.
>>>>>>
>>>>>> This forcing of lookup on a linked inode is normally achieved in two
>>>>>> ways:
>>>>>> 1. lower layers (like dht) setting entry->inode to NULL (without
>>>>>> entry->inode, interface layers cannot link the inode).
>>>>>
>>>>> Rafi (CC'ed) had made changes to fix readdirp specific issue (required
>>>>> for tiered volumes) as part of http://review.gluster.org/#/c/14109/ to
>>>>> do explicit lookup if either entry->inode is set to NULL or inode_ctx
>>>>> is NULL in gfapi. And I think he had made similar changes for
>>>>> gluster-NFS as well to provide support for tiered volumes.  I am not
>>>>> sure if it is handled in common resolver code-path. Have to look at
>>>>> the code. Rafi shall be able to confirm it.
>>>>
>>>> The changes I made in the three access layers are for inodes which was
>>>> linked from lower layers. Which means the inodes linked from lower layer
>>>> won't have inode ctx set in upper xlators, ie, during resolving we will
>>>> send explicit lookup.
>>>>
>>>> With this changes during resolve if inode_ctx is not set then it will
>>>> send a lookup + if set_need_lookup flag is set in inode_ctx, then also
>>>> we will send a lookup
>
> That's correct. I think gfapi and fuse-bridge are handling this properly i.e., sending a lookup before resuming fop if:
> 1. No context of xlator (fuse/gfapi) is present in inode.
> Or
> 2. Context is set and it says resolution is necessary.
>
> Note that case 1 is necessary as inode_linking is done in dht during directory healing. So, other fops might encounter an inode on which resolution is still in progress and not complete yet. As inode-context is set in fuse-bridge/gfapi only after a successful lookup, absence of context can be used as a hint for resolution being in progress.
>
> I am not sure NFSv3 server is doing this.

Case (1) is definitely handled in gluster-NFS. In fact looks like it was 
as part of single BZ# 1297311 [1], all the required changes were made in 
these layers (fuse,gNFS, gfapi) to handle the cases you had mentioned 
above. However, at-least when compared the patches [2], [3] & [4], I do 
not see need_lookup changes in gluster-NFS.

Rafi, do you recall why its been so?  Was it intentional?

Thanks,
Soumya

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1297311
[2] http://review.gluster.org/#/c/13224/
[3] http://review.gluster.org/#/c/13225
[4] http://review.gluster.org/#/c/13226

>
> Also, on bricks too quota-enforcer and bit-rot does inode-linking. So, protocol/server also needs to do similar things.
>
>>>>
>>>> As Du mentioned, readdirp set need_lookup everytime for entries in
>>>> readdirp, I saw that code in fuse, and gfapi. But I don't remember such
>>>> code in gNFS.
>>>
>>> There are checks for "entry->inode == NULL" in gNFS case as well. Looks
>>> like it was Jiffin who made those changes (again wrt to tiered volumes)
>>> 	- http://review.gluster.org/#/c/12960/
>>>
>>> But all these checks seem to be in only readdirp_cbk codepath where
>>> directory entries are filled. What are other fops which need such
>>> special handling?
>>
>> There are some codepaths, where linking is done by xlators who don't do
>> resolution. A rough search shows following components:
>> 1. quota enforcer
>> 2. bitrot
>> 3. dht/tier (needed, but currently not doing).
>> 4. trash (for .trash I suppose)
>>
>> However, none of these are explicitly setting need_lookup. So, there are
>> windows of time where lookup is partially complete in an xlator graph, but
>> other fops start using them. I am currently working on a fix to solve the
>> issue for dht/tier on fuse. We have to do similar work on other
>> xlators/interface layers too.
>>
>>>
>>> Thanks,
>>> Soumya
>>>
>>>
>>>>
>>>> Regards
>>>> Rafi KC
>>>>
>>>>>
>>>>>
>>>>> Thanks,
>>>>> Soumya
>>>>>
>>>>>> 2. interface layers (at least fuse) setting a flag in inode to let
>>>>>> resolver know that a lookup is to be done before resuming the fop.
>>>>>>
>>>>>> I am sure that fuse-bridge does this correctly. Need inputs from you
>>>>>> about the behavior of other interface layers like gfapi, nfsv3 etc.
>>>>>>
>>>>>> regards,
>>>>>> Raghavendra
>>>>>>
>>>>
>>>
>> _______________________________________________
>> Gluster-devel mailing list
>> Gluster-devel at gluster.org
>> http://www.gluster.org/mailman/listinfo/gluster-devel
>>


More information about the Gluster-devel mailing list