[Gluster-devel] gfapi, readdirplus and forced lookup after inode_link

Soumya Koduri skoduri at redhat.com
Wed May 11 17:05:18 UTC 2016



On 05/11/2016 10:17 PM, Soumya Koduri wrote:
>
>
> On 05/11/2016 06:12 PM, Raghavendra Gowdappa wrote:
>>
>>
>> ----- Original Message -----
>>> From: "Raghavendra Gowdappa" <rgowdapp at redhat.com>
>>> To: "Soumya Koduri" <skoduri at redhat.com>
>>> Cc: "Gluster Devel" <gluster-devel at gluster.org>
>>> Sent: Wednesday, May 11, 2016 4:28:28 PM
>>> Subject: Re: [Gluster-devel] gfapi,    readdirplus and forced lookup
>>> after inode_link
>>>
>>>
>>>
>>> ----- Original Message -----
>>>> From: "Soumya Koduri" <skoduri at redhat.com>
>>>> To: "Mohammed Rafi K C" <rkavunga at redhat.com>, "Raghavendra Gowdappa"
>>>> <rgowdapp at redhat.com>, "Niels de Vos"
>>>> <ndevos at redhat.com>, "Raghavendra Talur" <rtalur at redhat.com>, "Poornima
>>>> Gurusiddaiah" <pgurusid at redhat.com>
>>>> Cc: "+rhs-zteam" <rhs-zteam at redhat.com>, "Rajesh Joseph"
>>>> <rjoseph at redhat.com>, "jtho >> Jiffin Thottan"
>>>> <jthottan at redhat.com>
>>>> Sent: Wednesday, May 11, 2016 3:55:05 PM
>>>> Subject: Re: gfapi, readdirplus and forced lookup after inode_link
>>>>
>>>>
>>>>
>>>> On 05/11/2016 12:41 PM, Mohammed Rafi K C wrote:
>>>>>
>>>>>
>>>>> On 05/11/2016 12:28 PM, Soumya Koduri wrote:
>>>>>> Hi Raghavendra,
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 05/11/2016 12:01 PM, Raghavendra Gowdappa wrote:
>>>>>>> Hi all,
>>>>>>>
>>>>>>> There are certain code-paths where the layers managing inodes
>>>>>>> (gfapi,
>>>>>>> fuse, nfsv3 etc) need to do a lookup even though the inode is found
>>>>>>> in inode-table. readdirplus is one such codepath (but not only one).
>>>>>>> The reason for doing this is that
>>>>>>> 1. not all xlators have enough information in readdirp_cbk to make
>>>>>>> inode usable (for eg., dht cannot build layout for directory
>>>>>>> inodes).
>>>>>>> 2. There are operations (like dht directory self-healing) which are
>>>>>>> needed for maintaining internal consistency and these operations
>>>>>>> cannot be done in readdirp.
>>>>>>>
>>>>>>> This forcing of lookup on a linked inode is normally achieved in two
>>>>>>> ways:
>>>>>>> 1. lower layers (like dht) setting entry->inode to NULL (without
>>>>>>> entry->inode, interface layers cannot link the inode).
>>>>>>
>>>>>> Rafi (CC'ed) had made changes to fix readdirp specific issue
>>>>>> (required
>>>>>> for tiered volumes) as part of
>>>>>> http://review.gluster.org/#/c/14109/ to
>>>>>> do explicit lookup if either entry->inode is set to NULL or inode_ctx
>>>>>> is NULL in gfapi. And I think he had made similar changes for
>>>>>> gluster-NFS as well to provide support for tiered volumes.  I am not
>>>>>> sure if it is handled in common resolver code-path. Have to look at
>>>>>> the code. Rafi shall be able to confirm it.
>>>>>
>>>>> The changes I made in the three access layers are for inodes which was
>>>>> linked from lower layers. Which means the inodes linked from lower
>>>>> layer
>>>>> won't have inode ctx set in upper xlators, ie, during resolving we
>>>>> will
>>>>> send explicit lookup.
>>>>>
>>>>> With this changes during resolve if inode_ctx is not set then it will
>>>>> send a lookup + if set_need_lookup flag is set in inode_ctx, then also
>>>>> we will send a lookup
>>
>> That's correct. I think gfapi and fuse-bridge are handling this
>> properly i.e., sending a lookup before resuming fop if:
>> 1. No context of xlator (fuse/gfapi) is present in inode.
>> Or
>> 2. Context is set and it says resolution is necessary.
>>
>> Note that case 1 is necessary as inode_linking is done in dht during
>> directory healing. So, other fops might encounter an inode on which
>> resolution is still in progress and not complete yet. As inode-context
>> is set in fuse-bridge/gfapi only after a successful lookup, absence of
>> context can be used as a hint for resolution being in progress.
>>
>> I am not sure NFSv3 server is doing this.
>
> Case (1) is definitely handled in gluster-NFS. In fact looks like it was
> as part of single BZ# 1297311 [1], all the required changes were made in
> these layers (fuse,gNFS, gfapi) to handle the cases you had mentioned
> above. However, at-least when compared the patches [2], [3] & [4], I do
> not see need_lookup changes in gluster-NFS.
>
> Rafi, do you recall why its been so?  Was it intentional?

I think the reason being, in gluster-NFS, " nfs_fix_generation()", where 
inode_ctx is set, is being called only at selective places. So at-least 
in readdirp_cbk(), we seem to be just linking inodes but not setting 
inode_ctx. So that would result in forced lookup next time any fop is 
performed on that inode.

Rafi,
  Could you please confirm if that was indeed the case?

Thanks,
Soumya

>
> Thanks,
> Soumya
>
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1297311
> [2] http://review.gluster.org/#/c/13224/
> [3] http://review.gluster.org/#/c/13225
> [4] http://review.gluster.org/#/c/13226
>
>>
>> Also, on bricks too quota-enforcer and bit-rot does inode-linking. So,
>> protocol/server also needs to do similar things.
>>
>>>>>
>>>>> As Du mentioned, readdirp set need_lookup everytime for entries in
>>>>> readdirp, I saw that code in fuse, and gfapi. But I don't remember
>>>>> such
>>>>> code in gNFS.
>>>>
>>>> There are checks for "entry->inode == NULL" in gNFS case as well. Looks
>>>> like it was Jiffin who made those changes (again wrt to tiered volumes)
>>>>     - http://review.gluster.org/#/c/12960/
>>>>
>>>> But all these checks seem to be in only readdirp_cbk codepath where
>>>> directory entries are filled. What are other fops which need such
>>>> special handling?
>>>
>>> There are some codepaths, where linking is done by xlators who don't do
>>> resolution. A rough search shows following components:
>>> 1. quota enforcer
>>> 2. bitrot
>>> 3. dht/tier (needed, but currently not doing).
>>> 4. trash (for .trash I suppose)
>>>
>>> However, none of these are explicitly setting need_lookup. So, there are
>>> windows of time where lookup is partially complete in an xlator
>>> graph, but
>>> other fops start using them. I am currently working on a fix to solve
>>> the
>>> issue for dht/tier on fuse. We have to do similar work on other
>>> xlators/interface layers too.
>>>
>>>>
>>>> Thanks,
>>>> Soumya
>>>>
>>>>
>>>>>
>>>>> Regards
>>>>> Rafi KC
>>>>>
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Soumya
>>>>>>
>>>>>>> 2. interface layers (at least fuse) setting a flag in inode to let
>>>>>>> resolver know that a lookup is to be done before resuming the fop.
>>>>>>>
>>>>>>> I am sure that fuse-bridge does this correctly. Need inputs from you
>>>>>>> about the behavior of other interface layers like gfapi, nfsv3 etc.
>>>>>>>
>>>>>>> regards,
>>>>>>> Raghavendra
>>>>>>>
>>>>>
>>>>
>>> _______________________________________________
>>> Gluster-devel mailing list
>>> Gluster-devel at gluster.org
>>> http://www.gluster.org/mailman/listinfo/gluster-devel
>>>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel


More information about the Gluster-devel mailing list