[Gluster-devel] Inconsistent behavior due to lack of lookup on entry followed by readdirp

Krutika Dhananjay kdhananj at redhat.com
Wed Aug 12 15:32:44 UTC 2015

I faced the same issue with the sharding translator. I fixed it by making its readdirp callback initialize each entry's inode ctx; some of the ctx fields are xattr values, which the posix translator fills in entry->dict.
Here is the patch that got merged recently: http://review.gluster.org/11854 
Would that be as easy to do in DHT as well? 
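
To illustrate the idea of initializing the inode ctx from the readdirp callback, here is a minimal sketch. It is a toy model, not the merged patch: toy_xattr, toy_inode, toy_entry, toy_dict_get() and toy_readdirp_cbk() are hypothetical stand-ins for the real GlusterFS structures, and the xattr key is passed in by the caller rather than taken from the shard sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins, NOT the real GlusterFS types. */
struct toy_xattr {
    const char *key;
    uint64_t value;
};

struct toy_inode {
    int ctx_valid;       /* non-zero once the translator has set up its ctx */
    uint64_t block_size; /* example value cached in the ctx */
};

struct toy_entry {
    const char *name;
    struct toy_inode *inode;      /* may be NULL */
    const struct toy_xattr *dict; /* xattrs returned along with the entry */
    size_t dict_len;
};

/* Look up one key in the entry's dict; returns NULL if it is absent. */
static const struct toy_xattr *toy_dict_get(const struct toy_entry *e,
                                            const char *key)
{
    for (size_t i = 0; i < e->dict_len; i++) {
        if (strcmp(e->dict[i].key, key) == 0)
            return &e->dict[i];
    }
    return NULL;
}

/*
 * For every entry whose inode has no ctx yet, fill the ctx from the xattr
 * carried in the entry's dict, so that a later fop does not depend on a
 * LOOKUP having run first. Returns the number of inodes initialised.
 */
size_t toy_readdirp_cbk(struct toy_entry *entries, size_t n, const char *key)
{
    size_t initialised = 0;

    assert(entries != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        struct toy_inode *inode = entries[i].inode;
        const struct toy_xattr *x;

        if (inode == NULL || inode->ctx_valid)
            continue; /* nothing to do, or ctx already set */

        x = toy_dict_get(&entries[i], key);
        if (x == NULL)
            continue; /* xattr missing: leave the ctx unset */

        inode->block_size = x->value;
        inode->ctx_valid = 1;
        initialised++;
    }
    return initialised;
}
```

Entries whose dict does not carry the xattr are left untouched in this sketch, so they would still need a lookup before the ctx can be trusted.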

As far as AFR is concerned, it indirectly forces LOOKUP on entries which are being retrieved for the first time through a READDIRP (and as a result do not have their inode ctx etc initialised yet) by setting entry->inode to NULL. See afr_readdir_transform_entries(). 
This is the default behavior which is being made optional as part of http://review.gluster.org/#/c/11846/ which is still under review (see BZ 1250803, a performance bug :) ). 
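
The AFR approach described above can be sketched roughly as follows. This is a simplified model under stated assumptions, not the body of afr_readdir_transform_entries(): toy_inode, toy_entry and toy_transform_entries() are hypothetical, the force_lookup flag only stands in for the option proposed in the patch under review, and the real code also has to drop its reference on the inode before clearing the pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins, NOT the real GlusterFS types. */
struct toy_inode {
    int ctx_valid; /* non-zero once a LOOKUP has initialised the ctx */
};

struct toy_entry {
    const char *name;
    struct toy_inode *inode; /* NULL means "no inode linked for this entry" */
};

/*
 * Clear entry->inode for every entry whose inode ctx is not initialised.
 * With no inode attached to the entry, the layers above cannot reuse it
 * and have to send a LOOKUP before any other fop on that name.
 * Returns the number of entries whose inode pointer was cleared.
 */
size_t toy_transform_entries(struct toy_entry *entries, size_t n,
                             int force_lookup)
{
    size_t cleared = 0;

    assert(entries != NULL || n == 0);
    if (!force_lookup)
        return 0; /* optional behaviour switched off: pass entries through */

    for (size_t i = 0; i < n; i++) {
        struct toy_inode *inode = entries[i].inode;

        if (inode != NULL && !inode->ctx_valid) {
            /* real code would unref the inode here (assumption) */
            entries[i].inode = NULL;
            cleared++;
        }
    }
    return cleared;
}
```

The cost of this design is the one discussed in this thread: each cleared entry pays for an extra LOOKUP later, which is why making it optional is a performance question.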


----- Original Message -----

> From: "Mohammed Rafi K C" <rkavunga at redhat.com>
> To: "Gluster Devel" <gluster-devel at gluster.org>
> Cc: "Dan Lambright" <dlambrig at redhat.com>, "Nithya Balachandran"
> <nbalacha at redhat.com>, "Raghavendra Gowdappa" <rgowdapp at redhat.com>, "Ben
> Turner" <bturner at redhat.com>, "Ben England" <bengland at redhat.com>, "Manoj
> Pillai" <mpillai at redhat.com>, "Pranith Kumar Karampuri"
> <pkarampu at redhat.com>, "Ravishankar Narayanankutty" <ranaraya at redhat.com>,
> kdhananj at redhat.com, xhernandez at datalab.es
> Sent: Wednesday, August 12, 2015 7:29:48 PM
> Subject: Inconsistent behavior due to lack of lookup on entry followed by
> readdirp

> Hi All,

> We are seeing inconsistent behavior in fops such as rename and unlink
> because an entry returned by readdirp may never get a lookup. More
> specifically, if inodes/gfids are populated via a readdirp call and the
> nodeid is shared with the kernel, md-cache caches the entry by
> basename. A subsequent named lookup is then served from md-cache and
> unwinds immediately, so a fop can be triggered without a lookup ever
> having been done on the entry. DHT does a lot of work during lookup,
> such as creating link files and populating inode_ctx, so at least one
> lookup must happen on an entry. Because readdirp prevents that lookup,
> it is very hard for fops to proceed without it. We also suspect some
> problems with afr/ec self-heal for the same reason. If we remove
> readdirp from md-cache ([1], [2]), every entry incurs an additional hop
> for its first lookup. I'm mostly concerned about this one extra network
> call and the performance degradation it causes.

> With that change, the only remaining advantage of readdirp is that it
> saves one context switch between kernel and userspace. Is it really
> worth sacrificing this for consistency?

> What do you think about removing readdirp functionality?

> Please provide your input/suggestion/ideas.

> [1] : http://review.gluster.org/#/c/11892/

> [2] : http://review.gluster.org/#/c/11894/

> Thanks in Advance
> Rafi KC