[Gluster-devel] Inconsistent behavior due to lack of lookup on entry followed by readdirp

Raghavendra Gowdappa rgowdapp at redhat.com
Thu Aug 13 03:36:37 UTC 2015



----- Original Message -----
> From: "Krutika Dhananjay" <kdhananj at redhat.com>
> To: "Mohammed Rafi K C" <rkavunga at redhat.com>
> Cc: "Gluster Devel" <gluster-devel at gluster.org>, "Dan Lambright" <dlambrig at redhat.com>, "Nithya Balachandran"
> <nbalacha at redhat.com>, "Raghavendra Gowdappa" <rgowdapp at redhat.com>, "Ben Turner" <bturner at redhat.com>, "Ben
> England" <bengland at redhat.com>, "Manoj Pillai" <mpillai at redhat.com>, "Pranith Kumar Karampuri"
> <pkarampu at redhat.com>, "Ravishankar Narayanankutty" <ranaraya at redhat.com>, xhernandez at datalab.es
> Sent: Wednesday, August 12, 2015 9:02:44 PM
> Subject: Re: Inconsistent behavior due to lack of lookup on entry followed by readdirp
> 
> I faced the same issue with the sharding translator. I fixed it by making its
> readdirp callback initialize individual entries' inode ctx, some of these
> being xattr values, which are filled in entry->dict by the posix translator.
> Here is the patch that got merged recently: http://review.gluster.org/11854
> Would that be as easy to do in DHT as well?

The problem is not just filling out state in the inode. The bigger problem is healing, which is supposed to maintain a directory/file to be in state consistent with our design before a successful reply to lookup. The operations can involve creating directories on missing subvols, setting appropriate layout, etc. Effectively for readdirp to replace lookup, it should be calling dht_lookup on each of the dentry it is passing back to application.

> 
> As far as AFR is concerned, it indirectly forces LOOKUP on entries which are
> being retrieved for the first time through a READDIRP (and as a result do
> not have their inode ctx etc initialised yet) by setting entry->inode to
> NULL. See afr_readdir_transform_entries().

Hmm. Then we already have "disabled" readdirp through code :). Without an inode corresponding to entry, readdirp will be effectively readdir stripping any performance benefits by having readdirp as a "batched" lookup (of all the dentries).

> This is the default behavior which is being made optional as part of
> http://review.gluster.org/#/c/11846/ which is still under review (see BZ
> 1250803, a performance bug :) ).

If it is made optional, when we enable setting entry->inode we still see consistency issues. Also, it seems to me that there is no point in having each individual xlator option controlling this behaviour. Instead we can make each xlator behave in compliance to global mount option "--use-readdirp=yes/no". Is there any specific reason to have an option to control this behaviour in afr?

> 
> -Krutika
> 
> ----- Original Message -----
> 
> > From: "Mohammed Rafi K C" <rkavunga at redhat.com>
> > To: "Gluster Devel" <gluster-devel at gluster.org>
> > Cc: "Dan Lambright" <dlambrig at redhat.com>, "Nithya Balachandran"
> > <nbalacha at redhat.com>, "Raghavendra Gowdappa" <rgowdapp at redhat.com>, "Ben
> > Turner" <bturner at redhat.com>, "Ben England" <bengland at redhat.com>, "Manoj
> > Pillai" <mpillai at redhat.com>, "Pranith Kumar Karampuri"
> > <pkarampu at redhat.com>, "Ravishankar Narayanankutty" <ranaraya at redhat.com>,
> > kdhananj at redhat.com, xhernandez at datalab.es
> > Sent: Wednesday, August 12, 2015 7:29:48 PM
> > Subject: Inconsistent behavior due to lack of lookup on entry followed by
> > readdirp
> 
> > Hi All,
> 
> > We are facing some inconsistent behavior for fops like rename, unlink
> > etc due to lack of lookup followed by a readdirp, more specifically if
> > inodes/gfid are populated via readdirp call and this nodeid is shared
> > with kernal, md-cache will cache this based on base-name. Then
> > subsequent named lookup will be served from md-cache and it winds-back
> > immediately. So there is a chance to have an FOP triggered with out
> > having a lookup on an entry. DHT does lot of things like creating link
> > files and populate inode_ctx etc, during lookup. In such scenario it is
> > must to have at least one lookup to be happened on an entry. Since
> > readdirp preventing the lookup, it has been very hard for fops to
> > proceed without a first lookup on the entry. We are also suspecting some
> > problems due to same with afr/ec self healing also. So If we remove
> > readdirp from md-cache ([1], [2]) it causes, an additional hop for first
> > lookup for every entry. I'm mostly concerned with this one extra network
> > call, and the performance degradation caused by the same.
> 
> > Now with this, the only advantage with readdirp is, it removes one
> > context switch between kernal and userspace. Is it really worth to
> > sacrifice this for consistency ?
> 
> > What do you think about removing readdirp functionality?
> 
> > Please provide your input/suggestion/ideas.
> 
> > [1] : http://review.gluster.org/#/c/11892/
> 
> > [2] : http://review.gluster.org/#/c/11894/
> 
> > Thanks in Advance
> > Rafi KC
> 


More information about the Gluster-devel mailing list