[Gluster-devel] Serialization of fops acting on same dentry on server

Mon Aug 17 05:19:08 UTC 2015

----- Original Message -----
> From: "Raghavendra Gowdappa" <rgowdapp at redhat.com>
> To: "Gluster Devel" <gluster-devel at gluster.org>
> Cc: "Sakshi Bansal" <sabansal at redhat.com>
> Sent: Monday, 17 August, 2015 10:39:38 AM
> Subject: [Gluster-devel] Serialization of fops acting on same dentry on server
> 
> All,
> 
> Pranith and I were discussing the implementation of compound operations
> like "create + lock", "mkdir + lock", "open + lock" etc. These operations
> are useful in situations like:
> 
> 1. To prevent locking on all subvols during directory creation as part of
> self-heal in dht. Currently we follow the approach of locking _all_
> subvols by both rmdir and lookup-heal [1].

Correction: it should have been "to prevent locking on all subvols during rmdir". The lookup self-heal should lock on all subvols (with a compound "mkdir + lookup" if the directory is not present on a subvol). With this, rmdir/rename can lock on just any one subvol, and this will prevent any parallel lookup-heal from preventing directory creation.

> 2. To lock a file in advance so that there is less performance hit during
> transactions in afr.
> 
> While thinking about implementing such compound operations, it occurred to me
> that one of the problems would be how do we handle a racing mkdir/create and
> a (named lookup - simply referred to as lookup from now on - followed by lock).
> This is because,
> 1. creation of directory/file on backend
> 2. linking of the inode with the gfid corresponding to that file/directory
> 
> are not atomic. It is not guaranteed that the inode passed down during the
> mkdir/create call is the one that survives in the inode table. Since
> posix-locks xlator maintains all the lock-state in inode, it would be a
> problem if a different inode is linked in inode table than the one passed
> during mkdir/create. One way to solve this problem is to serialize fops
> (like mkdir/create, lookup, rename, rmdir, unlink) that are happening on a
> particular dentry. This serialization would also solve other bugs like:
> 
> 1. issues solved by [2][3] and possibly many such issues.
> 2. Stale dentries left out in bricks' inode table because of a racing lookup
> and dentry modification ops (like rmdir, unlink, rename etc).
> 
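
To make the race above concrete, here is a toy model in Python. It is not GlusterFS code: the Inode and InodeTable classes, the "first link wins" rule and the ordering of the lock relative to the link are all assumptions made purely for illustration. It only shows how lock state recorded on the inode passed down with mkdir can end up on an inode that never makes it into the table.

```python
# Toy model of the race, NOT GlusterFS code. All names are hypothetical.

class Inode:
    def __init__(self, label):
        self.label = label
        self.locks = []  # stand-in for the lock state posix-locks keeps in the inode


class InodeTable:
    def __init__(self):
        self.by_gfid = {}

    def link(self, gfid, inode):
        # Assumed rule: the first inode linked for a gfid wins; later
        # callers get the already-linked inode back.
        return self.by_gfid.setdefault(gfid, inode)


table = InodeTable()
gfid = "gfid-1"

mkdir_inode = Inode("passed down with mkdir")
lookup_inode = Inode("passed down with lookup")

# mkdir has created the directory on the backend but has not linked its
# inode yet. A racing named lookup finds the directory and links first.
table.link(gfid, lookup_inode)

# The compound "mkdir + lock" records its lock on the inode it was given ...
mkdir_inode.locks.append("lock from mkdir + lock")

# ... and only then links, getting the lookup's inode back instead.
survivor = table.link(gfid, mkdir_inode)

print(survivor.label)  # passed down with lookup
print(survivor.locks)  # []
```

The inode that survives in the table carries no lock, while the lock sits on an inode nobody else will ever see.
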
> The initial idea I have now is to maintain the fops in progress on a dentry in
> the parent inode (maybe in the resolver code in protocol/server). Based on this we can
> serialize the operations. Since we need to serialize _only_ operations on a
> dentry (we don't serialize nameless lookups), it is guaranteed that we do
> have a parent inode always. Any comments/discussion on this would be
> appreciated.
> 
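
A minimal sketch of the idea above, in Python rather than C and with threads standing in for concurrent fops. The ParentInode class, its serialized() helper and the entry name "dir1" are hypothetical; nothing here is taken from protocol/server. The parent inode keeps a map of entry name to the fop currently in progress, and a fop on a name that is already busy waits until the earlier one finishes.

```python
import threading
import time
from contextlib import contextmanager


class ParentInode:
    """Hypothetical parent inode tracking the fop in progress per entry name."""

    def __init__(self):
        self._cond = threading.Condition()
        self._in_progress = {}  # entry name -> fop currently running on it

    @contextmanager
    def serialized(self, name, fop):
        with self._cond:
            # Wait while another fop is in progress on the same dentry.
            while name in self._in_progress:
                self._cond.wait()
            self._in_progress[name] = fop
        try:
            yield
        finally:
            with self._cond:
                del self._in_progress[name]
                self._cond.notify_all()


parent = ParentInode()
stats = {"active": 0, "max_active": 0}
stats_guard = threading.Lock()
completed = []


def run_fop(fop, name):
    with parent.serialized(name, fop):
        with stats_guard:
            stats["active"] += 1
            stats["max_active"] = max(stats["max_active"], stats["active"])
        time.sleep(0.01)  # pretend to do the backend operation and inode link
        with stats_guard:
            stats["active"] -= 1
            completed.append(fop)


fops = ("mkdir", "lookup", "rmdir", "lookup", "rename")
threads = [threading.Thread(target=run_fop, args=(f, "dir1")) for f in fops]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(stats["max_active"])  # 1: never more than one fop at a time on "dir1"
```

This sketch does not preserve arrival order among waiters, and fops on different names under the same parent do not block each other. Nameless lookups would simply never call serialized(), since they have no dentry to serialize on.
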
> [1] http://review.gluster.org/11725
> [2] http://review.gluster.org/9913
> [3] http://review.gluster.org/5240
> 
> regards,
> Raghavendra.