[Gluster-devel] Serialization of fops acting on same dentry on server
Pranith Kumar Karampuri
pkarampu at redhat.com
Wed Aug 19 09:25:28 UTC 2015
+ Ravi, Anuradha
On 08/17/2015 10:39 AM, Raghavendra Gowdappa wrote:
> All,
>
> Pranith and I were discussing the implementation of compound operations such as "create + lock", "mkdir + lock", and "open + lock". These operations are useful in situations like:
>
> 1. To avoid locking all subvols during directory creation as part of self-heal in dht. Currently we follow the approach of locking _all_ subvols in both rmdir and lookup-heal [1].
> 2. To lock a file in advance so that transactions in afr take a smaller performance hit.
>
> While thinking about implementing such compound operations, it occurred to me that one of the problems would be how to handle a mkdir/create racing with a named lookup (simply referred to as lookup from now on) followed by a lock. This is because
> 1. creation of directory/file on backend
> 2. linking of the inode with the gfid corresponding to that file/directory
>
> are not atomic. There is no guarantee that the inode passed down during the mkdir/create call is the one that survives in the inode table. Since the posix-locks xlator maintains all of its lock state in the inode, it would be a problem if the inode linked in the inode table differs from the one passed during mkdir/create. One way to solve this problem is to serialize the fops (such as mkdir/create, lookup, rename, rmdir, unlink) that operate on a particular dentry. This serialization would also solve other bugs like:
>
> 1. The issues addressed by [2] and [3], and possibly many similar ones.
> 2. Stale dentries left behind in the bricks' inode tables because of a lookup racing with dentry modification ops (rmdir, unlink, rename, etc.).
>
> My initial idea is to track the fops in progress on a dentry in its parent inode (maybe in the resolver code in protocol/server) and serialize operations based on that. Since we need to serialize _only_ operations on a dentry (nameless lookups are not serialized), a parent inode is always guaranteed to exist. Any comments/discussion on this would be appreciated.
>
> [1] http://review.gluster.org/11725
> [2] http://review.gluster.org/9913
> [3] http://review.gluster.org/5240
>
> regards,
> Raghavendra.
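
For illustration only, the serialization idea in the quoted mail could be sketched roughly as below. This is not GlusterFS code: struct parent_inode, dentry_op_begin(), dentry_op_end() and run_demo() are hypothetical names invented for this sketch, and the fixed-size table and blocking wait are simplifying assumptions. A real server would more likely park the request and resume it later than block a thread. The sketch only shows the core idea: the parent tracks which basenames have a fop in progress, and a second fop on the same basename waits until the first one finishes.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

#define MAX_BUSY 16   /* assumed cap on concurrently busy names per parent */
#define NAME_LEN 256

/* Hypothetical stand-in for a parent inode; NOT the GlusterFS inode_t. */
struct parent_inode {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    char            busy[MAX_BUSY][NAME_LEN]; /* basenames with a fop in progress */
    int             nbusy;
};

static void parent_init(struct parent_inode *p)
{
    pthread_mutex_init(&p->lock, NULL);
    pthread_cond_init(&p->cond, NULL);
    p->nbusy = 0;
}

/* Return the slot holding 'name', or -1. Caller must hold p->lock. */
static int find_busy(struct parent_inode *p, const char *name)
{
    for (int i = 0; i < p->nbusy; i++)
        if (strcmp(p->busy[i], name) == 0)
            return i;
    return -1;
}

/* Wait until no other fop is in progress on <parent>/<name>, then claim it. */
static void dentry_op_begin(struct parent_inode *p, const char *name)
{
    pthread_mutex_lock(&p->lock);
    while (find_busy(p, name) >= 0 || p->nbusy == MAX_BUSY)
        pthread_cond_wait(&p->cond, &p->lock);
    strncpy(p->busy[p->nbusy], name, NAME_LEN - 1);
    p->busy[p->nbusy][NAME_LEN - 1] = '\0';
    p->nbusy++;
    pthread_mutex_unlock(&p->lock);
}

/* Release the claim on <parent>/<name> and wake any waiting fops. */
static void dentry_op_end(struct parent_inode *p, const char *name)
{
    pthread_mutex_lock(&p->lock);
    int i = find_busy(p, name);
    assert(i >= 0);                 /* end without a matching begin is a bug */
    p->nbusy--;
    if (i != p->nbusy)              /* fill the hole with the last entry */
        memcpy(p->busy[i], p->busy[p->nbusy], NAME_LEN);
    pthread_cond_broadcast(&p->cond);
    pthread_mutex_unlock(&p->lock);
}

/* Demo: two threads stand in for a racing mkdir and lookup on "dir1". */
struct demo {
    struct parent_inode parent;
    int active;      /* fops currently inside the serialized section */
    int max_active;  /* highest value 'active' ever reached */
};

static void *worker(void *arg)
{
    struct demo *d = arg;
    for (int i = 0; i < 1000; i++) {
        dentry_op_begin(&d->parent, "dir1");
        int now = ++d->active;      /* only touched while the dentry is claimed */
        if (now > d->max_active)
            d->max_active = now;
        --d->active;
        dentry_op_end(&d->parent, "dir1");
    }
    return NULL;
}

/* Returns the maximum number of fops seen in progress on the same dentry. */
int run_demo(void)
{
    struct demo d = { .active = 0, .max_active = 0 };
    pthread_t t[2];

    parent_init(&d.parent);
    for (int i = 0; i < 2; i++)
        pthread_create(&t[i], NULL, worker, &d);
    for (int i = 0; i < 2; i++)
        pthread_join(t[i], NULL);
    return d.max_active;
}
```

Calling run_demo() from a small main() (built with -pthread) exercises the two racing threads. Fops on different basenames under the same parent do not wait for each other in this sketch, and nameless lookups would never enter this path because they have no parent/basename pair.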