[Gluster-devel] Serialization of fops acting on same dentry on server

Raghavendra Gowdappa rgowdapp at redhat.com
Mon Aug 17 05:09:38 UTC 2015


Pranith and I were discussing the implementation of compound operations like "create + lock", "mkdir + lock", "open + lock" etc. These operations are useful in situations like:

1. To prevent locking on all subvols during directory creation as part of self heal in dht. Currently we are following approach of locking _all_ subvols by both rmdir and lookup-heal [1].
2. To lock a file in advance so that there is less performance hit during transactions in afr.
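As an illustration of why a compound fop helps (all names here are hypothetical, not the actual GlusterFS fop signatures), the server can apply both sub-operations in one step, leaving no window between creation and locking for another client to slip in:

```python
# A toy model of a compound fop: the server performs both sub-operations
# atomically with respect to other fops, instead of the client issuing
# two separate round trips. All names are illustrative.
import threading

class Server:
    def __init__(self):
        self._mutex = threading.Lock()
        self.files = {}     # path -> set of lock owners

    def create(self, path):
        with self._mutex:
            self.files.setdefault(path, set())

    def lock(self, path, owner):
        with self._mutex:
            self.files[path].add(owner)

    def create_and_lock(self, path, owner):
        # Compound "create + lock": one round trip, and no window
        # between creation and locking for a racing client.
        with self._mutex:
            self.files.setdefault(path, set()).add(owner)

srv = Server()
srv.create_and_lock("/a/file", owner="client-1")
assert srv.files["/a/file"] == {"client-1"}
```

With separate `create` and `lock` calls, another client could observe (or act on) the file between the two steps; the compound call closes that window on the server side.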

While thinking about implementing such compound operations, it occurred to me that one of the problems would be handling a racing mkdir/create and a lock preceded by a named lookup (simply referred to as lookup from now on). This is because,
1. creation of directory/file on backend
2. linking of the inode with the gfid corresponding to that file/directory

are not atomic. The inode passed down during the mkdir/create call is not guaranteed to be the one that survives in the inode table. Since the posix-locks xlator maintains all of its lock state in the inode, it would be a problem if an inode different from the one passed during mkdir/create gets linked in the inode table. One way to solve this problem is to serialize fops (like mkdir/create, lookup, rename, rmdir, unlink) that act on a particular dentry. This serialization would also solve other bugs, like:
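The race can be sketched as follows. The names (`Inode`, `InodeTable`, `link`, the gfid value) are illustrative, not the actual GlusterFS data structures; the point is only that inode-table linking is first-wins, so if a racing lookup links its own inode for the gfid before mkdir does, the inode mkdir passed down (and any lock state attached to it) is not the one that survives:

```python
import threading

class Inode:
    def __init__(self, tag):
        self.tag = tag          # which fop allocated this inode (illustrative)
        self.lock_state = []    # a locks xlator would hang its state here

class InodeTable:
    """First-wins linking: the first inode linked for a gfid survives."""
    def __init__(self):
        self._by_gfid = {}
        self._mutex = threading.Lock()

    def link(self, gfid, inode):
        with self._mutex:
            # If some fop already linked an inode for this gfid, that
            # inode survives and the caller's inode is discarded.
            return self._by_gfid.setdefault(gfid, inode)

table = InodeTable()
gfid = "d3adbeef"       # illustrative gfid

# mkdir creates the directory on the backend and allocates an inode ...
mkdir_inode = Inode("mkdir")
# ... but before mkdir links it, a racing lookup finds the new directory
# on disk and links its own inode for the same gfid.
lookup_inode = Inode("lookup")
table.link(gfid, lookup_inode)

# mkdir now tries to link; the lookup's inode wins.
survivor = table.link(gfid, mkdir_inode)
assert survivor is lookup_inode

# Any lock state a compound "mkdir + lock" attached to mkdir_inode now
# lives on an inode that is not in the inode table.
```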

1. The issues solved by [2] and [3], and possibly many similar ones.
2. Stale dentries left behind in a brick's inode table because of a lookup racing with dentry-modification ops (like rmdir, unlink, rename etc.).

The initial idea I have now is to track fops in progress on a dentry in the parent inode (possibly in the resolver code in protocol/server). Based on this we can serialize the operations. Since we need to serialize _only_ named operations on a dentry (nameless lookups are not serialized), it is guaranteed that we always have a parent inode. Any comments/discussion on this would be appreciated.
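The idea above can be sketched as follows, in Python rather than the actual protocol/server C code, and with hypothetical names (`ParentInodeCtx`, `fop_begin`, `fop_end`): the parent inode keeps a per-basename set of in-progress fops, and an incoming fop on a name blocks until no other fop is in flight on the same name. Nameless (gfid-only) lookups never take this path, so a parent inode is always available:

```python
import threading

class ParentInodeCtx:
    """Per-parent-inode context tracking in-progress fops per basename.

    Hypothetical sketch: in GlusterFS this state would live in the
    parent inode, consulted by the resolver in protocol/server.
    """
    def __init__(self):
        self._cond = threading.Condition()
        self._in_progress = set()   # basenames with a fop in flight

    def fop_begin(self, name):
        # Block until no other dentry fop is running on `name`.
        with self._cond:
            while name in self._in_progress:
                self._cond.wait()
            self._in_progress.add(name)

    def fop_end(self, name):
        with self._cond:
            self._in_progress.discard(name)
            self._cond.notify_all()

# Usage: a racing mkdir and lookup on the same dentry run one at a time.
parent = ParentInodeCtx()
order = []

def mkdir(name):
    parent.fop_begin(name)
    try:
        order.append(("mkdir", name))   # create on backend + link inode here
    finally:
        parent.fop_end(name)

def lookup(name):
    parent.fop_begin(name)
    try:
        order.append(("lookup", name))  # sees mkdir fully done, or not at all
    finally:
        parent.fop_end(name)

t1 = threading.Thread(target=mkdir, args=("dir1",))
t2 = threading.Thread(target=lookup, args=("dir1",))
t1.start(); t2.start()
t1.join(); t2.join()

assert sorted(o[0] for o in order) == ["lookup", "mkdir"]
```

Whichever fop enters first completes its create-and-link (or lookup-and-link) before the other starts, which closes the window where two different inodes get allocated and raced into the inode table for the same dentry.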

[1] http://review.gluster.org/11725
[2] http://review.gluster.org/9913
[3] http://review.gluster.org/5240
