[Gluster-devel] Serialization of fops acting on same dentry on server

Mon Aug 17 05:44:18 UTC 2015

On Mon, Aug 17, 2015 at 01:09:38AM -0400, Raghavendra Gowdappa wrote:
> All,
> 
> Pranith and I were discussing the implementation of compound
> operations like "create + lock", "mkdir + lock", "open + lock" etc.
> These operations are useful in situations like:
> 
> 1. To avoid locking all subvols during directory creation as part
> of self heal in dht. Currently we follow the approach of locking
> _all_ subvols in both rmdir and lookup-heal [1].
> 2. To lock a file in advance so that there is less of a performance
> hit during transactions in afr.

I have an interest in compound/composite procedures too. My use-case is
a little different, and I was (and still am) planning to send more
details about it soon.

Basically, there are certain cases where libgfapi will not be able to
automatically pass the uid/gid in the RPC-header. A design for
supporting Kerberos will mainly use the standardized RPCSEC_GSS. If
there is no option to use the Kerberos credentials of the user doing
I/O (remote client, not using Kerberos to talk to samba/ganesha), the
username (or uid/gid) needs to be passed to the storage servers.

A compound/composite procedure would then look like this:

  [RPC header]
    [AUTH_GSS + Kerberos principal for libgfapi/samba/ganesha/...]

  [GlusterFS COMPOUND]
    [SETFSUID]
    [SETLOCKOWNER]
    [${FOP}]
    [.. more FOPs?]

This idea has not yet been reviewed or commented on by the Kerberos
experts that I want to involve. A more complete description of the
plans to support Kerberos will follow.
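
To make the layout above a bit more concrete, here is a minimal Python
sketch of how a server could walk such a compound request. Everything in
it is hypothetical: the names Context, Compound and execute, the argument
keys, and the choice to stop at the first unsupported sub-operation (as
NFSv4 COMPOUND does on the first error) are assumptions for illustration,
not an existing GlusterFS interface.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class Context:
    """State set by earlier sub-operations and used by later FOPs."""
    fsuid: int = -1
    fsgid: int = -1
    lock_owner: str = ""


@dataclass
class Compound:
    # Principal of the service (libgfapi/samba/ganesha) from the RPC header.
    principal: str
    # Ordered sub-operations: (name, arguments). Names are hypothetical.
    ops: List[Tuple[str, Dict[str, Any]]] = field(default_factory=list)


def execute(compound: Compound,
            fop_handlers: Dict[str, Callable[..., Any]]) -> List[Tuple[str, Any]]:
    """Run the sub-operations in order, sharing one Context between them."""
    ctx = Context()
    results: List[Tuple[str, Any]] = []
    for name, args in compound.ops:
        if name == "SETFSUID":
            ctx.fsuid, ctx.fsgid = args["uid"], args["gid"]
            results.append((name, "ok"))
        elif name == "SETLOCKOWNER":
            ctx.lock_owner = args["owner"]
            results.append((name, "ok"))
        else:
            handler = fop_handlers.get(name)
            if handler is None:
                # Assumption: abort the rest of the compound on failure.
                results.append((name, "unsupported"))
                break
            results.append((name, handler(ctx, **args)))
    return results
```

The point of the sketch is only that SETFSUID and SETLOCKOWNER carry no
result of their own; they change the identity under which the following
FOPs in the same request are executed.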

Do you think that this matches your ideas on compound operations?

Thanks,
Niels

> 
> While thinking about implementing such compound operations, it
> occurred to me that one of the problems would be how to handle a
> mkdir/create racing with a named lookup (simply referred to as lookup
> from now on) followed by a lock. This is because
> 1. creation of directory/file on backend
> 2. linking of the inode with the gfid corresponding to that
> file/directory
> 
> are not atomic. It is not guaranteed that the inode passed down during
> the mkdir/create call is the one that survives in the inode table.
> Since posix-locks xlator maintains all the lock-state in inode, it
> would be a problem if a different inode is linked in inode table than
> the one passed during mkdir/create. One way to solve this problem is
> to serialize fops (like mkdir/create, lookup, rename, rmdir, unlink)
> that are happening on a particular dentry. This serialization would
> also solve other bugs like:
> 
> 1. Issues solved by [2][3] and possibly many such issues.
> 2. Stale dentries left behind in the bricks' inode tables because of
> a lookup racing with dentry modification ops (like rmdir, unlink,
> rename etc).
> 
> My initial idea is to track the fops in progress on a dentry in its
> parent inode (maybe in the resolver code in protocol/server). Based on
> this we can serialize the operations. Since we need to serialize
> _only_ operations on a dentry (we don't serialize nameless lookups),
> it is guaranteed that we always have a parent inode. Any
> comments/discussion on this would be appreciated.
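
A minimal Python sketch of the bookkeeping described above, assuming that
the parent inode keeps one queue per child name: a fop on a name runs
immediately if nothing is in progress on that name, and otherwise waits
until the earlier fop reports completion. The class and method names
(ParentInode, submit, done) are hypothetical, and the sketch ignores
thread safety, which real code would need to handle (for example under
the inode's lock).

```python
from collections import deque
from typing import Callable, Deque, Dict


class ParentInode:
    """Hypothetical parent inode tracking in-progress fops per child name."""

    def __init__(self) -> None:
        # basename -> fops waiting behind the one currently in progress.
        # A key is present exactly while some fop on that name is running.
        self.pending: Dict[str, Deque[Callable[[], None]]] = {}

    def submit(self, basename: str, fop: Callable[[], None]) -> None:
        """Run fop now, or queue it behind the fop already in progress."""
        if basename in self.pending:
            self.pending[basename].append(fop)
            return
        self.pending[basename] = deque()
        fop()

    def done(self, basename: str) -> None:
        """Called when the in-progress fop on basename has completed."""
        waiters = self.pending[basename]
        if waiters:
            # Resume the next fop; the name stays marked as in progress.
            waiters.popleft()()
        else:
            del self.pending[basename]
```

Operations on different names under the same parent do not wait for each
other in this sketch; only fops on the same (parent, name) pair are
ordered. Nameless lookups would bypass submit entirely.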
> 
> [1] http://review.gluster.org/11725
> [2] http://review.gluster.org/9913
> [3] http://review.gluster.org/5240
> 
> regards,
> Raghavendra.