[Gluster-devel] Serialization of fops acting on same dentry on server

Raghavendra Gowdappa rgowdapp at redhat.com
Mon Aug 17 06:05:31 UTC 2015



----- Original Message -----
> From: "Niels de Vos" <ndevos at redhat.com>
> To: "Raghavendra Gowdappa" <rgowdapp at redhat.com>
> Cc: "Gluster Devel" <gluster-devel at gluster.org>, "Sakshi Bansal" <sabansal at redhat.com>
> Sent: Monday, 17 August, 2015 11:14:18 AM
> Subject: Re: [Gluster-devel] Serialization of fops acting on same dentry on server
> 
> On Mon, Aug 17, 2015 at 01:09:38AM -0400, Raghavendra Gowdappa wrote:
> > All,
> > 
> > Pranith and I were discussing the implementation of compound
> > operations like "create + lock", "mkdir + lock", "open + lock", etc.
> > These operations are useful in situations like:
> > 
> > 1. To prevent locking on all subvols during directory creation as part
> > of self-heal in dht. Currently we follow the approach of locking
> > _all_ subvols in both rmdir and lookup-heal [1].
> > 2. To lock a file in advance so that there is less performance hit
> > during transactions in afr.
> 
> I have an interest in compound/composite procedures too. My use-case is
> a little different, and I was (and still am) planning to send more
> details about it soon.
> 
> Basically, there are certain cases where libgfapi will not be able to
> automatically pass the uid/gid in the RPC-header. A design for
> supporting Kerberos will mainly use the standardized RPCSEC_GSS. If
> there is no option to use the Kerberos credentials of the user doing
> I/O (remote client, not using Kerberos to talk to samba/ganesha), the
> username (or uid/gid) needs to be passed to the storage servers.
> 
> A compound/composite procedure would then look like this:
> 
>   [RPC header]
>     [AUTH_GSS + Kerberos principal for libgfapi/samba/ganesha/...]
> 
>   [GlusterFS COMPOUND]
>     [SETFSUID]
>     [SETLOCKOWNER]
>     [${FOP}]
>     [.. more FOPs?]
> 
> This idea has not been reviewed/commented on with some of the Kerberos
> experts that I want to involve. A more complete description about the
> plans to support Kerberos will follow.
> 
> Do you think that this matches your ideas on compound operations?

What we had in mind was compounding more than one Gluster fop into a single request. We hadn't really thought at the granularity of setfsuid, setlkowner, etc. But yes, it is not fundamentally different from what we had in mind.
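
For instance, a rough sketch of what a compound request bundling more
than one fop might look like (the types and names below are purely
illustrative, not an existing GlusterFS API):

/* Illustrative only: bundle several fops into one request that the
 * server executes in order, e.g. "mkdir + lock" as a single round trip. */
#include <stdio.h>

enum compound_fop_type {
    COMPOUND_FOP_MKDIR,
    COMPOUND_FOP_CREATE,
    COMPOUND_FOP_INODELK,
};

struct compound_fop {
    enum compound_fop_type type;
    const char *path;       /* target of this member fop */
    int lock_type;          /* meaningful only for INODELK */
};

struct compound_req {
    size_t count;
    struct compound_fop fops[8];    /* fixed upper bound for the sketch */
};

/* The server would walk the members in order and unwind an error for the
 * whole request if any member fails, so "mkdir + lock" behaves as one
 * operation from the client's point of view. */
static int compound_execute(struct compound_req *req)
{
    for (size_t i = 0; i < req->count; i++)
        printf("executing member %zu (type %d) on %s\n",
               i, req->fops[i].type, req->fops[i].path);
    return 0;
}

int main(void)
{
    /* "mkdir + lock" expressed as a two-member compound request */
    struct compound_req req = {
        .count = 2,
        .fops  = {
            { COMPOUND_FOP_MKDIR,   "/dir", 0 },
            { COMPOUND_FOP_INODELK, "/dir", 1 /* write lock */ },
        },
    };
    return compound_execute(&req);
}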

> 
> Thanks,
> Niels
> 
> 
> > 
> > While thinking about implementing such compound operations, it
> > occurred to me that one of the problems would be how to handle a
> > racing mkdir/create and a named lookup (simply referred to as lookup
> > from now on) followed by a lock. This is because
> > 1. creation of the directory/file on the backend, and
> > 2. linking of the inode with the gfid corresponding to that
> > file/directory
> > 
> > are not atomic. It is not guaranteed that the inode passed down
> > during the mkdir/create call is the one that survives in the inode
> > table. Since the posix-locks xlator maintains all the lock-state in
> > the inode, it would be a problem if a different inode is linked into
> > the inode table than the one passed during mkdir/create. One way to
> > solve this problem is to serialize fops (like mkdir/create, lookup,
> > rename, rmdir, unlink) that operate on a particular dentry. This
> > serialization would also solve other bugs like:
> > 
> > 1. issues solved by [2][3] and possibly many such issues.
> > 2. Stale dentries left in the bricks' inode tables because of racing
> > lookup and dentry-modification ops (like rmdir, unlink, rename, etc.).
> > 
> > The initial idea I have now is to track in-progress fops on a dentry
> > in the parent inode (possibly in the resolver code in
> > protocol/server). Based on this we can serialize the operations.
> > Since we need to serialize _only_ operations on a dentry (we don't
> > serialize nameless lookups), we are guaranteed to always have a
> > parent inode. Any comments/discussion on this would be appreciated.
> > 
> > [1] http://review.gluster.org/11725
> > [2] http://review.gluster.org/9913
> > [3] http://review.gluster.org/5240
> > 
> > regards,
> > Raghavendra.
> > _______________________________________________
> > Gluster-devel mailing list
> > Gluster-devel at gluster.org
> > http://www.gluster.org/mailman/listinfo/gluster-devel
> 
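
To make the per-dentry serialization idea above a little more concrete,
here is a minimal sketch of tracking in-progress entry fops in the parent
inode (again purely illustrative; these structures and functions are not
existing GlusterFS code):

/* Each parent keeps the basenames that currently have an entry fop in
 * flight; a second fop on the same basename waits until the first one
 * completes, so mkdir/create, lookup, rename, rmdir and unlink on the
 * same dentry never race on the server.  ctx->lock and ctx->cond must be
 * initialized (e.g. PTHREAD_MUTEX_INITIALIZER / PTHREAD_COND_INITIALIZER)
 * and ctx->count set to 0 before use. */
#include <pthread.h>
#include <string.h>

#define MAX_INFLIGHT 64
#define NAME_MAX_LEN 256

struct parent_ctx {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    char inflight[MAX_INFLIGHT][NAME_MAX_LEN];  /* basenames in flight */
    int  count;
};

static int find_inflight(struct parent_ctx *ctx, const char *name)
{
    for (int i = 0; i < ctx->count; i++)
        if (strcmp(ctx->inflight[i], name) == 0)
            return i;
    return -1;
}

/* Called by the resolver before winding an entry fop further down. */
void dentry_op_begin(struct parent_ctx *ctx, const char *name)
{
    pthread_mutex_lock(&ctx->lock);
    while (ctx->count == MAX_INFLIGHT || find_inflight(ctx, name) >= 0)
        pthread_cond_wait(&ctx->cond, &ctx->lock);
    strncpy(ctx->inflight[ctx->count], name, NAME_MAX_LEN - 1);
    ctx->inflight[ctx->count][NAME_MAX_LEN - 1] = '\0';
    ctx->count++;
    pthread_mutex_unlock(&ctx->lock);
}

/* Called from the fop's callback, after the inode has been linked. */
void dentry_op_end(struct parent_ctx *ctx, const char *name)
{
    pthread_mutex_lock(&ctx->lock);
    int i = find_inflight(ctx, name);
    if (i >= 0) {
        memmove(&ctx->inflight[i], &ctx->inflight[i + 1],
                (ctx->count - i - 1) * sizeof(ctx->inflight[0]));
        ctx->count--;
    }
    pthread_cond_broadcast(&ctx->cond);
    pthread_mutex_unlock(&ctx->lock);
}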

