[Gluster-devel] Posix lock migration design
Susant Palai
spalai at redhat.com
Thu Mar 10 07:00:15 UTC 2016
Forgot to detail the problems/races. Details inline!
----- Original Message -----
> From: "Susant Palai" <spalai at redhat.com>
> To: "Gluster Devel" <gluster-devel at gluster.org>
> Sent: Thursday, 10 March, 2016 11:35:53 AM
> Subject: Re: [Gluster-devel] Posix lock migration design
>
> Update:
>
> Here is an initial solution for the races between (a) fd-migration and lock
> migration and (b) client-disconnect and lock migration.
>
> Please give your suggestions and comments.
>
> Fuse fd migration:
> -------------------
> part1: Fuse fd migration without fd association
> - How it works currently:
> - Fuse initiates the fd-migration task from graph1 to graph2
> - As part of this, a new fd is opened on graph2
> - Locks are currently associated with both the fd and the client (connection id).
> With the fd association out of the picture, there will be just a client
> update on the locks (a rough sketch follows below).
>
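A minimal sketch of that client update, assuming a hypothetical lock record keyed only on the client and lk-owner (names like sketch_lock and sketch_update_client are illustrative, not the actual pl_* code):

<client_update_sketch>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical lock record: no fd in the identity, only the owning
 * client (connection id) and the lk-owner filled in by the VFS.       */
struct sketch_lock {
        struct sketch_lock *next;
        char               *client_uid;  /* connection id of the owning graph */
        uint64_t            lk_owner;    /* stable across graphs and bricks    */
        off_t               start, end;  /* byte range                         */
        short               type;        /* read or write lock                 */
};

/* Fuse fd migration from graph1 to graph2 then reduces to re-stamping
 * the client on every lock held through the old graph's connection.   */
static void
sketch_update_client (struct sketch_lock *locks,
                      const char *old_client, const char *new_client)
{
        struct sketch_lock *l = NULL;

        for (l = locks; l != NULL; l = l->next) {
                if (strcmp (l->client_uid, old_client) == 0) {
                        free (l->client_uid);
                        l->client_uid = strdup (new_client);
                }
        }
}
</client_update_sketch>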
> part2: fd migration interaction with lock migration
Problem: The locks on the destination need to carry the new client-id information from the new graph.
Race: 1. getlkinfo, as part of lock migration, reads the locks, which still carry the old client-id.
      2. After this, fd-migration updates the locks with the new client-id, but only on the source.
This can leave the locks on the destination with the old client-id, which causes two problems.
- 1. When the old graph disconnects, the migrated locks on the destination will be flushed (the client-id/connection-id is the same per graph across protocol clients).
- 2. Post fd migration, these locks on the destination should be flushed when the new client disconnects. But this won't happen, as the locks are still associated with the old client-id, which will lead to stale locks and a hang.
<note: locks are cleaned up based on the client-id on a client disconnect>
So the solution is to make the client-id update wait until lock migration completes. This synchronization ensures the new client-id is placed on the locks properly.
> - As part of "fuse-fd-lock-migration" we do two operations.
> a. getxattr (lockinfo): Get the old fd number on old graph
> b. setxattr (lockinfo): set (new fd number + new client info) on
> the new graph through the new fd
> - So the meta-lock acts as a hint of lock migration for any lock-related
> operation (fd-migration, server_connection_cleanup, new lk requests,
> flush etc...)
> - Now getxattr need not worry about the metalk presence at all. Once it
> reads the necessary information, the bulk of the job is left to
> setxattr.
> - Setxattr:
> - case 1: is the meta-lock present?
> - if YES, wait till meta-unlock is executed on the lock, then
> unwind the call with EREMOTE. It is then the dht translator's
> responsibility to look up the file, figure out its current
> location and redirect the setxattr. So the destination will get
> the new graph's client-id.
> - if NO, set the new client information, which will be migrated
> by rebalance.
> - case 2: what if setxattr has missed the (meta-lock + unlock) window?
> - Meta-unlock, upon successful lock migration, sets a REPLAY
> flag, which indicates that the data as well as the locks have
> been migrated.
> - So unwind with EREMOTE, and leave the redirection part to dht
> (a rough sketch of this decision follows below).
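A minimal sketch of that setxattr (lockinfo) decision, assuming a hypothetical inode state (sketch_pl_inode, metalock_held, replay); this only illustrates the cases above and is not the actual pl_setxattr code:

<setxattr_lockinfo_sketch>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

struct sketch_pl_inode {
        pthread_mutex_t mutex;
        pthread_cond_t  cond;
        bool            metalock_held;  /* set while rebalance holds the metalk */
        bool            replay;         /* set by meta-unlock after migration   */
};

static int
sketch_setxattr_lockinfo (struct sketch_pl_inode *pl, const char *new_client)
{
        int ret = 0;

        pthread_mutex_lock (&pl->mutex);
        {
                if (pl->metalock_held) {
                        /* case 1, YES: migration in flight.  Wait for the
                         * meta-unlock, then unwind with EREMOTE so dht can
                         * look up the new location and redirect the call.  */
                        while (pl->metalock_held)
                                pthread_cond_wait (&pl->cond, &pl->mutex);
                        ret = -EREMOTE;
                } else if (pl->replay) {
                        /* case 2: missed the metalk + unlock window; data
                         * and locks are already on the destination.        */
                        ret = -EREMOTE;
                } else {
                        /* case 1, NO: stamp the new client info on the
                         * locks (see the earlier client-update sketch);
                         * rebalance will migrate them later.               */
                        (void) new_client;
                }
        }
        pthread_mutex_unlock (&pl->mutex);

        return ret;
}
</setxattr_lockinfo_sketch>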
>
> <Question: Until fd migration happens, we operate through the old fd,
> right? ->>> Yes>
>
> client talking to source disconnects during lock migration:
> -------------------------------------------------------------
> - There are many phases of data + lock migration. The following describes a
> disconnect around each of the phases.
Problem: Post lock migration, the locks will become stale if the application's fd_close does not make it to the destination.
- That is, rebalance might have transferred the locks before an fd_close reaches the source; those locks will then be stale on the destination and can lead to a hang.
Hence, synchronization is essential.
> phase-1: disconnect before data migration
> - server cleanup will flush the locks. Hence, there are no locks left for
> migration.
>
> phase-2: disconnect before the metalk reaches the server
> - same case as phase-1
>
> phase-3: disconnect just after metalk
> - server_cleanup, on seeing the metalk, waits till meta-unlock, and then
> flushes the locks on the source (a rough sketch of this follows below).
> - Incoming ops (write/lk) will fail with ENOTCONN.
> - fd_close on ENOTCONN will refresh its inode to check whether the file has
> migrated elsewhere, and flush the locks there.
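A minimal sketch of the phase-3 handling, under the same assumptions as the setxattr sketch above (all names hypothetical; sketch_flush_locks_of_client is a placeholder whose list handling is omitted):

<disconnect_cleanup_sketch>
#include <pthread.h>
#include <stdbool.h>

/* Same hypothetical inode-level state as in the setxattr sketch above. */
struct sketch_pl_inode {
        pthread_mutex_t mutex;
        pthread_cond_t  cond;
        bool            metalock_held;  /* set while rebalance holds the metalk */
};

static void
sketch_flush_locks_of_client (struct sketch_pl_inode *pl, const char *client)
{
        /* Hypothetical: walk the inode's lock list and drop every entry
         * whose client matches.  List handling omitted in this sketch.   */
        (void) pl;
        (void) client;
}

static void
sketch_server_connection_cleanup (struct sketch_pl_inode *pl,
                                  const char *disconnected_client)
{
        pthread_mutex_lock (&pl->mutex);
        {
                /* phase-3: if the metalk is held, lock migration is in
                 * flight - wait for meta-unlock before touching locks.   */
                while (pl->metalock_held)
                        pthread_cond_wait (&pl->cond, &pl->mutex);

                /* Then flush whatever the disconnected client still owns
                 * on this (source) brick; locks rebalance has already
                 * moved are cleaned up on the destination instead.       */
                sketch_flush_locks_of_client (pl, disconnected_client);
        }
        pthread_mutex_unlock (&pl->mutex);
}
</disconnect_cleanup_sketch>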
>
>
>
> Thanks,
> Susant
>
>
>
> ----- Original Message -----
> > From: "Susant Palai" <spalai at redhat.com>
> > To: "Gluster Devel" <gluster-devel at gluster.org>
> > Sent: Thursday, 3 March, 2016 3:09:06 PM
> > Subject: Re: [Gluster-devel] Posix lock migration design
> >
> > Update on Lock migration design.
> >
> > For lock migration we are planning to get rid of the fd association with the
> > lock. Instead we will base our lock operations
> > on the lk-owner (the equivalent of a pid), which is the POSIX standard. The fd
> > association does not suit the needs of lock migration,
> > as a migrated fd will not be valid on the destination. Working with
> > the lk-owner, on the other hand, is much more flexible, as it does not change
> > across different servers.
> >
> > The current posix lock infrastructure associates the fd with the lock for the
> > following operations, which we are planning to
> > key on the lk-owner instead (a rough sketch follows the list below).
> >
> > 1) lock cleanup for protocol client disconnects based on fd
> >
> > 2) release call on fd
> >
> > 3) fuse fd migration (triggered by a graph switch)
> >
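A minimal sketch of the common owner check those three operations could share once the fd is dropped; the names (sketch_owner, sketch_owner_matches) and the fixed-size client_uid are illustrative assumptions, not the real gluster structures:

<lk_owner_match_sketch>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical owner identity for a posix lock: the client (connection id)
 * plus the lk-owner filled in by the VFS.  No fd anywhere.                 */
struct sketch_owner {
        char     client_uid[64];   /* connection id                         */
        uint64_t lk_owner;         /* survives graph switches and rebalance */
};

/* flush/release would match a lock on (client, lk-owner); disconnect
 * cleanup would match on the client alone.  Neither consults an fd.        */
static bool
sketch_owner_matches (const struct sketch_owner *lock_owner,
                      const struct sketch_owner *caller)
{
        return (lock_owner->lk_owner == caller->lk_owner) &&
               (strcmp (lock_owner->client_uid, caller->client_uid) == 0);
}
</lk_owner_match_sketch>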
> > The new design is being worked out and we will update here once it is ready.
> >
> > Please post your suggestions/comments here :)
> >
> > Thanks,
> > Susant
> >
> > ----- Original Message -----
> > > From: "Raghavendra G" <raghavendra at gluster.com>
> > > To: "Susant Palai" <spalai at redhat.com>
> > > Cc: "Raghavendra Gowdappa" <rgowdapp at redhat.com>, "Gluster Devel"
> > > <gluster-devel at gluster.org>
> > > Sent: Tuesday, 1 March, 2016 11:40:54 AM
> > > Subject: Re: [Gluster-devel] Posix lock migration design
> > >
> > > On Mon, Feb 29, 2016 at 12:52 PM, Susant Palai <spalai at redhat.com> wrote:
> > >
> > > > Hi Raghavendra,
> > > > I have a question on the design.
> > > >
> > > > Currently, in case of a client disconnection, pl_flush cleans up the
> > > > locks associated with the fds created from that client.
> > > > Per the design, rebalance will migrate the locks to the new
> > > > destination.
> > > > Now, in case the client gets disconnected from the
> > > > destination brick, how is it supposed to clean up the locks, as
> > > > rebalance/the brick have no idea whether the client has opened
> > > > an fd on the destination and what that fd is?
> > > >
> > >
> > > > So the question is how to associate the client's fd with the locks on the
> > > > destination.
> > > >
> > >
> > > We don't use fds to clean up the locks during flush. We use the lk-owner,
> > > which doesn't change across migration. Note that the lk-owner for posix-locks
> > > is filled in by the vfs/kernel on the machine where the glusterfs mount lives.
> > >
> > > <pl_flush>
> > >         pthread_mutex_lock (&pl_inode->mutex);
> > >         {
> > >                 __delete_locks_of_owner (pl_inode, frame->root->client,
> > >                                          &frame->root->lk_owner);
> > >         }
> > >         pthread_mutex_unlock (&pl_inode->mutex);
> > > </pl_flush>
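For context on where that lk-owner ultimately comes from, here is a plain POSIX example (standard fcntl API, nothing gluster-specific; the mount path is made up) of the kind of application byte-range lock whose owner the kernel/VFS identifies on the mount side:

<fcntl_lk_owner_example>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int
main (void)
{
        /* Example path on a glusterfs mount. */
        int fd = open ("/mnt/glusterfs/datafile", O_RDWR);
        if (fd < 0) {
                perror ("open");
                return 1;
        }

        struct flock fl = {
                .l_type   = F_WRLCK,   /* exclusive write lock */
                .l_whence = SEEK_SET,
                .l_start  = 0,
                .l_len    = 0,         /* 0 = to end of file   */
        };

        /* The kernel/VFS tracks the owner of this lock request; glusterfs
         * carries that as the lk-owner, which stays stable across
         * rebalance, unlike any server-side fd.                          */
        if (fcntl (fd, F_SETLKW, &fl) < 0)
                perror ("fcntl(F_SETLKW)");

        /* ... application I/O under the lock ... */

        close (fd);                    /* releases the lock as well */
        return 0;
}
</fcntl_lk_owner_example>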
> > >
> > >
> > > > Thanks,
> > > > Susant
> > > >
> > > > ----- Original Message -----
> > > > From: "Susant Palai" <spalai at redhat.com>
> > > > To: "Gluster Devel" <gluster-devel at gluster.org>
> > > > Sent: Friday, 29 January, 2016 3:15:14 PM
> > > > Subject: [Gluster-devel] Posix lock migration design
> > > >
> > > > Hi,
> > > > Here, [1]
> > > >
> > > > https://docs.google.com/document/d/17SZAKxx5mhM-cY5hdE4qRq9icmFqy3LBaTdewofOXYc/edit?usp=sharing
> > > > is a Google document with the proposal for "POSIX_LOCK_MIGRATION". The problem
> > > > statement and design are explained in the document itself.
> > > >
> > > > Requesting the devel list to go through the document and
> > > > comment/analyze/suggest, to take the thoughts forward (either on the
> > > > google doc itself or here on the devel list).
> > > >
> > > >
> > > > Thanks,
> > > > Susant
> > > >
> > >
> > >
> > >
> > > --
> > > Raghavendra G
> > >
> >
>