[Gluster-devel] Posix lock migration design

Susant Palai spalai at redhat.com
Thu Mar 10 06:05:53 UTC 2016


Update:

Here is an initial solution for the races between fd migration and lock migration, and between client disconnect and lock migration.

Please give your suggestion and comments.

Fuse fd migration:
-------------------
    part1: Fuse fd migration without fd association
        - How it works currently:
        - Fuse initiates the fd-migration task from graph1 to graph2.
        - As part of this a new fd is opened on graph2.
        - Locks are currently associated with both an fd and a client (connection id). With the fd association out of the picture, there is just a new-client update to the locks (see the sketch below).
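
A minimal sketch (hypothetical, simplified structures, not the actual posix-locks xlator code) of what dropping the fd association means: a lock is keyed purely by (client-id, lk-owner), so fuse fd migration reduces to re-pointing the client field on each lock.

#include <string.h>

struct posix_lock {
        unsigned long long  fl_start;        /* byte range of the lock     */
        unsigned long long  fl_end;
        char                client_uid[64];  /* connection id of the owner */
        unsigned long long  lk_owner;        /* POSIX lk-owner, set by the
                                                vfs/kernel on the mount    */
        struct posix_lock  *next;
};

/* On fd migration from graph1 to graph2, walk the inode's lock list and
 * re-point every lock held by the old connection at the new one; no
 * per-fd state is involved. */
static void
update_locks_client (struct posix_lock *locks,
                     const char *old_uid, const char *new_uid)
{
        struct posix_lock *l;

        for (l = locks; l; l = l->next) {
                if (strcmp (l->client_uid, old_uid) != 0)
                        continue;
                strncpy (l->client_uid, new_uid,
                         sizeof (l->client_uid) - 1);
                l->client_uid[sizeof (l->client_uid) - 1] = '\0';
        }
}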

    part2: fd migration interaction with lock migration
       - As part of "fuse-fd-lock-migration" we do two operations.
           a. getxattr (lockinfo): Get the old fd number on old graph
           b. setxattr (lockinfo): set (new fd number + new client info) on the new graph through the new fd
       - So meta-lock acts as  a hint of lock migration for any lock related operations (fd-migraiton, server_connection_cleanup, new lk requests, flush etc...)
       - Now getxattr need not worry about metalk presence at all. Once it reads the necessary information the bulk of the job is left to setxattr.
       - Setxattr: 
           - case 1: whether meta lock is present
               - if YES, wait till meta-unlock is executed on the lock. Unwind the call with EREMOTE. Now it's dht translator's responsibility to lookup the file to figure out the file location and redirect the setxattr. So destination will have the new graph client-id.
               - if NO,  set new client information. Which will be migrated by rebalance.
           - case 2: What if setxattr has missed (meta lock + unlock)
               - Meta-unlock upon successful lock migration will set a REPLAY flag. Which indicates the data as well as locks have been migrated. 
               - So unwind with EREMOTE, and leave it to dht for the redirection part.
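
A minimal sketch of the setxattr (lockinfo) decision above. The pl_inode_t fields, the condition variable, and the function name are assumptions made for illustration; only the control flow (wait on metalk, EREMOTE on REPLAY) comes from the design notes.

#include <errno.h>
#include <pthread.h>

typedef struct {
        pthread_mutex_t  mutex;
        pthread_cond_t   migrated;    /* signalled by meta-unlock          */
        int              metalk_held; /* non-zero while rebalance holds the
                                         meta-lock on this inode           */
        int              replay;      /* REPLAY: data + locks already moved */
} pl_inode_t;

/* Returns 0 when the (new fd number + new client info) may be recorded
 * locally, or -EREMOTE when dht must look the file up again and replay
 * the setxattr on the destination. */
static int
pl_setxattr_lockinfo (pl_inode_t *pl_inode)
{
        int ret = 0;

        pthread_mutex_lock (&pl_inode->mutex);
        {
                /* case 1: meta-lock held, so migration is in flight; wait
                 * for meta-unlock before deciding. */
                while (pl_inode->metalk_held)
                        pthread_cond_wait (&pl_inode->migrated,
                                           &pl_inode->mutex);

                /* case 1 (after waiting) and case 2: the REPLAY flag set
                 * by meta-unlock means data and locks have already moved,
                 * so redirect the caller. */
                if (pl_inode->replay)
                        ret = -EREMOTE;
        }
        pthread_mutex_unlock (&pl_inode->mutex);

        return ret;
}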

<Question: until fd migration has happened, we operate through the old fd. Right?>
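
Client-side sketch of the two-step handoff from part2. get_lockinfo/set_lockinfo are placeholder stubs for the real getxattr/setxattr (lockinfo) calls on the old and new graphs, and fd_t here is just a stand-in type, not the glusterfs one.

#include <stddef.h>

typedef struct fd fd_t;    /* stand-in for the glusterfs fd type */

/* Placeholder wrappers for getxattr/setxattr (lockinfo); the real calls
 * travel over the old and new graphs respectively. */
static int
get_lockinfo (fd_t *old_fd, void **blob, size_t *len)
{
        (void) old_fd; *blob = NULL; *len = 0;   /* stub */
        return 0;
}

static int
set_lockinfo (fd_t *new_fd, void *blob, size_t len)
{
        (void) new_fd; (void) blob; (void) len;  /* stub */
        return 0;
}

static int
migrate_fd_locks (fd_t *old_fd, fd_t *new_fd)
{
        void   *blob = NULL;
        size_t  len  = 0;
        int     ret;

        /* a. read lockinfo (the old fd number) through the old graph */
        ret = get_lockinfo (old_fd, &blob, &len);
        if (ret)
                return ret;

        /* b. write (new fd number + new client info) through the new
         * graph; -EREMOTE tells dht to re-resolve the file location and
         * replay the setxattr on the destination. */
        return set_lockinfo (new_fd, blob, len);
}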

client talking to source disconnects  during lock migration:
-------------------------------------------------------------
- There are several phases of data + lock migration. The following describes a disconnect around each of the phases.

phase-1: disconnect before data migration
- server cleanup will flush the locks. Hence, there are no locks left for migration.

phase-2: disconnect before the metalk reaches the server
- same case as phase-1

phase-3: disconnect just after the metalk
- server_cleanup, on seeing the metalk, waits till meta-unlock.
- It then flushes the locks on the source.
- Incoming ops (write/lk) will fail with ENOTCONN.
- fd_close on ENOTCONN will refresh its inode to check whether the file has migrated elsewhere, and flush the locks. (A sketch of this metalk-aware cleanup follows.)
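
Sketch of the phase-3 cleanup, reusing the hypothetical pl_inode_t (mutex, migrated, metalk_held) from the setxattr sketch above: server_connection_cleanup sees the metalk, waits for meta-unlock, and only then flushes the dead client's locks on the source. flush_locks_of_client is a placeholder for what __delete_locks_of_owner does in the real xlator.

static void
flush_locks_of_client (pl_inode_t *pl_inode, const char *client_uid)
{
        /* placeholder: walk the inode's lock list and delete every lock
         * owned by client_uid, as __delete_locks_of_owner does today */
        (void) pl_inode; (void) client_uid;
}

static void
server_connection_cleanup (pl_inode_t *pl_inode, const char *client_uid)
{
        pthread_mutex_lock (&pl_inode->mutex);
        {
                /* lock migration in flight: flushing now would drop the
                 * locks before rebalance copies them, so wait for
                 * meta-unlock first */
                while (pl_inode->metalk_held)
                        pthread_cond_wait (&pl_inode->migrated,
                                           &pl_inode->mutex);

                /* safe now: drop everything the dead connection owned */
                flush_locks_of_client (pl_inode, client_uid);
        }
        pthread_mutex_unlock (&pl_inode->mutex);
}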



Thanks,
Susant



----- Original Message -----
> From: "Susant Palai" <spalai at redhat.com>
> To: "Gluster Devel" <gluster-devel at gluster.org>
> Sent: Thursday, 3 March, 2016 3:09:06 PM
> Subject: Re: [Gluster-devel] Posix lock migration design
> 
> Update on Lock migration design.
> 
> For lock migration we are planning to get rid of the fd association with
> the lock. Instead we will base our lock operations on the lk-owner (the
> equivalent of a pid), which is the POSIX standard. The fd association does
> not suit the needs of lock migration, as a migrated fd will not be valid
> on the destination, whereas working with the lk-owner is much more
> flexible as it does not change across different servers.
> 
> The current posix lock infrastructure associates an fd with each lock for
> the following operations, which we are planning to replace with
> lk-owner-based handling:
> 
> 1) lock cleanup for protocol client disconnects based on fd
> 
> 2) release call on fd
> 
> 3) fuse fd migration (triggered by a graph switch)
> 
> The new design is being worked out and I will update here once it is ready.
> 
> Please post your suggestions/comments here :)
> 
> Thanks,
> Susant
> 
> ----- Original Message -----
> > From: "Raghavendra G" <raghavendra at gluster.com>
> > To: "Susant Palai" <spalai at redhat.com>
> > Cc: "Raghavendra Gowdappa" <rgowdapp at redhat.com>, "Gluster Devel"
> > <gluster-devel at gluster.org>
> > Sent: Tuesday, 1 March, 2016 11:40:54 AM
> > Subject: Re: [Gluster-devel] Posix lock migration design
> > 
> > On Mon, Feb 29, 2016 at 12:52 PM, Susant Palai <spalai at redhat.com> wrote:
> > 
> > > Hi Raghavendra,
> > >    I have a question on the design.
> > >
> > >    Currently, in case of a client disconnection, pl_flush cleans up the
> > > locks associated with the fds created from that client.
> > > From the design, rebalance will migrate the locks to the new destination.
> > > Now, in case the client gets disconnected from the
> > > destination brick, how is it supposed to clean up the locks, as
> > > rebalance/the brick have no idea whether the client has opened
> > > an fd on the destination and what that fd is.
> > >
> > 
> > >    So the question is how to associate the client's fd with the locks
> > > on the destination.
> > >
> > 
> > We don't use fds to clean up the locks during flush. We use the lk-owner,
> > which doesn't change across migration. Note that the lk-owner for
> > posix-locks is filled in by the vfs/kernel where the glusterfs mount lives.
> > 
> > <pl_flush>
> >         /* drop every lock held by this (client, lk-owner) pair */
> >         pthread_mutex_lock (&pl_inode->mutex);
> >         {
> >                 __delete_locks_of_owner (pl_inode, frame->root->client,
> >                                          &frame->root->lk_owner);
> >         }
> >         pthread_mutex_unlock (&pl_inode->mutex);
> > </pl_flush>
> > 
> > 
> > > Thanks,
> > > Susant
> > >
> > > ----- Original Message -----
> > > From: "Susant Palai" <spalai at redhat.com>
> > > To: "Gluster Devel" <gluster-devel at gluster.org>
> > > Sent: Friday, 29 January, 2016 3:15:14 PM
> > > Subject: [Gluster-devel] Posix lock migration design
> > >
> > > Hi,
> > >    Here, [1]
> > >
> > > https://docs.google.com/document/d/17SZAKxx5mhM-cY5hdE4qRq9icmFqy3LBaTdewofOXYc/edit?usp=sharing
> > > is a Google document about the proposal for "POSIX_LOCK_MIGRATION". The
> > > problem statement and design are explained in the document itself.
> > >
> > >   Requesting the devel list to go through the document and
> > > comment/analyze/suggest, to take the thoughts forward (either on the
> > > google doc itself or here on the devel list).
> > >
> > >
> > > Thanks,
> > > Susant
> > > _______________________________________________
> > > Gluster-devel mailing list
> > > Gluster-devel at gluster.org
> > > http://www.gluster.org/mailman/listinfo/gluster-devel
> > >
> > 
> > 
> > 
> > --
> > Raghavendra G
> > 
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel
> 

