[Gluster-devel] dht renamedir transactions, failures and crash consistency

Thu Nov 3 04:21:42 UTC 2016

+ gluster-devel

----- Original Message -----
> > > 
> > > Hi all,
> > > 
> > > This mail is to consolidate three efforts that are in progress to fix
> > > issues
> > > around renamedir codepath in dht:
> > > 
> > > 1. Transactions by Kotresh [1]: this makes renamedir atomic (barring
> > > failures and crash-consistency issues) with respect to ops like mkdir,
> > > lookup-heal and rmdir.
> > 
> > Please note that transactions too are inadequate to address
> > crash-consistency/snapshot related issues.
> > 
> > > 
> > > 2. Rollback of renamedirs that completed successfully on some subvols when
> > > the overall renamedir fails. Csaba is working on this (patch yet to be
> > > posted). The idea discussed involves dht_renamedir remembering the result
> > > of renamedir from each subvol and rolling back the successful operations
> > > if the renamedir fails. Note that this approach won't solve the issues
> > > with a client crashing in the middle of a renamedir, or with snapshots
> > > taken while a renamedir is in progress (and later restored).
> > > 
> > > 3. A proposal by Nithya to fail mkdir during directory self-heal
> > > initiated by the dht_lookup codepath. This will:
> > >    3a. Solve the race between a lookup (src)/lookup (dst) and rename (src,
> > >    dst), as lookup won't be able to create src/dst.
> > >    3b. Avoid worsening the situation by messing up gfid handles (on the
> > >    backend) due to lookup-heal creating src, dst or both after a failed
> > >    renamedir.
> > > 
> > >    However, solution 3 is only damage control and won't fix everything
> > >    about a failed renamedir.
> > > 
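
To make approach 2 concrete, here is a minimal sketch of "remember the
result from each subvol and roll back the successes". It is a toy model,
not dht_renamedir: FakeSubvol, its fail_on parameter and
renamedir_with_rollback are hypothetical names invented for illustration,
and directories are modelled as a plain set of paths per subvolume.

```python
class FakeSubvol:
    """Toy stand-in for one subvolume: just a set of directory paths."""

    def __init__(self, dirs, fail_on=None):
        self.dirs = set(dirs)
        self.fail_on = fail_on  # (src, dst) pair for which rename fails

    def rename(self, src, dst):
        # Fail if injected to fail, or if src does not exist here.
        if (src, dst) == self.fail_on or src not in self.dirs:
            return False
        self.dirs.remove(src)
        self.dirs.add(dst)
        return True


def renamedir_with_rollback(subvols, src, dst):
    """Rename src to dst on every subvol; undo the successes if any fail."""
    succeeded = []
    failed = False
    for sv in subvols:
        if sv.rename(src, dst):
            succeeded.append(sv)  # remember the per-subvol result
        else:
            failed = True
    if not failed:
        return True
    # Roll back only where the forward rename succeeded.
    for sv in succeeded:
        sv.rename(dst, src)
    return False
```

The sketch ignores two things the mail calls out: the rollback rename can
itself fail, and nothing here survives a crash of the client between the
forward renames and the rollback.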
> > > I think there is quite a bit of interdependency among the three
> > > approaches.
> > > 
> > > Approach 2 depends on 1 and 3 because:
> > > 1. lookup-heal could have already healed src, dst or both before we try
> > > to roll back.
> > > 2. Transactions (by locking out lookup-heal) or proposal 3 (by failing
> > > heal) make sure that the directory namespace is not tampered with until a
> > > renamedir is complete, paving the way for rollback.
> > > 
> > > We can also build on top of 3 to recover from a crashed renamedir or
> > > restored snapshots in lookup-heal (essentially solution 2, implemented in
> > > lookup-heal to either roll back or roll forward). My thoughts are below:
> > > 
> > > Once transactions for entry operations on directories are in place,
> > > lookup-selfheal will be able to identify a failed renamedir operation
> > > because:
> > > 
> > > 1. It can figure out that a gfid has been associated with more than one
> > > directory. For this, we need either to make mkdir during healing fail with
> > > EEXIST if the directory exists (proposal 3 above, possibly returning the
> > > other path associated with the gfid), or to do a lookup on the gfid and
> > > fetch the paths associated with it.
> > > 2. No renamedir is in progress (as we are in a transaction), and renamedir
> > > is the only operation (apart from mkdir and rmdir) that changes the
> > > association between a path and a gfid for directories.
> > > 
> > > Once we are able to identify a failed renamedir, we can possibly roll
> > > back. The ambiguity here is whether the renamedir failed (client crash
> > > scenario) or succeeded (snapshots). Since for snapshots it makes no
> > > difference whether the renamedir succeeded or failed, we can always assume
> > > failure and implement rollback.
> 
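
The detection rule above (a gfid associated with more than one directory
path while no renamedir can be in progress) can be sketched as follows.
The per-subvol {path: gfid} dictionaries and the name find_split_gfids
are assumptions made for illustration; real code would obtain this
information from lookups on the subvolumes while holding the
transaction.

```python
from collections import defaultdict


def find_split_gfids(views):
    """Return {gfid: sorted paths} for gfids seen under more than one path.

    views is a hypothetical {subvol_name: {path: gfid}} mapping describing
    which directory path each subvolume currently associates with a gfid.
    """
    paths_by_gfid = defaultdict(set)
    for dirs in views.values():
        for path, gfid in dirs.items():
            paths_by_gfid[gfid].add(path)
    # A directory gfid should map to exactly one path across subvolumes.
    return {gfid: sorted(paths)
            for gfid, paths in paths_by_gfid.items()
            if len(paths) > 1}
```

For example, if one subvolume holds gfid g1 at /dst and another still
holds it at /src, the function reports g1 with both paths. It does not
say which of the two was the source, which is the second problem raised
further down in this mail.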
> After today's meeting, the following problems remain with rollback after a
> crash of the client doing renamedir (or recovery of a snapshotted volume with
> a renamedir in progress):
> 
> 1. Where to put the recovery code?
>    The code has to go in every place that modifies the directory path, i.e.
>    rmdir, renamedir and lookup-heal. The reason is that another client might
>    have already issued a parallel operation and be blocked on locks. The
>    moment the client with the renamedir in progress crashes, the other
>    rmdir/renamedir/lookup-heal would get the lock and proceed. So all these
>    fops should be able to identify a crashed renamedir op and recover from
>    it.
> 
> 2. How to identify src/dst (of the crashed renamedir) for rollback?
>    The preferred way is to store the src and dst on the brick and use that
>    information for rollback. There was a proposal to see whether JBR helps.
> 
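
One way to picture "store the src and dst on the brick" is an intent
record written before the rename and cleared after it. The sketch below
is purely illustrative: the file name .renamedir-intent, the JSON layout
and all three function names are hypothetical, and it says nothing about
how JBR or an actual brick would persist this.

```python
import json
import os
import tempfile

INTENT_FILE = ".renamedir-intent"  # hypothetical name, not a GlusterFS file


def record_intent(brick_dir, src, dst):
    """Durably note that a renamedir src -> dst is about to start."""
    tmp = os.path.join(brick_dir, INTENT_FILE + ".tmp")
    with open(tmp, "w") as f:
        json.dump({"src": src, "dst": dst}, f)
        f.flush()
        os.fsync(f.fileno())
    # Atomic replace so a reader never sees a half-written record.
    os.replace(tmp, os.path.join(brick_dir, INTENT_FILE))


def clear_intent(brick_dir):
    """Remove the record once the renamedir has completed everywhere."""
    try:
        os.remove(os.path.join(brick_dir, INTENT_FILE))
    except FileNotFoundError:
        pass


def pending_intent(brick_dir):
    """Return (src, dst) of an unfinished renamedir, or None."""
    try:
        with open(os.path.join(brick_dir, INTENT_FILE)) as f:
            rec = json.load(f)
    except FileNotFoundError:
        return None
    return rec["src"], rec["dst"]


# Usage: simulate a crash between record_intent() and clear_intent().
with tempfile.TemporaryDirectory() as brick:
    record_intent(brick, "/src", "/dst")
    recovered = pending_intent(brick)  # what a recovering fop would see
    clear_intent(brick)
    after_clear = pending_intent(brick)
```

Any fop that takes the lock after a crash (rmdir, renamedir or
lookup-heal, per point 1) would have to call something like
pending_intent() first, which is part of the complexity discussed here.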
> We decided not to go ahead with providing crash consistency for renamedir,
> given the above complexity and the relative infrequency of this issue.
> However, if snapshots become popular we may have to revisit the problem.
> 
> The other three efforts will continue.
> 
> > > 
> > > In a nutshell, 1 and 3 are two relatively independent changes which can
> > > be leveraged by 2.
> > > 
> > > Comments?
> > > 
> > > [1] http://review.gluster.org/15472
> > > 
> > > regards,
> > > Raghavendra
> > > 
> > > 
> > 
> > 
>