[Gluster-devel] Rebalance data migration and corruption

Raghavendra Gowdappa rgowdapp at redhat.com
Wed Feb 10 10:13:14 UTC 2016


> >
> > hmm.. I would prefer an infinite timeout. The only scenario where the brick
> > process can forcefully flush leases would be connection loss with the
> > rebalance process. The more scenarios there are where the brick can flush
> > leases without the knowledge of the rebalance process, the more
> > race-windows we open up for this bug to occur.
> >
> > In fact, to be correct, at least in theory, the rebalance process should
> > replay all the transactions that happened during the lease which got
> > flushed out by the brick (after re-acquiring that lease). So we would like
> > to avoid any such scenarios.
> >
> > Btw, what is the necessity of timeouts? Is it insurance against rogue
> > clients that won't respond to lease recalls?
> Yes. It is to protect against rogue clients and prevent starvation of other
> clients.
> 
> In the current design, every lease is associated with a lease-id (like the
> lock-owner in the case of locks) and all further fops (I/Os) have to be
> done using this lease-id. So if any fop comes to the brick process with the
> lease-id of a lease which got flushed by the brick process, we can send a
> special error and the rebalance process can then replay all those fops.
> Will that be sufficient?

How do I pass the lease-id in a fop like readv? Should I pass it in xdata? This is sufficient for the rebalance process. It can follow this algorithm:

1. Acquire a read-lease on the entire file on src.
2. Note the offset at which this transaction started. Initially it'll be zero, but if leases were recalled, the offset will be the continuation from where the last transaction left off.
3. Do multiple (read, src) and (write, dst).
4. If (read, src) returns an error (because the lease was flushed), go to step 1 and restart the transaction from the offset remembered in step 2. Note that we don't update the offset here; we replay this failed transaction again. We update the offset only on a successful unlock.
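To make that concrete, here is a minimal sketch of the copy loop in C. It is not the actual rebalance code: acquire_read_lease(), release_lease(), src_read() and dst_write() are hypothetical stand-ins for the real fops (the lease-id would presumably ride in xdata with every read, per the question above), and EAGAIN merely stands in for whatever special error the brick ends up returning when the lease backing a lease-id has been flushed.

#include <errno.h>
#include <sys/types.h>

#define CHUNK (128 * 1024)

/* Hypothetical helpers standing in for the real fops. */
int acquire_read_lease(const char *src, char lease_id[16]);
int release_lease(const char *src, const char lease_id[16]);
ssize_t src_read(const char *src, const char lease_id[16], char *buf,
                 size_t size, off_t offset);
ssize_t dst_write(const char *dst, const char *buf, size_t size, off_t offset);

int
migrate_file(const char *src, const char *dst, off_t filesize)
{
    char  lease_id[16];
    char  buf[CHUNK];
    off_t committed = 0;                          /* offset noted in step 2 */

    while (committed < filesize) {
        if (acquire_read_lease(src, lease_id) < 0)        /* step 1 */
            return -1;

        off_t offset  = committed;                        /* step 2 */
        int   flushed = 0;

        while (offset < filesize) {                       /* step 3 */
            ssize_t nr = src_read(src, lease_id, buf, CHUNK, offset);
            if (nr < 0) {
                if (errno == EAGAIN) {                    /* step 4 */
                    flushed = 1;              /* lease flushed: replay */
                    break;
                }
                return -1;
            }
            if (nr == 0 || dst_write(dst, buf, nr, offset) < 0)
                return -1;            /* unexpected EOF or write error */
            offset += nr;
        }

        /* the remembered offset moves forward only on a successful unlock */
        if (!flushed && release_lease(src, lease_id) == 0)
            committed = offset;
    }

    return 0;
}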

On receiving a lease-recall notification from the brick, the rebalance process does:
1. Note the offset till which the file has been successfully copied from src to dst.
2. Make sure at least one (read, src) and (write, dst) has been done since we last acquired the lease (on a best-effort basis). This ensures that the rebalance process won't get stuck in an infinite loop.
3. Issue an unlock. If the unlock is successful, the next transaction continues from the offset noted in 1. Else, this transaction is considered a failure and the rebalance process behaves exactly as it does when a read fails because of lease expiry, as above.

In this algorithm, to avoid the rebalance process getting stuck in an infinite loop, we should make sure unlocks are successful (to the extent they can be made successful). We can also add a maximum number of retries for transactions on the same region of the file and fail the migration once we exceed that many retries; a rough sketch of both follows.
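Again purely as a sketch, reusing the hypothetical helpers from the previous snippet; on_lease_recall(), finish_transaction() and MAX_RETRIES are made-up names for whichever notification hook and bookkeeping the real implementation ends up with.

#include <signal.h>                /* sig_atomic_t */
#include <sys/types.h>

#define MAX_RETRIES 5              /* hypothetical cap per file region */

/* Set from whatever notification path delivers the recall to rebalance. */
static volatile sig_atomic_t recall_pending;

void
on_lease_recall(void)
{
    recall_pending = 1;
}

/*
 * In the inner loop of the previous sketch, before each (read, src), honour
 * a pending recall only after at least one read and write have completed
 * under the current lease, so that back-to-back recalls cannot starve the
 * migration:
 *
 *     if (recall_pending && progress)
 *         break;
 *
 * At the end of every transaction, commit the offset only if the unlock
 * succeeded; otherwise count a retry and give up once the cap is hit.
 */
int
finish_transaction(const char *src, const char lease_id[16], int flushed,
                   off_t offset, off_t *committed, int *retries)
{
    if (!flushed && release_lease(src, lease_id) == 0 && offset > *committed) {
        *committed = offset;       /* unlock succeeded: progress is committed */
        *retries   = 0;
        return 0;
    }
    if (++(*retries) > MAX_RETRIES)
        return -1;                 /* same region keeps failing: fail migration */
    return 0;                      /* replay from *committed, as in step 4 above */
}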

> 
> CCing Poornima, who has been implementing it.
> 
> 
> Thanks,
> Soumya
> 
