[Gluster-devel] Rebalance data migration and corruption

Sat Feb 6 13:06:58 UTC 2016

----- Original Message -----
> From: "Raghavendra Gowdappa" <rgowdapp at redhat.com>
> To: "Sakshi Bansal" <sabansal at redhat.com>, "Susant Palai" <spalai at redhat.com>
> Cc: "Gluster Devel" <gluster-devel at gluster.org>, "Nithya Balachandran" <nbalacha at redhat.com>, "Shyamsundar
> Ranganathan" <srangana at redhat.com>
> Sent: Friday, February 5, 2016 4:32:40 PM
> Subject: Re: Rebalance data migration and corruption
> 
> +gluster-devel
> 
> > 
> > Hi Sakshi/Susant,
> > 
> > - There is a data corruption issue in migration code. Rebalance process,
> >   1. Reads data from src
> >   2. Writes (say w1) it to dst
> > 
> >   However, 1 and 2 are not atomic, so another write (say w2) to the same
> >   region can happen between 1 and 2. The two writes can then reach dst in
> >   the order (w2, w1), so the stale data carried by w1 overwrites w2. This
> >   issue is not fixed yet and can cause subtle data corruption. The fix is
> >   simple and involves the rebalance process acquiring a mandatory lock to
> >   make 1 and 2 atomic.
> 
> We can make use of compound fop framework to make sure we don't suffer a
> significant performance hit. Following will be the sequence of operations
> done by rebalance process:
> 
> 1. issues a compound (mandatory lock, read) operation on src.
> 2. writes this data to dst.
> 3. issues unlock of lock acquired in 1.
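
To make the ordering problem and the three-step sequence above concrete, here is a minimal Python sketch. It is a toy model, not Gluster code: Brick, client_write, migrate_unlocked and migrate_locked are hypothetical names, the bricks are in-memory buffers, and it assumes that a client write arriving during migration is applied to both src and dst.

```python
class Brick:
    """Toy stand-in for a brick: one file held as an in-memory buffer."""

    def __init__(self, data):
        self.data = bytearray(data)
        self.locked = False   # hypothetical internal mandatory lock
        self.blocked = []     # client writes held back while locked

    def read(self, offset, size):
        return bytes(self.data[offset:offset + size])

    def write(self, offset, buf):
        self.data[offset:offset + len(buf)] = buf

    def lock_and_read(self, offset, size):
        # Stand-in for the compound (mandatory lock, read) operation.
        self.locked = True
        return self.read(offset, size)

    def unlock(self):
        # Release the lock and hand back the writes it was blocking.
        self.locked = False
        blocked, self.blocked = self.blocked, []
        return blocked


def client_write(src, dst, offset, buf):
    # Model assumption: while a file is migrating, a client write (w2)
    # is applied to both src and dst, unless the lock on src blocks it.
    if src.locked:
        src.blocked.append((offset, buf))
        return
    src.write(offset, buf)
    dst.write(offset, buf)


def migrate_unlocked(src, dst, offset, size, w2):
    data = src.read(offset, size)   # 1. read from src
    client_write(src, dst, *w2)     # w2 lands between 1 and 2
    dst.write(offset, data)         # 2. w1 overwrites w2 with stale data


def migrate_locked(src, dst, offset, size, w2):
    data = src.lock_and_read(offset, size)  # 1. compound (lock, read) on src
    client_write(src, dst, *w2)             # w2 is blocked by the lock
    dst.write(offset, data)                 # 2. write to dst
    for blocked in src.unlock():            # 3. unlock, then w2 proceeds
        client_write(src, dst, *blocked)


def run(migrate):
    src, dst = Brick(b"aaaa"), Brick(b"\0\0\0\0")
    migrate(src, dst, 0, 4, (0, b"bbbb"))
    return bytes(src.data), bytes(dst.data)


print(run(migrate_unlocked))  # src and dst diverge: dst keeps the stale data
print(run(migrate_locked))    # src and dst agree
```

In the unlocked run w1 reaches dst after w2 and dst is left with the old contents. In the locked run w2 is held until step 3, so it is applied after w1 on both sides.
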
> 
> Please co-ordinate with Anuradha for implementation of this compound fop.
> 
> Following are the issues I see with this approach:
> 1. features/locks provides mandatory lock functionality only for posix-locks
> (flock and fcntl based locks). So, mandatory locks will be posix-locks which
> will conflict with locks held by application. So, if an application has held
> an fcntl/flock, migration cannot proceed.

We can implement a "special" domain for mandatory internal locks. These locks will behave similarly to posix mandatory locks in that conflicting fops (like write, read) are blocked/failed if they are issued while a lock is held.
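
A rough sketch of the idea, in Python for brevity. This is only a model of the semantics described above, not the features/locks implementation: LockTable, the domain names and fop_allowed are hypothetical, all locks are treated as exclusive, and it assumes that locks conflict only with locks from other owners in the same domain.

```python
from dataclasses import dataclass

INTERNAL = "internal-migration"   # hypothetical domain name
POSIX = "posix"                   # application fcntl/flock locks


@dataclass(frozen=True)
class Lock:
    domain: str
    owner: str
    start: int
    end: int   # exclusive


def overlaps(lock, start, end):
    return lock.start < end and start < lock.end


class LockTable:
    def __init__(self):
        self.locks = []

    def acquire(self, domain, owner, start, end):
        # Locks only conflict with other owners' locks in the same domain.
        for lock in self.locks:
            if (lock.domain == domain and lock.owner != owner
                    and overlaps(lock, start, end)):
                return False
        self.locks.append(Lock(domain, owner, start, end))
        return True

    def release(self, domain, owner):
        self.locks = [l for l in self.locks
                      if not (l.domain == domain and l.owner == owner)]

    def fop_allowed(self, owner, start, end):
        # Mandatory semantics: a read or write is refused while another
        # owner holds an overlapping lock in the internal domain.
        return not any(l.domain == INTERNAL and l.owner != owner
                       and overlaps(l, start, end) for l in self.locks)


def demo():
    table = LockTable()
    app_lock = table.acquire(POSIX, "app", 0, 100)           # application lock
    reb_lock = table.acquire(INTERNAL, "rebalance", 0, 100)  # no conflict with it
    client_blocked = not table.fop_allowed("app", 10, 20)    # write is refused
    rebalance_ok = table.fop_allowed("rebalance", 10, 20)    # owner may proceed
    table.release(INTERNAL, "rebalance")
    client_after = table.fop_allowed("app", 10, 20)          # allowed again
    return app_lock, reb_lock, client_blocked, rebalance_ok, client_after
```

With a separate domain, an fcntl/flock held by the application no longer prevents migration from taking its lock, while client fops on the locked range are still held off until the rebalance process releases it.
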

> 2. data migration will be less efficient because of an extra unlock (with
> compound lock + read) or extra lock and unlock (for non-compound fop based
> implementation) for every read it does from src.

Can we use delegations here? The rebalance process can acquire a mandatory-write-delegation (an exclusive lock that is recalled when a write operation happens). In that case, the rebalance process can do something like:

1. Acquire a read delegation for entire file.
2. Migrate the entire file.
3. Remove/unlock/give-back the delegation it has acquired.

If a recall is issued from the brick (when a write happens from a mount), the rebalance process completes the current write to dst (or throws away the read from src) to maintain atomicity. Before doing the next set of (read, src) and (write, dst), it tries to reacquire the lock.
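
The flow above could look roughly like the following Python sketch. It is a toy model and does not use any real delegation or lease API: Source, migrate and the client_writes schedule are hypothetical, and it assumes that a client write arriving while the delegation is held is queued until the delegation is given back, and is then applied to both src and dst.

```python
class Source:
    """Toy src brick that hands out a single whole-file delegation."""

    def __init__(self, data):
        self.data = bytearray(data)
        self.delegated = False   # rebalance currently holds the delegation
        self.recalled = False    # brick has asked for it back
        self.queued = []         # client writes waiting on the recall

    def acquire(self):
        self.delegated, self.recalled = True, False

    def release(self, dst):
        # Give the delegation back; queued client writes now go through.
        self.delegated, self.recalled = False, False
        for offset, buf in self.queued:
            self.data[offset:offset + len(buf)] = buf
            dst[offset:offset + len(buf)] = buf
        self.queued = []

    def client_write(self, dst, offset, buf):
        if self.delegated:
            self.recalled = True              # recall sent to rebalance
            self.queued.append((offset, buf))
        else:
            self.data[offset:offset + len(buf)] = buf
            dst[offset:offset + len(buf)] = buf


def migrate(src, dst, chunk, client_writes):
    """client_writes maps a chunk offset to an (offset, buf) client write
    that arrives after that chunk was read but before it is written."""
    recalls = 0
    offset = 0
    while offset < len(src.data):
        if not src.delegated:
            src.acquire()                     # (re)acquire before each pair
        data = bytes(src.data[offset:offset + chunk])   # (read, src)
        pending = client_writes.pop(offset, None)
        if pending is not None:
            src.client_write(dst, *pending)   # triggers a recall
        if src.recalled:
            recalls += 1
            src.release(dst)                  # throw away the read
            continue                          # retry the same chunk
        dst[offset:offset + chunk] = data     # (write, dst)
        offset += chunk
    src.release(dst)
    return recalls


def demo():
    src = Source(b"aaaabbbbcccc")
    dst = bytearray(len(src.data))
    recalls = migrate(src, dst, 4, {4: (4, b"XXXX")})
    return recalls, bytes(src.data), bytes(dst)
```

This sketch takes the "throw away the read" branch on recall. Completing the in-flight write to dst before giving the delegation back would work as well, as long as the queued client write is applied after it.
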

@Soumyak, can something like this be done with delegations?

@Pranith,
AFR does transactions for writing to its subvols. Can you suggest any optimizations here so that the rebalance process can have a transaction for (read, src) and (write, dst) with minimal performance overhead?

regards,
Raghavendra.

> 
> Comments?
> 
> > 
> > regards,
> > Raghavendra.
>