[Gluster-devel] Rebalance data migration and corruption

Fri Feb 5 11:02:40 UTC 2016

+gluster-devel

> 
> Hi Sakshi/Susant,
> 
> - There is a data corruption issue in the migration code. The rebalance process:
>   1. Reads data from src
>   2. Writes (say w1) it to dst
> 
>   However, 1 and 2 are not atomic, so another write (say w2) to the same
>   region can happen between 1 and 2. These two writes can then reach dst in
>   the order (w2, w1), resulting in a subtle corruption. This issue is not
>   fixed yet. The fix is simple and involves the rebalance process acquiring
>   a mandatory lock to make 1 and 2 atomic.
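
To make the race concrete, here is a minimal Python sketch of the ordering described above. It is a toy model, not Gluster code: src and dst are plain bytearrays, the function name is made up for illustration, and it assumes the application write w2 is applied to both src and dst while migration is in progress.

```python
def racy_migration():
    """Toy replay of the (w2, w1) ordering; not Gluster code."""
    src = bytearray(b"AAAA")
    dst = bytearray(4)

    # 1. rebalance reads the region from src (old data)
    w1 = bytes(src)

    # An application write w2 lands between steps 1 and 2.
    # Assumption: during migration it is applied to both src and dst.
    w2 = b"BBBB"
    src[:] = w2
    dst[:] = w2

    # 2. rebalance writes the data it read earlier, overwriting w2 on dst
    dst[:] = w1

    return bytes(src), bytes(dst)


src_data, dst_data = racy_migration()
print(src_data, dst_data)  # src has the new data, dst has the stale data
```

After the run, src holds w2 while dst holds the stale w1, so the two copies have diverged.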

We can make use of the compound fop framework to make sure we don't suffer a significant performance hit. The following will be the sequence of operations done by the rebalance process:

1. issues a compound (mandatory lock, read) operation on src.
2. writes this data to dst.
3. issues unlock of lock acquired in 1.
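
The three steps above can be sketched as follows. This is only an illustration under stated assumptions: a threading.Lock stands in for the mandatory lock on the src region, the application write is modelled as a thread that must take the same lock, and all names are hypothetical rather than Gluster APIs.

```python
import threading


def locked_migration():
    """Toy model of lock+read, write, unlock; not Gluster code."""
    src = bytearray(b"AAAA")
    dst = bytearray(4)
    region_lock = threading.Lock()  # stand-in for the mandatory lock

    def app_write(data):
        # A mandatory lock forces the application write to wait.
        with region_lock:
            src[:] = data
            dst[:] = data

    # 1. compound (mandatory lock, read) on src
    region_lock.acquire()
    w1 = bytes(src)

    # w2 is issued while the lock is held, so it blocks.
    writer = threading.Thread(target=app_write, args=(b"BBBB",))
    writer.start()

    # 2. write the data to dst
    dst[:] = w1

    # 3. unlock; only now can w2 proceed
    region_lock.release()
    writer.join()

    return bytes(src), bytes(dst)


print(locked_migration())
```

Because w2 cannot run until step 3, it always reaches dst after w1, and src and dst end up with the same contents.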

Please co-ordinate with Anuradha for implementation of this compound fop.

I see the following issues with this approach:
1. features/locks provides mandatory lock functionality only for posix-locks (flock and fcntl based locks). So, the mandatory locks will be posix-locks, which will conflict with locks held by the application. If an application holds an fcntl/flock lock, migration cannot proceed.
2. Data migration will be less efficient because of an extra unlock (with compound lock + read), or an extra lock and unlock (for a non-compound fop based implementation), for every read it does from src.
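
As a rough illustration of the overhead in point 2, the sketch below just counts fops per read for each scheme. The scheme names and the helper functions are made up for this example, and it counts operations only; it says nothing about actual latency.

```python
# Fops issued by rebalance for each read from src, per scheme.
FOPS = {
    "unlocked": ["read", "write"],
    "compound": ["lock+read", "write", "unlock"],
    "non_compound": ["lock", "read", "write", "unlock"],
}


def fops_per_read(scheme):
    """Number of fops rebalance issues for one read from src."""
    return len(FOPS[scheme])


def extra_fops(scheme, reads):
    """Fops added over the unlocked scheme for a given number of reads."""
    return (fops_per_read(scheme) - fops_per_read("unlocked")) * reads


for scheme in FOPS:
    print(scheme, fops_per_read(scheme), extra_fops(scheme, 1000))
```

So for a file migrated in 1000 reads, the compound variant adds 1000 fops and the non-compound variant adds 2000.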

Comments?

> 
> regards,
> Raghavendra.