[Gluster-devel] Rebalance data migration and corruption
rgowdapp at redhat.com
Fri Feb 5 11:02:40 UTC 2016
> Hi Sakshi/Susant,
> - There is a data corruption issue in the migration code. The rebalance process:
> 1. Reads data from src
> 2. Writes it (say w1) to dst
> However, 1 and 2 are not atomic, so another write (say w2) to the same region
> can happen between 1 and 2. These two writes can then reach dst in the order
> (w2, w1), resulting in a subtle corruption. This issue is not fixed yet. The
> fix is simple and involves the rebalance process acquiring a mandatory lock
> to make 1 and 2 atomic.
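To make the (w2, w1) ordering concrete, here is a toy, single-threaded replay of that interleaving. It is only a sketch: the byte buffers, the function name, and the assumption that an application write during migration is applied to both src and dst are illustrative and are not taken from the rebalance code.

```python
def migrate_unlocked_race():
    """Replay the bad interleaving by hand (toy model, not Gluster code)."""
    src = bytearray(b"AAAA")   # region contents on the source
    dst = bytearray(4)         # same region on the destination, not yet migrated

    # Step 1: rebalance reads the region from src.
    w1 = bytes(src)

    # An application write (w2) arrives between steps 1 and 2.
    # Assumption: during migration it is applied to src and also sent to dst.
    w2 = b"BBBB"
    src[:] = w2
    dst[:] = w2

    # Step 2: rebalance writes the data it read earlier, which is now stale.
    dst[:] = w1

    return bytes(src), bytes(dst)


src, dst = migrate_unlocked_race()
print(src, dst)  # src holds w2, dst holds the stale w1
```

Because w1 lands on dst after w2, dst ends up with the old contents while src has the new ones.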
We can make use of the compound fop framework to make sure we don't suffer a significant performance hit. The rebalance process would then perform the following sequence of operations:
1. Issues a compound (mandatory lock, read) operation on src.
2. Writes this data to dst.
3. Unlocks the lock acquired in 1.
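The three steps can be sketched with a toy model in which a threading.Lock stands in for the mandatory lock on src. This is a hedged illustration only: the Region class, lock_and_read, app_write and migrate_region are hypothetical names, not the compound fop API, and the model assumes application writes honour the lock and are applied to both src and dst.

```python
import threading


class Region:
    """Toy stand-in for one file region present on both src and dst."""

    def __init__(self, data):
        self.src = bytearray(data)
        self.dst = bytearray(len(data))
        self.lock = threading.Lock()  # models the mandatory lock on src

    def lock_and_read(self):
        # Step 1: compound (mandatory lock, read) on src in one operation.
        self.lock.acquire()
        return bytes(self.src)

    def unlock(self):
        # Step 3: release the lock acquired in step 1.
        self.lock.release()

    def app_write(self, data):
        # An application write has to wait while the lock is held.
        with self.lock:
            self.src[:] = data
            self.dst[:] = data


def migrate_region(region, concurrent_write):
    data = region.lock_and_read()                       # step 1
    writer = threading.Thread(target=region.app_write,
                              args=(concurrent_write,))
    writer.start()                # w2 arrives mid-migration and blocks
    region.dst[:] = data          # step 2: w1 reaches dst first
    region.unlock()               # step 3
    writer.join()                 # w2 is applied only after w1
    return bytes(region.src), bytes(region.dst)
```

With the lock held across the read and the write, the concurrent write can only be applied after w1, so src and dst end up identical.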
Please co-ordinate with Anuradha for implementation of this compound fop.
Following are the issues I see with this approach:
1. features/locks provides mandatory lock functionality only for posix-locks (flock and fcntl based locks). The mandatory locks will therefore be posix-locks, which conflict with locks held by the application. So, if an application holds an fcntl/flock lock, migration cannot proceed.
2. Data migration will be less efficient, because every read from src costs an extra unlock (with compound lock + read) or an extra lock and unlock (for a non-compound fop based implementation).
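As a rough way to see the cost in issue 2, the operations per migrated chunk can be counted for each scheme. This is back-of-the-envelope arithmetic only: the scheme names and the file and chunk sizes are made-up illustrative values, and it counts operations rather than measuring latency.

```python
# Operations issued by rebalance for each chunk it migrates.
OPS_PER_CHUNK = {
    "unlocked": ["read", "write"],                        # current behaviour
    "compound": ["lock+read", "write", "unlock"],         # one extra op
    "non_compound": ["lock", "read", "write", "unlock"],  # two extra ops
}


def total_ops(file_size, chunk_size, scheme):
    """Total operations needed to migrate a file of file_size bytes."""
    chunks = -(-file_size // chunk_size)  # ceiling division
    return chunks * len(OPS_PER_CHUNK[scheme])


# Hypothetical 1 GiB file migrated in 1 MiB chunks (1024 chunks).
for scheme in OPS_PER_CHUNK:
    print(scheme, total_ops(1 << 30, 1 << 20, scheme))
```

Under these assumptions the compound variant issues 1.5 times as many operations as the unlocked path, and the non-compound variant twice as many.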