[Gluster-devel] Rebalance data migration and corruption

Joe Julian joe at julianfamily.org
Mon Feb 8 15:38:45 UTC 2016

On 02/08/2016 12:18 AM, Raghavendra Gowdappa wrote:
> Yes. This bug is present in the currently released versions. However, it can happen only if an application is writing to a file while that file is being migrated. So, loosely speaking, the probability is low.

Probability is quite high when the volume is used for VM images, which 
many are.

>>>>>> Hi Sakshi/Susant,
>>>>>> - There is a data corruption issue in the migration code. The
>>>>>>   rebalance process:
>>>>>>     1. Reads data from src
>>>>>>     2. Writes it (say w1) to dst
>>>>>>   However, 1 and 2 are not atomic, so another write (say w2) to the
>>>>>>   same region can happen between 1 and 2. These two writes can then
>>>>>>   reach dst in the order (w2, w1), resulting in a subtle corruption.
>>>>>>   This issue is not fixed yet. The fix is simple and involves the
>>>>>>   rebalance process acquiring a mandatory lock to make 1 and 2 atomic.
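
To make the (w2, w1) ordering concrete, here is a minimal single-threaded
sketch in Python. It is not Gluster code: the function name, the
concurrent_write hook, and the use of bytearrays for src and dst are all
hypothetical, and it assumes that an application write issued during
migration reaches both src and dst.

```python
def migrate_region_unlocked(src, dst, offset, length, concurrent_write=None):
    """Copy one region from src to dst with no lock around read + write."""
    # Step 1: read from src. This snapshot can go stale immediately.
    snapshot = bytes(src[offset:offset + length])

    # Hypothetical hook: an application write (w2) arriving between 1 and 2.
    if concurrent_write is not None:
        concurrent_write(src, dst)

    # Step 2: write the snapshot (w1) to dst, after w2 already landed there.
    dst[offset:offset + length] = snapshot


def w2(src, dst):
    # Assumption: during migration the application write reaches both copies.
    src[0:4] = b"BBBB"
    dst[0:4] = b"BBBB"


src = bytearray(b"AAAA")
dst = bytearray(4)
migrate_region_unlocked(src, dst, 0, 4, concurrent_write=w2)
print(bytes(src), bytes(dst))  # b'BBBB' b'AAAA'
```

src holds the application's data while dst holds the stale snapshot, which
is the corruption described above.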
>>>>> We can make use of the compound fop framework to make sure we don't
>>>>> suffer a significant performance hit. The following will be the
>>>>> sequence of operations done by the rebalance process:
>>>>> 1. Issues a compound (mandatory lock, read) operation on src.
>>>>> 2. Writes this data to dst.
>>>>> 3. Issues an unlock of the lock acquired in 1.
>>>>> Please co-ordinate with Anuradha for implementation of this compound
>>>>> fop.
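
The three-step sequence above can be sketched as follows, in Python rather
than as a real compound fop. Everything here is an assumption for
illustration: a threading.Lock stands in for the mandatory lock on src,
Region, app_write and migrate_chunk are hypothetical names, and the
"between" hook exists only to start a competing writer at the worst moment.

```python
import threading


class Region:
    """In-memory stand-in for one file on the src and dst bricks."""

    def __init__(self, data):
        self.src = bytearray(data)
        self.dst = bytearray(len(data))
        self.lock = threading.Lock()  # stands in for the mandatory lock


def app_write(region, offset, payload):
    # A conflicting write blocks while the migrator holds the lock.
    with region.lock:
        end = offset + len(payload)
        region.src[offset:end] = payload
        region.dst[offset:end] = payload


def migrate_chunk(region, offset, length, between=None):
    region.lock.acquire()  # step 1: (lock, read) on src
    try:
        data = bytes(region.src[offset:offset + length])
        if between is not None:
            between()  # hypothetical hook: a writer shows up right here
        region.dst[offset:offset + length] = data  # step 2: write to dst
    finally:
        region.lock.release()  # step 3: unlock


region = Region(b"AAAA")
writer = threading.Thread(target=app_write, args=(region, 0, b"BBBB"))
migrate_chunk(region, 0, 4, between=writer.start)
writer.join()
print(bytes(region.src), bytes(region.dst))  # b'BBBB' b'BBBB'
```

The writer starts between the read and the write but cannot proceed until
the unlock, so the order at dst is always (w1, w2).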
>>>>> Following are the issues I see with this approach:
>>>>> 1. features/locks provides mandatory lock functionality only for
>>>>> posix-locks (flock and fcntl based locks). So, mandatory locks will be
>>>>> posix-locks, which will conflict with locks held by the application.
>>>>> So, if an application holds an fcntl/flock lock, migration cannot
>>>>> proceed.
>>>> We can implement a "special" domain for mandatory internal locks.
>>>> These locks will behave similarly to posix mandatory locks in that
>>>> conflicting fops (like write, read) are blocked/failed if they are
>>>> issued while a lock is held.
>>>>> 2. Data migration will be less efficient because of an extra unlock
>>>>> (with compound lock + read) or an extra lock and unlock (for a
>>>>> non-compound fop based implementation) for every read it does from src.
>>>> Can we use delegations here? The rebalance process can acquire a
>>>> mandatory-write-delegation (an exclusive lock with the property
>>>> that the delegation is recalled when a write operation happens). In
>>>> that case the rebalance process can do something like:
>>>> 1. Acquire a read delegation for entire file.
>>>> 2. Migrate the entire file.
>>>> 3. Remove/unlock/give-back the delegation it has acquired.
>>>> If a recall is issued from the brick (when a write happens from a
>>>> mount), it completes the current write to dst (or throws away the read
>>>> from src) to maintain atomicity. Before doing the next set of
>>>> (read, src) and (write, dst), it tries to reacquire the lock.
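
A rough single-threaded sketch of that delegation flow, in Python. This is
only an illustration under stated assumptions: there is no real delegation
API here, the "holding" flag merely models whether the rebalance process
currently has the delegation, and migrate_with_delegation and
writes_during are hypothetical names. A client write arriving during a
chunk copy is modelled as an entry in writes_during, and it is assumed to
be applied to both src and dst once the delegation has been given back.

```python
def migrate_with_delegation(src, dst, chunk, writes_during):
    """Migrate src to dst chunk by chunk; return the number of recalls.

    writes_during maps a chunk index to an (offset, payload) client write
    that arrives while that chunk is being copied.
    """
    holding = True  # step 1: acquire a delegation for the entire file
    recalls = 0
    for index, start in enumerate(range(0, len(src), chunk)):
        if not holding:
            holding = True  # reacquire before the next (read, write) pair
        data = bytes(src[start:start + chunk])  # (read, src)
        pending = writes_during.get(index)  # a client write triggers a recall
        dst[start:start + chunk] = data  # complete the current write to dst
        if pending is not None:
            holding = False  # give the delegation back on recall
            recalls += 1
            offset, payload = pending
            end = offset + len(payload)
            src[offset:end] = payload  # the client write now goes through
            dst[offset:end] = payload
    holding = False  # step 3: give back the delegation
    return recalls


src = bytearray(b"aaaabbbbcccc")
dst = bytearray(len(src))
recalls = migrate_with_delegation(src, dst, 4, {1: (4, b"ZZ")})
print(recalls, bytes(dst) == bytes(src))  # 1 True
```

With no conflicting writes the loop never gives up the delegation, which
is the cheap normal path; the extra work only shows up on a recall.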
>>> With delegations this simplifies the normal path, when a file is
>>> exclusively handled by rebalance. It also improves the case where a
>>> client and rebalance are conflicting on a file, to degrade to
>>> mandatory locks by either party.
>>> I would prefer we take the delegation route for such needs in the future.
>>>> @Soumyak, can something like this be done with delegations?
>>>> @Pranith,
>>>> Afr does transactions for writing to its subvols. Can you suggest any
>>>> optimizations here so that rebalance process can have a transaction
>>>> for (read, src) and (write, dst) with minimal performance overhead?
>>>> regards,
>>>> Raghavendra.
>>>>> Comments?
>>>>>> regards,
>>>>>> Raghavendra.
