[Gluster-devel] Rebalance data migration and corruption
Joe Julian
joe at julianfamily.org
Mon Feb 8 06:50:27 UTC 2016
Is this in current release versions?
On 02/07/2016 07:43 PM, Shyam wrote:
> On 02/06/2016 06:36 PM, Raghavendra Gowdappa wrote:
>>
>>
>> ----- Original Message -----
>>> From: "Raghavendra Gowdappa" <rgowdapp at redhat.com>
>>> To: "Sakshi Bansal" <sabansal at redhat.com>, "Susant Palai"
>>> <spalai at redhat.com>
>>> Cc: "Gluster Devel" <gluster-devel at gluster.org>, "Nithya
>>> Balachandran" <nbalacha at redhat.com>, "Shyamsundar
>>> Ranganathan" <srangana at redhat.com>
>>> Sent: Friday, February 5, 2016 4:32:40 PM
>>> Subject: Re: Rebalance data migration and corruption
>>>
>>> +gluster-devel
>>>
>>>>
>>>> Hi Sakshi/Susant,
>>>>
>>>> - There is a data corruption issue in the migration code. The
>>>> rebalance process:
>>>> 1. Reads data from src
>>>> 2. Writes (say w1) it to dst
>>>>
>>>> However, 1 and 2 are not atomic, so another write (say w2) to the
>>>> same region can happen between 1 and 2. These two writes can reach
>>>> dst in the order (w2, w1), resulting in subtle data corruption.
>>>> This issue is not fixed yet. The fix is simple and involves the
>>>> rebalance process acquiring a mandatory lock to make 1 and 2 atomic.
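>>>>
>>>> A minimal sketch of the unprotected copy loop, in plain C with
>>>> POSIX pread/pwrite standing in for the internal fops (src_fd,
>>>> dst_fd and copy_unsafe are illustrative names, not gluster API):
>>>>
>>>> #include <unistd.h> /* pread, pwrite */
>>>>
>>>> static int copy_unsafe(int src_fd, int dst_fd)
>>>> {
>>>>     char    buf[128 * 1024];
>>>>     off_t   off = 0;
>>>>     ssize_t n;
>>>>
>>>>     /* Nothing stops a client write (w2) from landing between this
>>>>      * pread and our pwrite (w1); if w1 reaches dst after w2, it
>>>>      * silently overwrites the newer data. */
>>>>     while ((n = pread(src_fd, buf, sizeof(buf), off)) > 0) {
>>>>         if (pwrite(dst_fd, buf, n, off) != n)
>>>>             return -1; /* short write: abort the migration */
>>>>         off += n;
>>>>     }
>>>>     return (n < 0) ? -1 : 0;
>>>> }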
>>>
>>> We can make use of the compound fop framework to make sure we
>>> don't suffer a significant performance hit. The rebalance process
>>> would perform the following sequence of operations:
>>>
>>> 1. Issue a compound (mandatory lock, read) operation on src.
>>> 2. Write this data to dst.
>>> 3. Unlock the lock acquired in 1.
>>>
>>> Please co-ordinate with Anuradha on the implementation of this
>>> compound fop.
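>>>
>>> A minimal sketch of that sequence in plain C, with an fcntl()
>>> record lock standing in for the mandatory-lock fop (the names and
>>> the use of advisory fcntl locks are illustrative only; the compound
>>> fop would fold step 1 and the read into one network round trip):
>>>
>>> #include <fcntl.h>  /* fcntl, struct flock */
>>> #include <unistd.h> /* pread, pwrite */
>>>
>>> /* Migrate one region, holding the lock across read and write. */
>>> static int migrate_region(int src_fd, int dst_fd, off_t off,
>>>                           off_t len)
>>> {
>>>     char   buf[128 * 1024];
>>>     size_t chunk = (len < (off_t)sizeof(buf)) ? (size_t)len
>>>                                                : sizeof(buf);
>>>     struct flock lk = { .l_type = F_RDLCK, .l_whence = SEEK_SET,
>>>                         .l_start = off, .l_len = len };
>>>     ssize_t n;
>>>
>>>     /* 1. lock (a shared lock: it blocks writers' locks, and a
>>>      *    read-only fd may hold it); the compound fop would also
>>>      *    carry the read */
>>>     if (fcntl(src_fd, F_SETLKW, &lk) == -1)
>>>         return -1;
>>>     n = pread(src_fd, buf, chunk, off);
>>>     if (n > 0 && pwrite(dst_fd, buf, n, off) != n) /* 2. write dst */
>>>         n = -1;
>>>     lk.l_type = F_UNLCK;
>>>     fcntl(src_fd, F_SETLK, &lk);                   /* 3. unlock    */
>>>     return (n < 0) ? -1 : 0;
>>> }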
>>>
>>> Following are the issues I see with this approach:
>>> 1. features/locks provides mandatory lock functionality only for
>>> posix locks (flock- and fcntl-based locks). So the mandatory locks
>>> will be posix locks, which will conflict with locks held by the
>>> application. If an application holds an fcntl/flock lock, migration
>>> cannot proceed.
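>>>
>>> The conflict is easy to demonstrate with plain fcntl() locks (a
>>> stand-in here; the real check would happen inside features/locks):
>>>
>>> #include <fcntl.h>
>>>
>>> /* Probe whether an application lock would block migration. */
>>> static int app_holds_lock(int src_fd)
>>> {
>>>     struct flock probe = { .l_type = F_WRLCK, .l_whence = SEEK_SET,
>>>                            .l_start = 0, .l_len = 0 /* whole file */ };
>>>     if (fcntl(src_fd, F_GETLK, &probe) == -1)
>>>         return -1;
>>>     /* F_GETLK rewrites l_type to F_UNLCK iff nothing conflicts. */
>>>     return probe.l_type != F_UNLCK;
>>> }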
>>
>> We can implement a "special" domain for mandatory internal locks.
>> These locks will behave similarly to posix mandatory locks in that
>> conflicting fops (like write, read) are blocked/failed if they are
>> issued while a lock is held.
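>>
>> A rough sketch of the idea (purely illustrative; the real change
>> would live in features/locks, and every name below is made up):
>>
>> #include <string.h>
>> #include <sys/types.h>
>>
>> struct internal_lock {
>>     const char *domain; /* e.g. "rebalance-mandatory" */
>>     off_t       start;
>>     off_t       len;
>> };
>>
>> /* Nonzero if a client fop on [off, off+len) must block/fail.
>>  * Locks in other domains (application posix locks, afr locks,
>>  * ...) are not consulted, so they cannot conflict with us. */
>> static int blocks_client_fop(const struct internal_lock *l,
>>                              off_t off, off_t len)
>> {
>>     if (strcmp(l->domain, "rebalance-mandatory") != 0)
>>         return 0;
>>     return (off < l->start + l->len) && (l->start < off + len);
>> }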
>>
>>> 2. Data migration will be less efficient because of an extra
>>> unlock (with compound lock + read), or an extra lock and unlock
>>> (for a non-compound-fop-based implementation), for every read it
>>> does from src.
>>
>> Can we use delegations here? The rebalance process can acquire a
>> mandatory-write-delegation (an exclusive lock which is recalled
>> when a write operation happens). In that case the rebalance process
>> can do something like:
>>
>> 1. Acquire a read delegation for the entire file.
>> 2. Migrate the entire file.
>> 3. Remove/unlock/give-back the delegation it has acquired.
>>
>> If a recall is issued from the brick (when a write happens from the
>> mount), the rebalance process completes the current write to dst
>> (or throws away the read from src) to maintain atomicity. Before
>> doing the next (read, src) and (write, dst) pair, it tries to
>> reacquire the lock.
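>>
>> A rough sketch of that loop (acquire_read_delegation and
>> give_back_delegation are hypothetical placeholders, not an existing
>> gluster API; "recalled" would be set by the brick's recall
>> notification):
>>
>> #include <signal.h> /* sig_atomic_t */
>> #include <unistd.h> /* pread, pwrite */
>>
>> void acquire_read_delegation(int fd); /* hypothetical */
>> void give_back_delegation(int fd);    /* hypothetical */
>> static volatile sig_atomic_t recalled; /* set by recall callback */
>>
>> static void migrate_file(int src_fd, int dst_fd)
>> {
>>     char    buf[128 * 1024];
>>     off_t   off = 0;
>>     ssize_t n;
>>
>>     acquire_read_delegation(src_fd);          /* 1. entire file  */
>>     while ((n = pread(src_fd, buf, sizeof(buf), off)) > 0) {
>>         pwrite(dst_fd, buf, n, off);          /* 2. migrate      */
>>         off += n;
>>         if (recalled) { /* a client write is waiting: the in-flight
>>                          * write above is complete, so yield ...  */
>>             give_back_delegation(src_fd);
>>             recalled = 0;
>>             acquire_read_delegation(src_fd);  /* ... and reacquire */
>>         }
>>     }
>>     give_back_delegation(src_fd);             /* 3. give back    */
>> }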
>
> With delegations, this simplifies the normal path, where a file is
> handled exclusively by rebalance. It also improves the case where a
> client and rebalance conflict on a file, by degrading to mandatory
> locks for either party.
>
> I would prefer we take the delegation route for such needs in the future.
>
>>
>> @Soumyak, can something like this be done with delegations?
>>
>> @Pranith,
>> AFR does transactions for writes to its subvols. Can you suggest
>> any optimizations here so that the rebalance process can have a
>> transaction for (read, src) and (write, dst) with minimal
>> performance overhead?
>>
>> regards,
>> Raghavendra.
>>
>>>
>>> Comments?
>>>
>>>>
>>>> regards,
>>>> Raghavendra.
>>>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel