[Gluster-devel] [Gluster-users] Phasing out replace-brick for data migration in favor of remove-brick.
Cool
coolbsd at hotmail.com
Mon Sep 30 17:34:06 UTC 2013
Nice, thanks for the clarification.
-C.B.
On 9/30/2013 2:46 AM, Amar Tumballi wrote:
> On 09/28/2013 12:03 AM, Cool wrote:
>> How does the new command set achieve this?
>>
>> old layout (2x2):
>> rep=2: h1:/b1 h2:/b1 h1:/b2 h2:/b2
>>
>> new layout (3x2):
>> rep=2: h1:/b1 h2:/b1 h1:/b2 h3:/b1 h2:/b2 h3:/b2
>>
>> purpose for the new layout is to make sure there is no SPOF, as I
>> cannot simply add h3:/b1 and h3:/b2 as a pair.
>>
>> With replace-brick it was pretty straightforward, but without it ...
>> should I remove-brick h2:/b2 and then add-brick h3:/b1? That means I
>> would have only one copy of some data for a certain period of
>> time, which makes me nervous. Or should I add-brick h3:/b1
>> first? That doesn't seem reasonable either.
>>
>> Or am I the only one hitting this kind of upgrade?
>>
> No, you are not the only one. This is exactly why we recommend
> adding nodes in multiples of 2.
>
> Also, another recommendation is to export directories as bricks
> rather than the mountpoint itself.
>
> In your case, following the above best practice, it would be:
>
> # gluster volume info test-vol:
> rep=2: h1:/b1/d1 h2:/b1/d1 h1:/b2/d1 h2:/b2/d1
>
> # gluster volume add-brick test-vol h1:/b2/d2 h3:/b1/d1 h2:/b2/d2 h3:/b2/d1
> # gluster volume remove-brick test-vol h1:/b2/d1 h2:/b2/d1 start
>
> # gluster volume remove-brick test-vol h1:/b2/d1 h2:/b2/d1 commit
>
> # gluster volume info test-vol:
> rep=2: h1:/b1/d1 h2:/b1/d1 h1:/b2/d2 h3:/b1/d1 h2:/b2/d2 h3:/b2/d1
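>
> Between 'start' and 'commit' you would normally wait for the data
> migration to finish. A minimal sketch, assuming the same test-vol and
> brick paths as above:
>
> ```shell
> # Kick off decommissioning of the old bricks
> gluster volume remove-brick test-vol h1:/b2/d1 h2:/b2/d1 start
>
> # Poll until the migration reports "completed" on all nodes
> gluster volume remove-brick test-vol h1:/b2/d1 h2:/b2/d1 status
>
> # Only then is it safe to commit and drop the old bricks
> gluster volume remove-brick test-vol h1:/b2/d1 h2:/b2/d1 commit
> ```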
>
> Hope this works.
>
> Regards,
> Amar
>> -C.B.
>>
>> On 9/27/2013 10:15 AM, Amar Tumballi wrote:
>>>
>>> Hello all,
>>> DHT's remove-brick + rebalance has been enhanced in the last
>>> couple of releases to be quite sophisticated. It can handle
>>> graceful decommissioning of bricks, including open file
>>> descriptors and hard links.
>>>
>>>
>>> Last set of patches for this should be reviewed and accepted before
>>> we make that claim :-) [ http://review.gluster.org/5891 ]
>>>
>>> This in a way is a feature overlap with replace-brick's data
>>> migration functionality. Replace-brick's data migration is
>>> currently also used for planned decommissioning of a brick.
>>>
>>> Reasons to remove replace-brick (or why remove-brick is better):
>>>
>>> - There are two methods of moving data. It is confusing for the
>>> users and hard for developers to maintain.
>>>
>>> - If the server being replaced is a member of a replica set, neither
>>> remove-brick nor replace-brick data migration is necessary,
>>> because self-heal will recreate the data (replace-brick
>>> actually uses self-heal internally).
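>>>
>>> For the replicated case, a rough sketch of swapping out a dead brick
>>> and letting self-heal rebuild its contents (volume and brick names
>>> here are illustrative, not from the layout above):
>>>
>>> ```shell
>>> # Swap the failed brick for a fresh one; no data migration needed
>>> gluster volume replace-brick test-vol h2:/b1/d1 h3:/b1/d1 commit force
>>>
>>> # Trigger a full self-heal so the replica copy is recreated
>>> gluster volume heal test-vol full
>>>
>>> # Watch the heal progress until no entries remain
>>> gluster volume heal test-vol info
>>> ```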
>>>
>>> - In a non-replicated config, if a server is getting replaced by a
>>> new one, add-brick <new> + remove-brick <old> "start" achieves
>>> the same goal as replace-brick <old> <new> "start".
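>>>
>>> That pairing can be sketched as follows (hypothetical distribute-only
>>> volume dist-vol; host and brick names are made up):
>>>
>>> ```shell
>>> # Bring the replacement brick into the volume
>>> gluster volume add-brick dist-vol h4:/b1/d1
>>>
>>> # Drain the old brick; its data is migrated to the remaining bricks
>>> gluster volume remove-brick dist-vol h2:/b1/d1 start
>>>
>>> # Commit once the migration has completed
>>> gluster volume remove-brick dist-vol h2:/b1/d1 commit
>>> ```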
>>>
>>>
>>> Should we phase out the CLI for doing a 'remove-brick' without any
>>> option too? Because even if users do it by mistake, they would lose
>>> data. We should enforce 'start' and then 'commit' usage of
>>> remove-brick. Also, if the old behavior is required, the 'force'
>>> option is still available.
>>>
>>> - In a non-replicated config, <replace-brick> is NOT glitch free
>>> (applications witness ENOTCONN if they are accessing data)
>>> whereas add-brick <new> + remove-brick <old> is completely
>>> transparent.
>>>
>>>
>>> +10 (that's the number of bugs open on these things :-)
>>>
>>> - Replace-brick strictly requires a server with enough free space
>>> to hold the data of the old brick, whereas remove-brick will
>>> evenly spread out the data of the brick being removed amongst the
>>> remaining servers.
>>>
>>> - Replace-brick code is complex and messy (the real reason :p).
>>>
>>>
>>> Wanted to see this reason as the 1st point, but it's ok as long as
>>> we mention it. I too agree that it's _hard_ to maintain that
>>> piece of code.
>>>
>>> - No clear reason why replace-brick's data migration is better in
>>> any way to remove-brick's data migration.
>>>
>>>
>>> One reason I heard when I sent the mail on gluster-devel earlier
>>> (http://lists.nongnu.org/archive/html/gluster-devel/2012-10/msg00050.html
>>> ) was that the remove-brick way was a bit slower than
>>> replace-brick. The technical reason being that remove-brick does
>>> DHT's readdir, whereas replace-brick does a brick-level readdir.
>>>
>>> I plan to send out patches to remove all traces of replace-brick
>>> data migration code by 3.5 branch time.
>>>
>>> Thanks for the initiative, let me know if you need help.
>>>
>>> NOTE that the replace-brick command itself will still exist, and you
>>> can replace one server with another in case a server dies. It is
>>> only the data migration functionality being phased out.
>>>
>>>
>>> Yes, we need to be careful about this. We would need 'replace-brick'
>>> to phase out a dead brick. The other day, there was some discussion
>>> on having 'gluster peer replace <old-peer> <new-peer>', which would
>>> re-write all the volfiles properly. But that's mostly for the 3.6
>>> time frame IMO.
>>>
>>> Please do ask any questions / raise concerns at this stage :)
>>>
>>>
>>> What is the window before you start sending out patches? I see
>>> http://review.gluster.org/6010, which I guess is not totally complete
>>> without phasing out the pump xlator :-)
>>>
>>> I personally am all in for this change, as it helps me finish a few
>>> more enhancements I am working on, like the 'discover()' changes etc.
>>>
>>> Regards,
>>> Amar
>>>
>>>
>>> _______________________________________________
>>> Gluster-users mailing list
>>> Gluster-users at gluster.org
>>> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>>
>>
>>
>> _______________________________________________
>> Gluster-devel mailing list
>> Gluster-devel at nongnu.org
>> https://lists.nongnu.org/mailman/listinfo/gluster-devel
>
>
>