[Gluster-users] [Gluster-devel] Phasing out replace-brick for data migration in favor of remove-brick.

Cool coolbsd at hotmail.com
Mon Sep 30 17:34:06 UTC 2013


Nice, thanks for the clarification.
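
For the archive, the end-to-end sequence (with a status check between
'start' and 'commit') would look roughly like this; the volume name and
brick paths are taken from the example in the quoted reply and are
assumptions for illustration:

```shell
# Add the new brick pairs first, so each replica pair spans two hosts
gluster volume add-brick test-vol h1:/b2/d2 h3:/b1/d1 h2:/b2/d2 h3:/b2/d1

# Start draining the old bricks; data migrates to the remaining bricks
gluster volume remove-brick test-vol h1:/b2/d1 h2:/b2/d1 start

# Poll until the migration reports "completed" for every brick
gluster volume remove-brick test-vol h1:/b2/d1 h2:/b2/d1 status

# Only then finalize, which drops the old bricks from the volume
gluster volume remove-brick test-vol h1:/b2/d1 h2:/b2/d1 commit
```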

-C.B.

On 9/30/2013 2:46 AM, Amar Tumballi wrote:
> On 09/28/2013 12:03 AM, Cool wrote:
>> How does the new command set achieve this?
>>
>> old layout (2x2):
>> rep=2: h1:/b1 h2:/b1 h1:/b2 h2:/b2
>>
>> new layout (3x2):
>> rep=2: h1:/b1 h2:/b1 h1:/b2 h3:/b1 h2:/b2 h3:/b2
>>
>> purpose of the new layout is to make sure there is no SPOF, as I 
>> cannot simply add h3:/b1 and h3:/b2 as a pair.
>>
>> With replace-brick it's pretty straightforward, but without it ... 
>> should I remove-brick h2:/b2 and then add-brick h3:/b1? That means 
>> I'd have only one copy of some data for a certain period of time, 
>> which makes me nervous. Or should I add-brick h3:/b1 first? That 
>> doesn't seem reasonable either.
>>
>> Or am I the only one hitting this kind of upgrade?
>>
> No, you are not the only one. This is exactly why we recommend 
> adding nodes in multiples of 2.
>
> Also, another recommendation is to export directories as bricks, 
> rather than the mountpoint itself.
>
> In your case, following the above best practice, it would be:
>
> # gluster volume info test-vol:
> rep=2: h1:/b1/d1 h2:/b1/d1 h1:/b2/d1 h2:/b2/d1
>
> # gluster volume add-brick test-vol h1:/b2/d2 h3:/b1/d1 h2:/b2/d2 
> h3:/b2/d1
> # gluster volume remove-brick test-vol h1:/b2/d1 h2:/b2/d1 start
>
> # gluster volume remove-brick test-vol h1:/b2/d1 h2:/b2/d1 commit
>
> # gluster volume info test-vol:
> rep=2: h1:/b1/d1 h2:/b1/d1 h1:/b2/d2 h3:/b1/d1 h2:/b2/d2 h3:/b2/d1
>
> Hope this works.
>
> Regards,
> Amar
>> -C.B.
>>
>> On 9/27/2013 10:15 AM, Amar Tumballi wrote:
>>>
>>>     Hello all,
>>>     DHT's remove-brick + rebalance has been enhanced in the last
>>>     couple of releases to be quite sophisticated. It can handle
>>>     graceful decommissioning of bricks, including open file
>>>     descriptors and hard links.
>>>
>>>
>>> Last set of patches for this should be reviewed and accepted before 
>>> we make that claim :-) [ http://review.gluster.org/5891 ]
>>>
>>>     This in a way is a feature overlap with replace-brick's data
>>>     migration functionality. Replace-brick's data migration is
>>>     currently also used for planned decommissioning of a brick.
>>>
>>>     Reasons to remove replace-brick (or why remove-brick is better):
>>>
>>>     - There are two methods of moving data. It is confusing for the
>>>     users and hard for developers to maintain.
>>>
>>>     - If server being replaced is a member of a replica set, neither
>>>     remove-brick nor replace-brick data migration is necessary,
>>>     because self-healing itself will recreate the data (replace-brick
>>>     actually uses self-heal internally)
>>>
>>>     - In a non-replicated config if a server is getting replaced by a
>>>     new one, add-brick <new> + remove-brick <old> "start" achieves
>>>     the same goal as replace-brick <old> <new> "start".
>>>
>>>
>>> Should we phase out the CLI form of 'remove-brick' without any 
>>> option too? Because even if users run it by mistake, they would 
>>> lose data. We should enforce the 'start' and then 'commit' usage 
>>> of remove-brick. And if anyone still needs the old behaviour, they 
>>> always have the 'force' option.
>>>
>>>     - In a non-replicated config, <replace-brick> is NOT glitch free
>>>     (applications witness ENOTCONN if they are accessing data)
>>>     whereas add-brick <new> + remove-brick <old> is completely
>>>     transparent.
>>>
>>>
>>> +10 (that's the number of bugs open on these things :-)
>>>
>>>     - Replace-brick strictly requires a server with enough free space
>>>     to hold the data of the old brick, whereas remove-brick will
>>>     evenly spread out the data of the brick being removed amongst the
>>>     remaining servers.
>>>
>>>     - Replace-brick code is complex and messy (the real reason :p).
>>>
>>>
>>> I wanted to see this reason as the 1st point, but it's ok as long 
>>> as we mention it. I too agree that it's _hard_ to maintain that 
>>> piece of code.
>>>
>>>     - No clear reason why replace-brick's data migration is better in
>>>     any way than remove-brick's data migration.
>>>
>>>
>>> One reason I heard when I sent the mail on gluster-devel earlier 
>>> (http://lists.nongnu.org/archive/html/gluster-devel/2012-10/msg00050.html 
>>> ) was that the remove-brick way was a bit slower than 
>>> replace-brick. The technical reason is that remove-brick does 
>>> DHT's readdir, whereas replace-brick does a brick-level readdir.
>>>
>>>     I plan to send out patches to remove all traces of replace-brick
>>>     data migration code by 3.5 branch time.
>>>
>>> Thanks for the initiative, let me know if you need help.
>>>
>>>     NOTE that the replace-brick command itself will still exist, and
>>>     you can replace one server with another in case a server dies. It
>>>     is only the data migration functionality that is being phased out.
>>>
>>>
>>> Yes, we need to be careful about this. We would need 'replace-brick' 
>>> to phase out a dead brick. The other day, there was some discussion 
>>> on having 'gluster peer replace <old-peer> <new-peer>', which would 
>>> rewrite all the volfiles properly. But that's mostly for the 3.6 
>>> time frame IMO.
>>>
>>>     Please do ask any questions / raise concerns at this stage :)
>>>
>>>
>>> What is the window before you start sending out patches? I see 
>>> http://review.gluster.org/6010, which I guess is not totally 
>>> complete without phasing out the pump xlator :-)
>>>
>>> I personally am all for this change, as it helps me finish a few 
>>> more enhancements I am working on, like the 'discover()' changes 
>>> etc.
>>>
>>> Regards,
>>> Amar
>>>
>>>
>>> _______________________________________________
>>> Gluster-users mailing list
>>> Gluster-users at gluster.org
>>> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>>
>>
>>
>> _______________________________________________
>> Gluster-devel mailing list
>> Gluster-devel at nongnu.org
>> https://lists.nongnu.org/mailman/listinfo/gluster-devel
>
>
>