[Gluster-devel] Phasing out replace-brick for data migration in favor of remove-brick.

Fri Sep 27 17:15:52 UTC 2013

> Hello all,
> DHT's remove-brick + rebalance has been enhanced in the last couple of
> releases to be quite sophisticated. It can handle graceful decommissioning
> of bricks, including open file descriptors and hard links.
>
>
The last set of patches for this should be reviewed and accepted before we
make that claim :-) [ http://review.gluster.org/5891 ]

> This in a way is a feature overlap with replace-brick's data migration
> functionality. Replace-brick's data migration is currently also used for
> planned decommissioning of a brick.
>
> Reasons to remove replace-brick (or why remove-brick is better):
>
> - There are two methods of moving data. It is confusing for the users and
> hard for developers to maintain.
>
> - If server being replaced is a member of a replica set, neither
> remove-brick nor replace-brick data migration is necessary, because
> self-healing itself will recreate the data (replace-brick actually uses
> self-heal internally)
>
> - In a non-replicated config if a server is getting replaced by a new one,
> add-brick <new> + remove-brick <old> "start" achieves the same goal as
> replace-brick <old> <new> "start".
>
>
Should we also phase out the CLI form of 'remove-brick' without any option?
Even if users run it by mistake, they would lose data. We should enforce
the 'start' and then 'commit' usage of remove-brick. If anyone needs the old
behaviour, they still have the 'force' option.
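
To make the comparison concrete, here is a rough dry-run sketch of the
add-brick + remove-brick sequence for swapping a brick in a non-replicated
volume. The volume name and brick paths are made-up placeholders, and the
script only prints the commands instead of running them. The 'status' step
between 'start' and 'commit' is my assumption about how one would wait for
the migration to finish.

```shell
#!/bin/sh
# Dry-run sketch: prints the gluster commands, does not execute them.
# VOL, OLD and NEW are hypothetical placeholders, not real hosts or paths.
VOL=myvol
OLD=server1:/export/brick1
NEW=server2:/export/brick1

migrate_plan() {
    # 1. Add the new brick so the volume has somewhere to move data to.
    echo "gluster volume add-brick $VOL $NEW"
    # 2. Start draining the old brick (data is migrated off it).
    echo "gluster volume remove-brick $VOL $OLD start"
    # 3. Check progress; repeat until migration is reported complete.
    echo "gluster volume remove-brick $VOL $OLD status"
    # 4. Only then drop the old brick from the volume.
    echo "gluster volume remove-brick $VOL $OLD commit"
}

migrate_plan
```

Running 'commit' only after 'start' has finished is what protects the data;
'force' skips the migration entirely.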

> - In a non-replicated config, <replace-brick> is NOT glitch free
> (applications witness ENOTCONN if they are accessing data) whereas
> add-brick <new> + remove-brick <old> is completely transparent.
>
>
+10 (that's the number of bugs open on these things :-)

> - Replace brick strictly requires a server with enough free space to hold
> the data of the old brick, whereas remove-brick will evenly spread out the
> data of the brick being removed amongst the remaining servers.
>
> - Replace-brick code is complex and messy (the real reason :p).
>
>
I wanted to see this reason as the 1st point, but it's OK as long as we
mention it. I too agree that it's _hard_ to maintain that piece of code.

> - No clear reason why replace-brick's data migration is better in any way
> to remove-brick's data migration.
>
>
One reason I heard when I sent the mail on gluster-devel earlier (
http://lists.nongnu.org/archive/html/gluster-devel/2012-10/msg00050.html )
was that the remove-brick way was a bit slower than replace-brick. The
technical reason is that remove-brick does DHT's readdir, whereas
replace-brick does the brick-level readdir.

> I plan to send out patches to remove all traces of replace-brick data
> migration code by 3.5 branch time.
>
Thanks for the initiative, let me know if you need help.

> NOTE that replace-brick command itself will still exist, and you can
> replace one server with another in case a server dies. It is only the data
> migration functionality being phased out.
>
>
Yes, we need to be careful about this. We would need 'replace-brick' to
phase out a dead brick. The other day, there was some discussion on having
'gluster peer replace <old-peer> <new-peer>', which would re-write all the
vol files properly. But that's mostly for the 3.6 time frame IMO.
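
For the dead-brick case, I assume the existing 'commit force' form of
replace-brick is what stays in use, since there is nothing left to migrate
from. A dry-run sketch, again with made-up volume and brick names, that only
prints the command:

```shell
#!/bin/sh
# Dry-run sketch: prints the command, does not execute it.
# VOL, DEAD and NEW are hypothetical placeholders.
VOL=myvol
DEAD=server1:/export/brick1
NEW=server3:/export/brick1

replace_dead_brick_cmd() {
    # No data migration: the volume definition is simply switched
    # from the dead brick to the new one.
    echo "gluster volume replace-brick $VOL $DEAD $NEW commit force"
}

replace_dead_brick_cmd
```

On a replicated volume, self-heal would then recreate the data on the new
brick, as noted earlier in this thread.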

> Please do ask any questions / raise concerns at this stage :)
>
>
What is the window before you start sending out patches? I see
http://review.gluster.org/6010, which I guess is not totally complete
without phasing out the pump xlator :-)

I personally am all in for this change, as it helps me finish a few more
enhancements I am working on, like the 'discover()' changes etc.

Regards,
Amar