[Gluster-users] [Gluster-devel] Phasing out replace-brick for data migration in favor of remove-brick.

Amar Tumballi atumball at redhat.com
Mon Sep 30 09:26:22 UTC 2013

Inline response.

On 09/27/2013 02:26 PM, James wrote:
> On Fri, 2013-09-27 at 00:35 -0700, Anand Avati wrote:
>> Hello all,
> Hey,
> Interesting timing for this post...
> I've actually started working on automatic brick addition/removal. (I'm
> planning to add this to puppet-gluster of course.) I was hoping you
> could help out with the algorithm. I think it's a bit different if
> there's no replace-brick command as you are proposing.
> Here's the problem:
> Given a logically optimal initial volume:
> volA: rep=2; h1:/b1 h2:/b1 h3:/b1 h4:/b1 h1:/b2 h2:/b2 h3:/b2 h4:/b2
> suppose I know that I want to add/remove bricks such that my new volume
> (if I had created it new) looks like:
> volB: rep=2; h1:/b1 h3:/b1 h4:/b1 h5:/b1 h6:/b1 h1:/b2 h3:/b2 h4:/b2
> h5:/b2 h6:/b2
> What is the optimal algorithm for determining the correct sequence of
> transforms that are needed to accomplish this task. Obviously there are
> some simpler corner cases, but I'd like to solve the general case.
> The transforms are obviously things like running the add-brick {...} and
> remove-brick {...} commands.

This is exactly why our best practice recommends exporting a directory 
inside the mountpoint as the brick, in this case h1:/b1/d1 (where d1 is 
a directory inside the mountpoint /b1).

That lets you later create another brick, h1:/b1/d2, on the same 
mountpoint, which is effectively what you want for volB.

Also, it is never a good idea to swap/change/move bricks between replica 
pairs; doing so leads to many issues, such as duplicate files.
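For the general transform question above, here is a rough sketch (not an 
official algorithm, and the brick lists are illustrative) of deriving the 
brick sets to feed to add-brick and remove-brick. Note that for replicated 
volumes the bricks must still be grouped in replica-count multiples, and 
replica pairs must not be split across the two commands:

```shell
#!/bin/bash
# Hedged sketch: compute which bricks to add and which to remove when
# transforming volA's brick list into volB's. This only finds the set
# difference; ordering bricks into valid replica sets is a separate step.
volA="h1:/b1 h2:/b1 h3:/b1 h4:/b1 h1:/b2 h2:/b2 h3:/b2 h4:/b2"
volB="h1:/b1 h3:/b1 h4:/b1 h5:/b1 h6:/b1 h1:/b2 h3:/b2 h4:/b2 h5:/b2 h6:/b2"

# comm needs sorted input: -13 keeps lines unique to volB (to add),
# -23 keeps lines unique to volA (to remove).
to_add=$(comm -13 <(printf '%s\n' $volA | sort) <(printf '%s\n' $volB | sort))
to_remove=$(comm -23 <(printf '%s\n' $volA | sort) <(printf '%s\n' $volB | sort))

echo "add-brick:    $(echo $to_add)"
echo "remove-brick: $(echo $to_remove)"
```

For the example volumes above this yields h5 and h6's bricks to add and 
h2's bricks to remove, which can then be grouped into replica pairs for 
the actual commands.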

>> - Replace brick strictly requires a server with enough free space to hold
>> the data of the old brick, whereas remove-brick will evenly spread out the
>> data of the brick being removed amongst the remaining servers.
> Can you talk more about the replica = N case (where N is 2 or 3?)
> With remove brick, add brick you will need add/remove N (replica count)
> bricks at a time, right? With replace brick, you could just swap out
> one, right? Isn't that a missing feature if you remove replace brick?
For that particular case of swapping without data migration, 
'replace-brick' will continue to exist. It replaces an existing brick of 
a replica pair with an empty brick, and replicate's self-heal daemon 
then populates the data in it.
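As an illustration (hostnames and paths are hypothetical, and the exact 
sub-command sequence has varied between releases), swapping one brick of 
a replica pair for an empty one would look roughly like:

```shell
# Illustrative only: replace brick h2:/b1/d1 with the empty brick
# h5:/b1/d1 within the same replica pair, without data migration.
gluster volume replace-brick volA h2:/b1/d1 h5:/b1/d1 commit force
# The self-heal daemon then copies data from the surviving replica;
# a full heal can also be triggered explicitly:
gluster volume heal volA full
```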

>> Please do ask any questions / raise concerns at this stage :)
> I heard with 3.4 you can somehow change the replica count when adding
> new bricks... What's the full story here please?
Yes, CLI support for this has existed since glusterfs-3.3.x 
(http://review.gluster.com/158); there are just a few bugs.

syntax of add-brick:

gluster volume add-brick <VOLNAME> [<stripe|replica> <COUNT>] 
<NEW-BRICK> ... [force] - add brick to volume <VOLNAME>

To change the replica count, give 'replica N' where N is the existing 
replica count plus or minus one, along with the matching number of bricks.
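For example (hostnames and paths are hypothetical, and behaviour may vary 
given the bugs mentioned above), raising a two-replica-set volume from 
replica 2 to replica 3 needs one new brick per replica set:

```shell
# Illustrative only: go from replica 2 to replica 3 on a volume with
# two replica sets, adding one new brick to each set.
gluster volume add-brick volA replica 3 h5:/b1/d1 h5:/b2/d1
# Going back down pairs 'replica 2' with remove-brick instead,
# naming one brick from each replica set.
gluster volume remove-brick volA replica 2 h5:/b1/d1 h5:/b2/d1 start
```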

