[Gluster-users] gluster rebalance taking multiple days

Michael Robbert mrobbert at mines.edu
Wed Dec 8 16:59:30 UTC 2010

While this looks like good documentation that should probably be added to the wiki, it doesn't answer the question I had. I probably didn't ask it very well, so that is my fault. My question had more to do with the counter for step 1. During this step it appears that no data was being transferred; it looked like it was just checking and changing attributes on all the files. The problem I had was that the counter had exceeded the number of files, and I had to look at the source to try to determine whether that was wrong. I'm not a programmer, so my deduction could be totally wrong, and that is my question: is it a problem that the counter in step 1 exceeded the number of files?
As it turns out, with the help of support and an engineer it was determined that a rebalance was not needed in the first place, so I have since stopped the operation. My underlying problem was identified as a bug in the code which, as far as I know, is unrelated to rebalance.

On Dec 7, 2010, at 11:30 PM, Craig Carl wrote:

> All -
>   It is possible to calculate in advance the number of files that will
> be moved by a rebalance. By testing performance in advance with some
> small rsyncs and the formula below, you should be able to get an
> accurate estimate of the time it will take. Starting in Gluster 3.1 it
> is possible to stop a rebalance and then restart it where it left off; see:
> volume rebalance <VOLNAME> start - start rebalance of volume <VOLNAME>
> volume rebalance <VOLNAME> stop - stop rebalance of volume <VOLNAME>
> volume rebalance <VOLNAME> status - rebalance status of volume <VOLNAME>
> Basic assumptions:
> - Distribute spreads all files equally across all the nodes.
> Existing nodes in the cluster are a set of "N" nodes
> New nodes being added to cluster are a set of "M" nodes.
> N+M will be the total number of nodes in new volume configuration.
> Total files in the cluster before rebalance "X"
> Number of files on each existing node: "J" = X / N
> Number of files on each node after rebalance/scaling: "K" = X / (N+M)
> K * M = Z (total number of files on the set of M nodes after rebalance/scaling)
> J * N = X (total files in the cluster before rebalance/scaling)
> Z / N = Y (total number of files moved from each existing node after
> rebalance/scaling)
> (Y / J) * 100 = percentage of files moved from each of the N existing
> nodes after rebalance/scaling
> (J - Y) / J * 100 = percentage of files remaining on each of the N
> existing nodes after rebalance/scaling
> NOTE: "N" is not just the number of nodes but the total number of
> sub-volumes for the "distribute" translator, and "M" is the number of
> additional sub-volumes added before starting rebalance and scaling.
> So for a server with multiple exports, the total amount moved from
> that server must be calculated by multiplying by the number of
> exports.
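The arithmetic above can be sketched as a quick calculation. This is a worked example with made-up numbers, not figures from this thread; X, N, and M follow the definitions in the email.

```python
# Worked sketch of the rebalance estimate above; the values are illustrative.
X = 12_000_000   # total files in the cluster before rebalance
N = 4            # existing distribute sub-volumes
M = 4            # sub-volumes being added

J = X / N        # files per existing node before rebalance
K = X / (N + M)  # files per node after rebalance
Z = K * M        # total files that end up on the new M nodes
Y = Z / N        # files moved off each existing node

moved_pct = Y / J * 100            # percentage moved from each existing node
remaining_pct = (J - Y) / J * 100  # percentage remaining on each existing node

print(J, K, Z, Y, moved_pct, remaining_pct)
```

In this example (doubling the sub-volume count), half of the data on each existing node is moved; timing a small rsync of representative files then scales that percentage into a wall-clock estimate.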
> Thanks,
> Craig
> --
> Craig Carl
> Senior Systems Engineer
> Gluster
> On 12/06/2010 04:50 PM, Michael Robbert wrote:
>> How long should a rebalance take? I know that it depends, so let's take this example: 4 servers, 1 brick per server. Here is the df -i output from the servers:
>> [root at ra5 ~]# pdsh -g rack7 "df -i|grep brick"
>> iosrv-7-1:                      366288896 2720139 363568757    1% /mnt/brick1
>> iosrv-7-4:                      366288896 3240868 363048028    1% /mnt/brick4
>> iosrv-7-2:                      366288896 2594165 363694731    1% /mnt/brick2
>> iosrv-7-3:                      366288896 3267152 363021744    1% /mnt/brick3
>> So it looks like there are roughly 10 million files. A rebalance has been running on one of the servers since last Friday, and this is what the status looks like right now:
>> [root at iosrv-7-2 ~]# gluster volume rebalance gluster-test status
>> rebalance step 1: layout fix in progress: fixed layout 149531740
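The "roughly 10 million files" figure can be checked by summing the IUsed column from the df -i output quoted above (a quick sketch; the counts are copied from that output):

```python
# Sum the IUsed column from the df -i output for the four bricks.
iused = {
    "iosrv-7-1": 2720139,
    "iosrv-7-2": 2594165,
    "iosrv-7-3": 3267152,
    "iosrv-7-4": 3240868,
}
total_files = sum(iused.values())
print(total_files)  # roughly 11.8 million inodes in use across the bricks
```

The step-1 counter shown in the status (149531740) is more than ten times this total, which is what prompted the original question about the counter exceeding the number of files.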
>> As a side note, I started this rebalance when I noticed that about half of my clients were missing a certain set of files. Upon further investigation I found that a different set of clients was missing different data. This happened after many problems getting an upgrade to 3.1.1 working. Unfortunately I don't remember which version was running when I was last able to write to this volume.
>> Any thoughts?
>> Mike
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users

More information about the Gluster-users mailing list