[Gluster-devel] Feature review: Improved rebalance performance

Xavier Hernandez xhernandez at datalab.es
Thu Jul 3 11:23:14 UTC 2014


On Thursday 03 July 2014 16:37:53 Raghavendra G wrote:
> > The idea of using index was more intended to easily detect renamed files
> > on an otherwise balanced volume, and be able to perform quick rebalance
> > operations to move them to the correct brick without having to crawl the
> > entire file system. On almost all cases, all files present in the index
> > will need rebalance, so the cost of crawling the index is worth it.
> 
> We did consider using index for identifying files that need migration. In
> the normal case it suits our needs. However, after an add-brick we cannot
> rely on index to avoid crawl, since layout itself would've been changed.

I agree. This feature should be disabled while doing a full rebalance due to a
layout change (adding or removing a brick). However, I think it's quite useful
in normal operation (when the volume is supposed to be balanced).

Imagine a volume where normal operation consists of storing a large number of
files in a special "unprocessed" directory. This data is then analyzed and
classified (i.e. moved) to a more meaningful directory for further processing.
This workload generates a lot of link files, and much of the data won't be on
the brick it should be, even though the volume was correctly balanced
before the process started. In this scenario, a periodic rebalance
based on an index would be great and very efficient.
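
To make the scenario concrete, here is a minimal toy sketch in Python. It is
not GlusterFS code: ToyVolume, hashed_brick and NUM_BRICKS are hypothetical
names, and placement by crc32 of the full path modulo the brick count is only
a stand-in for the real per-directory layout ranges. It only illustrates that
a rename leaves the data on the old brick, that such files can be recorded in
an index, and that a rebalance driven by the index touches only those files.

```python
import zlib

NUM_BRICKS = 4  # hypothetical number of bricks in the toy volume


def hashed_brick(path: str) -> int:
    """Toy placement rule (assumption): crc32 of the full path, modulo bricks."""
    return zlib.crc32(path.encode()) % NUM_BRICKS


class ToyVolume:
    def __init__(self):
        self.location = {}  # path -> brick that actually holds the data
        self.index = set()  # paths whose data is not on their hashed brick

    def create(self, path):
        # New files are written directly to their hashed brick.
        self.location[path] = hashed_brick(path)

    def rename(self, old, new):
        # The data stays where it is; only the name changes.
        brick = self.location.pop(old)
        self.index.discard(old)
        self.location[new] = brick
        if brick != hashed_brick(new):
            # In the scenario above a link file would be created here.
            self.index.add(new)

    def misplaced(self):
        # Full crawl: check every file against its hashed brick.
        return {p for p, b in self.location.items() if b != hashed_brick(p)}

    def rebalance_from_index(self):
        # Index-driven rebalance: visit only the indexed files.
        moved = 0
        for path in sorted(self.index):
            self.location[path] = hashed_brick(path)
            moved += 1
        self.index.clear()
        return moved


if __name__ == "__main__":
    vol = ToyVolume()
    for i in range(1000):
        vol.create(f"unprocessed/file{i}")
    for i in range(1000):
        vol.rename(f"unprocessed/file{i}", f"classified/file{i}")
    print("files in volume:", len(vol.location))
    print("files in index: ", len(vol.index))
    print("files migrated: ", vol.rebalance_from_index())
    print("still misplaced:", len(vol.misplaced()))
```

In this toy model the index always matches what a full crawl would find, so
the periodic rebalance only has to visit the indexed files. As noted above,
this stops being true after a layout change, when a full crawl is still
needed.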

Xavi


