[Gluster-devel] Feature review: Improved rebalance performance
Xavier Hernandez
xhernandez at datalab.es
Thu Jul 3 11:23:14 UTC 2014
On Thursday 03 July 2014 16:37:53 Raghavendra G wrote:
> > The idea of using index was more intended to easily detect renamed files
> > on an otherwise balanced volume, and be able to perform quick rebalance
> > operations to move them to the correct brick without having to crawl the
> > entire file system. In almost all cases, all files present in the index
> > will need rebalance, so the cost of crawling the index is worth it.
>
> We did consider using index for identifying files that need migration. In
> the normal case it suits our needs. However, after an add-brick we cannot
> rely on index to avoid crawl, since layout itself would've been changed.
I agree. This feature should be disabled while doing a full rebalance due to a
layout change (adding or removing a brick). However, I think it's quite useful
in normal operation (when the volume is supposed to be balanced).
Imagine a volume where normal operation consists of storing a large number of
files in a special "unprocessed" directory. Then this data is analyzed and
classified (i.e., moved) to a more meaningful directory for further processing.
This workload generates a lot of link files, and much of the data won't be on
the brick it should be, even though the volume was correctly balanced
before starting the process. In this scenario, having a periodic rebalance
based on an index would be great and very efficient.
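To make the scenario concrete, here is a rough sketch in Python. It is a toy model, not GlusterFS code: the Volume class, its method names, and the placement rule (CRC32 of the full path modulo the number of bricks) are all assumptions made for illustration, and real DHT placement uses per-directory layouts and a different hash. It only shows the point above: a rename leaves the data on the old brick, the file is recorded in an index, and a periodic rebalance walks the index instead of crawling the whole volume.

```python
import zlib


class Volume:
    """Toy model of a distributed volume (hypothetical, not GlusterFS code)."""

    def __init__(self, bricks):
        self.bricks = list(bricks)
        self.location = {}   # path -> brick that actually holds the data
        self.index = set()   # paths whose data is not on their hashed brick

    def hashed_brick(self, path):
        # Stand-in placement rule (assumption): CRC32 of the path modulo
        # the number of bricks. Real DHT uses per-directory hash ranges.
        return self.bricks[zlib.crc32(path.encode()) % len(self.bricks)]

    def create(self, path):
        # New files are created directly on their hashed brick.
        self.location[path] = self.hashed_brick(path)

    def rename(self, old, new):
        # The data stays where it is; only the name changes.
        brick = self.location.pop(old)
        self.location[new] = brick
        self.index.discard(old)
        if brick != self.hashed_brick(new):
            # Roughly the case where a link file would be left behind.
            self.index.add(new)

    def misplaced(self):
        # Full crawl: what an index-less rebalance would have to do.
        return {p for p, b in self.location.items()
                if b != self.hashed_brick(p)}

    def rebalance_from_index(self):
        # Visit only the indexed files and move each to its hashed brick.
        moved = 0
        for path in sorted(self.index):
            self.location[path] = self.hashed_brick(path)
            moved += 1
        self.index.clear()
        return moved


if __name__ == "__main__":
    vol = Volume(["brick0", "brick1", "brick2"])
    for i in range(1000):
        vol.create(f"unprocessed/file{i}")
    # "Classify" a small part of the data by moving it elsewhere.
    for i in range(100):
        vol.rename(f"unprocessed/file{i}", f"classified/file{i}")
    print("files on volume:", len(vol.location))
    print("files in index: ", len(vol.index))
    print("files moved:    ", vol.rebalance_from_index())
```

The index never holds more entries than the number of renamed files, so the periodic pass costs in proportion to the renames rather than to the volume size. After an add-brick, hashed_brick() changes for files that were never renamed, so the index misses them and a full crawl is still needed, as noted above.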
Xavi