[Gluster-users] Rebalancing the Distribute Hash Table Translator - Possible solutions

Fri Jun 19 04:44:58 UTC 2009

Hi,

I'd like to start a discussion on practical ways of rebalancing the
DHT Translator. I've thought of a few and wanted to get some feedback.

Obviously this is best done by Gluster itself, but alas it does not
(yet?). If I'm being an idiot and there is an implementation already,
please feel free to tell me :P

The problem - Growing a DHT filesystem is as easy as adding additional
volumes. However, to make sure old files are still accessible, the
hash data is stored in the directory. This means any data added to old
directories will only go onto the volumes that were available when the
directory was created. As a result, it is possible to have many empty
disks and still run out of disk space.
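
To make the problem concrete, here is a toy model in Python. It is
only a sketch: the hash (CRC32 modulo the volume count), the function
names and the volume names are all my own assumptions, not Gluster's
real hash function or on-disk layout format. The only point is that a
directory created before the new volume was added never places files
on it.

```python
import zlib

def make_layout(volumes):
    """Freeze the set of volumes that exist when a directory is created.

    Hypothetical stand-in for the per-directory hash data."""
    return tuple(volumes)

def place(layout, filename):
    """Pick a volume for a file using only the directory's stored layout.

    CRC32 modulo the volume count is an assumption for illustration."""
    return layout[zlib.crc32(filename.encode()) % len(layout)]

# Directory created while only two volumes existed.
old_dir = make_layout(["vol1", "vol2"])
# Directory created after a third volume was added.
new_dir = make_layout(["vol1", "vol2", "vol3"])

names = [f"file{i}" for i in range(1000)]
old_used = {place(old_dir, n) for n in names}
new_used = {place(new_dir, n) for n in names}

# The old directory can only ever reach vol1 and vol2.
print("old directory uses:", sorted(old_used))
print("new directory uses:", sorted(new_used))
```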

1.) Automated Recopy - A script of some sort traverses the filesystem
and recopies entire directories, deletes the original and moves the
new directory into one with the old name. This should rebalance the
files as the newly created directories now take into account all the
current volumes.
Pros: Online rebalance. Effective. Fairly easy to implement, and since
it is done through the Gluster interface the chance of bugs is small.
Cons: If the ratio of new volumes to old is small, most of the I/O is
effectively wasted copying data between full nodes. Choosing nodes to
copy is difficult.
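
A minimal sketch of the recopy step for option 1, in Python, run
against the mounted Gluster filesystem. The function name and the
".rebalance-tmp" suffix are made up for illustration. It assumes
nothing else writes to the directory while it runs, and that the
renamed directory keeps the hash data it was created with, which I
have not verified.

```python
import os
import shutil

def recopy_directory(path):
    """Copy a directory, delete the original, move the copy into place.

    Hypothetical helper. Writes made to `path` during the copy are
    lost, and the directory briefly does not exist between the delete
    and the rename."""
    path = os.path.abspath(path)
    tmp = path + ".rebalance-tmp"  # assumed scratch name
    if os.path.exists(tmp):
        raise FileExistsError(tmp)
    # The new directory is created now, so it should see all current
    # volumes (assumption taken from the description above).
    shutil.copytree(path, tmp, symlinks=True)
    shutil.rmtree(path)
    os.rename(tmp, path)
```

Something like recopy_directory("/mnt/gluster/projects") would then be
called per directory, where the mount path is only an example. Note
that this needs enough free space for a second copy of the directory
while it runs.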

2.) Background Moves - Behind the scenes, run a script to move files
from a full disk to an empty disk.
Pros: Easy to implement.
Cons: Cannot be combined with AFR, which any realistic cluster would
be using. Might break the DHT.

3.) AFR Friendly Background Moves - Behind the scenes, run a script to
simultaneously move files from 2+ full disks behind an AFR to 2+
empty disks behind another AFR.
Pros: Effective even with AFR.
Cons: Hard to implement. Runs behind the scenes and can potentially
cause problems on a running cluster. Might break the DHT.

4.) Background move between AFR volumes - This is assuming you are
running a DHT translator in front of a series of AFR volumes. Mount
every AFR volume separately, move data from fuller AFRs to emptier
ones.
Pros: Easy to implement.
Cons: Offline rebalance (probably). Best case scenario: Forces the DHT
to rehash the directory. New hash should take into account all
volumes. Medium case scenario: DHT might "heal" the file by moving it
to the proper volume. Worst case scenario: Broken DHT.
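
For option 4, the "fuller to emptier" choice could be sketched like
this in Python. The function names, the 10% threshold and the mount
points are all assumptions for illustration. This only picks a source
and a target; it does not move anything, and it says nothing about how
the DHT reacts afterwards.

```python
import shutil

def used_fraction(mount_point):
    """Fraction of space in use on a separately mounted AFR volume."""
    usage = shutil.disk_usage(mount_point)
    return usage.used / usage.total

def pick_source_and_target(fractions, min_gap=0.10):
    """Return (fullest, emptiest) mount points, or None if balanced.

    `fractions` maps mount point -> used fraction. `min_gap` is an
    assumed threshold below which a move is not worth the I/O."""
    if len(fractions) < 2:
        return None
    fullest = max(fractions, key=fractions.get)
    emptiest = min(fractions, key=fractions.get)
    if fractions[fullest] - fractions[emptiest] < min_gap:
        return None
    return fullest, emptiest

# Example with made-up numbers; on a real system the dict would be
# built with used_fraction() for each AFR mount.
example = {"/mnt/afr1": 0.9, "/mnt/afr2": 0.2, "/mnt/afr3": 0.5}
print(pick_source_and_target(example))
```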

Anyway, just some quick thoughts on how to do it. The common use case,
IMHO, is clustering with a few drives and slowly growing based on
usage.

Brandon