[Gluster-devel] Feature review: Improved rebalance performance

Xavier Hernandez xhernandez at datalab.es
Tue Jul 1 10:27:06 UTC 2014


On Tuesday 01 July 2014 05:55:51 Raghavendra Gowdappa wrote:
> ----- Original Message -----
> > > > Another thing to consider for future versions is to modify the current
> > > > DHT to use consistent hashing, and even to change the hash key (using
> > > > the gfid instead of a hash of the name would solve the rename problem).
> > > > Consistent hashing would drastically reduce the number of files that
> > > > need to be moved and already solves some of the current problems. This
> > > > change needs a lot of thinking, though.
> > > 
> > > The problem with using the gfid for hashing instead of the name is that
> > > we run into a chicken-and-egg problem. Before lookup, we cannot know the
> > > gfid of the file, and to look up the file, we need the gfid to find the
> > > node on which the file resides. Of course, this problem would go away if
> > > we did the lookup (maybe just during fresh lookups) on all the nodes, but
> > > that slows down fresh lookups and may not be acceptable.
> > 
> > I think it's not so problematic, and the benefits would be considerable.
> >
> > The gfid of the root directory is always known. This means that we could
> > always do a lookup on root by gfid.
> >
> > I haven't tested it, but as I understand it, when you want to do a
> > getxattr on a file inside a subdirectory, for example, the kernel will
> > issue lookups on all intermediate directories to check,
> 
> Yes, but how does dht handle these lookups? Are you suggesting that we wind
> the lookup call to all subvolumes (since, for lack of a gfid, we don't know
> on which subvolume the file is present)?

Oops, that's true. It only works combined with another idea we had about 
storing directories as special files (using the same redundancy as normal 
files). This way, a lookup for an entry would be translated into a special 
lookup on the parent directory (we know where it is and its gfid) asking for a 
specific entry, which would return its gfid (and probably some other info). Of 
course, this has further implications, such as the bricks no longer being able 
to maintain a (partial) view of the file system as they do now.
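To make the idea concrete, here is a minimal sketch of how such a lookup could 
resolve one path component without broadcasting to all subvolumes. All names 
(gfid_to_subvolume, directory_objects, lookup) are illustrative assumptions, 
not GlusterFS APIs; a directory is modeled as a plain name-to-gfid mapping:

```python
# Hypothetical sketch, not GlusterFS code: directories stored as special
# files whose entries map a name to a gfid, so the child's location can be
# derived by hashing the gfid itself instead of the name.
import hashlib
import uuid

SUBVOLUMES = ["subvol-0", "subvol-1", "subvol-2"]

def gfid_to_subvolume(gfid: uuid.UUID) -> str:
    """Map a gfid to a subvolume by hashing the gfid (not the file name)."""
    digest = hashlib.sha1(gfid.bytes).digest()
    return SUBVOLUMES[int.from_bytes(digest[:4], "big") % len(SUBVOLUMES)]

# A directory stored as a special file: here just a mapping name -> gfid.
root_gfid = uuid.UUID(int=1)              # the root gfid is always known
directory_objects = {
    root_gfid: {"etc": uuid.uuid4()},     # entry recorded when it is created
}

def lookup(parent_gfid: uuid.UUID, name: str) -> tuple[uuid.UUID, str]:
    """Resolve one path component: read the parent directory object (its
    location is known from its own gfid), obtain the entry's gfid, then
    locate the entry by hashing that gfid -- no broadcast needed."""
    child_gfid = directory_objects[parent_gfid][name]
    return child_gfid, gfid_to_subvolume(child_gfid)

gfid, subvol = lookup(root_gfid, "etc")
```

Note that each path component still costs one round trip to the parent 
directory's subvolume, which is the price paid for avoiding the broadcast.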

Right now, using the gfid as the hash key is not possible because it would 
require querying every subvolume on lookups, as you say, and that is not 
efficient.

The solution I described would require some significant architectural changes. 
It could be an option to consider for 4.0.
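As a rough illustration of why consistent hashing would reduce rebalance 
traffic, the sketch below (illustrative only, not GlusterFS code; the Ring 
class and vnode count are assumptions) places each brick at several points on 
a hash ring and locates a gfid at the next brick point. Adding a brick then 
remaps only the gfids falling into the new brick's arcs, instead of rehashing 
almost everything as a modulo-style scheme would:

```python
# Illustrative consistent-hashing sketch over gfids (not GlusterFS code).
import bisect
import hashlib
import uuid

def h(data: bytes) -> int:
    """64-bit hash position on the ring."""
    return int.from_bytes(hashlib.sha1(data).digest()[:8], "big")

class Ring:
    def __init__(self, bricks, vnodes=64):
        # Each brick owns `vnodes` points on the ring to smooth the load.
        self.points = sorted(
            (h(f"{b}#{i}".encode()), b) for b in bricks for i in range(vnodes)
        )
        self.keys = [p for p, _ in self.points]

    def locate(self, gfid: uuid.UUID) -> str:
        # A gfid belongs to the first brick point at or after its hash.
        i = bisect.bisect(self.keys, h(gfid.bytes)) % len(self.points)
        return self.points[i][1]

gfids = [uuid.uuid4() for _ in range(10000)]
before = Ring(["brick-0", "brick-1", "brick-2"])
after = Ring(["brick-0", "brick-1", "brick-2", "brick-3"])
moved = sum(before.locate(g) != after.locate(g) for g in gfids)
# Roughly 1/4 of the files move when growing from 3 to 4 bricks; a plain
# modulo scheme would remap roughly 3/4 of them.
```

The same property is what would make rebalance after adding a brick much 
cheaper than with the current name-hash layout splitting.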

Xavi
