[Gluster-devel] Data classification proposal

Xavier Hernandez xhernandez at datalab.es
Wed Jun 25 14:19:55 UTC 2014


On Wednesday 25 June 2014 08:35:05 Jeff Darcy wrote:
> > For the short-term, wouldn't it be OK to disallow adding bricks that
> > is not a multiple of group-size?
> 
> In the *very* short term, yes.  However, I think that will quickly
> become an issue for users who try to deploy erasure coding because those
> group sizes will be quite large.  As soon as we implement tiering, our
> very next task - perhaps even before tiering gets into a release -
> should be to implement automatic brick splitting.  That will bring other
> benefits as well, such as variable replication levels to handle the
> sanlock case, or overlapping replica sets to spread a failed brick's
> load over more peers.

If I understand the proposed data-classification architecture correctly, each
server will have a number of bricks that are dynamically modified as needed:
as more data-classifying conditions are defined, a new layer of translators
will be added (a new DHT or AFR, or something else) and some or all existing
bricks will be split to accommodate the new, and maybe overlapping, condition.

How will space be allocated to each new sub-brick? Through some sort of thin
provisioning, or will it be distributed evenly across each split?

With thin provisioning, it will be hard to determine the real available space.
With a fixed amount, we can end up in scenarios where a file cannot be written
even though there seems to be enough free space. This can already happen today
with very big files on almost full bricks, and I think brick splitting can
accentuate it.
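
To make the fixed-allocation case concrete, here is a minimal sketch in Python.
The numbers and the can_place() helper are made up for illustration; it only
assumes that a file is stored whole on a single sub-brick and that a split
divides the free space evenly.

```python
def can_place(file_size, free_per_subbrick):
    """Return True if at least one sub-brick can hold the whole file.

    Assumption: a file is never spread across sub-bricks.
    """
    return any(free >= file_size for free in free_per_subbrick)


# Hypothetical sizes in GB: one brick with 100 GB free, then the same brick
# split evenly into four sub-bricks.
free_before_split = [100]
free_after_split = [25, 25, 25, 25]
file_size = 40

total_free = sum(free_after_split)  # aggregate free space is unchanged
fits_before = can_place(file_size, free_before_split)
fits_after = can_place(file_size, free_after_split)

print(total_free, fits_before, fits_after)
```

The aggregate free space is the same before and after the split, but the 40 GB
file only fits before it, because no single sub-brick has room for it.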

Also, adding multiple layered DHT translators, as DHT is implemented today,
could add a lot more latency, especially on directory listings.

Another problem I see is that splitting bricks will require a rebalance, which
is a costly operation. It doesn't seem right to require such an expensive
operation every time you add a new condition to an already created volume.

Maybe I've missed something important?

Thanks,

Xavi

