[Gluster-devel] scalability vs. namespace

Petr Kacer don at don.cz
Thu Dec 6 14:37:39 UTC 2007


Hello,

  first of all, THANK YOU for the work you do! :-)

  GlusterFS looks very promising and I would like to try using it in
my cluster setup. However, what is not entirely clear to me (even after
reading the docs and mailing list archives) is how well (if at all) it
scales in terms of the total number of stored files. Just thinking in
the long run... it might not be an issue today, but it may well be one
tomorrow.

  I know Unify does a great job distributing the data across all bricks.
So does Stripe, if the files are larger. But let's suppose I want to
(theoretically) store a lot of files (e.g. a billion) or, better yet,
suppose the number of stored files grows with time and I have to keep
them all. What will become the bottleneck?

  Adding more bricks is definitely possible, so the total _size_ of files
is not an issue per se; the GlusterFS design seems able to cope with
that just fine. The same is not true of the total _number_ of files: if
I got it right (please correct me if not), each and every brick has
to have enough space to host the entire directory tree structure and,
"worse" yet, the namespace brick has to be able to store as many inodes
as there are files in the whole cluster (plus the directory
structure). That makes it the possible weak link of such a setup from
the total-number-of-files point of view.
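
  To put rough numbers on this, here is a minimal Python sketch. It
assumes, as described above, that the namespace brick needs one inode
per file plus one per directory in the whole cluster. The function
names and the example figures (a billion files, ten million
directories) are hypothetical and chosen for illustration only; the
only real API used is os.statvfs from the standard library.

```python
import os

def namespace_inodes_needed(num_files, num_dirs):
    """Inodes the namespace brick would need, ASSUMING one inode per
    file plus one per directory across the whole cluster."""
    return num_files + num_dirs

def inode_counts(path):
    """Return (total, free) inode counts of the filesystem holding path.
    Some filesystems report 0 for total because they allocate inodes
    dynamically."""
    st = os.statvfs(path)
    return st.f_files, st.f_ffree

def fits(num_files, num_dirs, total_inodes):
    """True if the assumed inode demand fits within total_inodes."""
    return namespace_inodes_needed(num_files, num_dirs) <= total_inodes

if __name__ == "__main__":
    # Hypothetical workload: a billion files in ten million directories.
    needed = namespace_inodes_needed(10**9, 10**7)
    total, free = inode_counts(".")
    print("inodes needed on namespace brick:", needed)
    print("inodes on this filesystem: total=%d free=%d" % (total, free))
    print("would fit here:", fits(10**9, 10**7, total))
```

  Running it against the namespace brick's mount point (instead of ".")
would show how far a given filesystem is from the assumed demand; it
says nothing about lookup performance, only about inode capacity.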

  Is there currently a way around this (other than getting a larger
drive and/or using a different FS for the namespace brick, which merely
buys you some time but does not really scale)? Or is the namespace meant
to be "just a kind of cache" where some "garbage collection" might
eventually be performed in the future?

  Thanks,

     Petr







