[Gluster-devel] scalability vs. namespace

Sat Dec 8 07:52:03 UTC 2007

I'd like to know the answer to this one also.
We plan to use glusterfs for an online backup service where there will
be a lot of small files. I'd like to know some numbers for the
theoretical limits on the number of files in a glusterfs cluster
(depending on a 32/64-bit OS and the underlying filesystem?), and whether
there are any other limiting factors to consider when setting up a
practical, usable cluster with a lot of files.
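One concrete number worth watching is the free inode count on the filesystem that backs the namespace brick. This assumes, as Petr describes below, that every file and directory in the cluster costs one inode there. Below is a minimal Python sketch using the standard os.statvfs call. The path and the target entry count are hypothetical placeholders, and the one-inode-per-entry cost is an assumption, not a documented glusterfs figure:

```python
import os
import sys


def inode_headroom(path: str, expected_entries: int) -> int:
    """Return how many free inodes would remain on the filesystem holding
    `path` after adding `expected_entries` files/directories.
    A negative result means a shortfall.

    ASSUMPTION: each file and directory in the cluster consumes exactly one
    inode on the namespace brick; real overhead may differ.
    """
    st = os.statvfs(path)
    # f_favail: free inodes available to unprivileged users
    return st.f_favail - expected_entries


if __name__ == "__main__":
    # Hypothetical namespace export path and target entry count
    path = sys.argv[1] if len(sys.argv) > 1 else "/"
    target = int(sys.argv[2]) if len(sys.argv) > 2 else 1_000_000_000
    st = os.statvfs(path)
    print(f"total inodes: {st.f_files}, free: {st.f_favail}")
    left = inode_headroom(path, target)
    if left < 0:
        print(f"short by {-left} inodes for {target} entries")
    else:
        print(f"{left} inodes to spare after {target} entries")
```

On filesystems with a fixed inode table (e.g. ext3, where the count is set at mkfs time), this gives a hard ceiling. Filesystems that allocate inodes dynamically may report figures that do not reflect a real limit.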

Petr Kacer wrote:
> Hello,
>
>   first of all, THANK YOU for the work you do! :-)
>
>   GlusterFS looks very promising and I would like to try and use it in
> my cluster setup. However, what is not entirely clear to me (even after
> reading the docs and mailing list archives) is how well (if at all) it
> scales in terms of the total number of stored files. Just thinking in the
> long run... it might not be much of an issue today, but it may well
> be tomorrow.
>
>   I know Unify does a great job distributing the data across all bricks.
> So does Stripe, should the files be larger. But let's suppose I want to
> (theoretically) store a lot of files (e.g. a billion), or better yet,
> suppose the number of stored files grows over time and I have to keep
> them all. What will become the bottleneck?
>
>   Adding more bricks is definitely possible, so the total _size_ of files
> is not an issue per se; the glusterfs design seems able to cope with
> that just fine. The total _number_ of files is a different story: if
> I got it right (please correct me if not), then each and every brick has
> to have enough space to host the entire directory tree structure, and,
> "worse" yet, the namespace brick has to be able to store as many inodes
> as there are files in the whole cluster (including the directory
> structure). That makes it the potential weak link of such a setup from
> the total-number-of-files point of view.
>
>   Is there currently a way around this (other than getting a larger
> drive and/or using a different FS for the namespace brick, which merely
> buys you some time but does not really scale)? Or is the namespace meant
> to be "just a kind of cache" on which some "garbage collection" might
> eventually be performed in the future?
>
>   Thanks,
>
>      Petr
>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at nongnu.org
> http://lists.nongnu.org/mailman/listinfo/gluster-devel
>