[Gluster-devel] scalability vs. namespace
Onyx
lists at bmail.be
Sat Dec 8 07:52:03 UTC 2007
I'd like to know the answer to this one as well.
We plan to use glusterfs for an online backup service that will store
a lot of small files. I'd like to see some numbers for the
theoretical limits on the number of files in a glusterfs cluster
(depending on 32/64-bit OS, the filesystem used?), and to know whether
there are any other limiting factors to consider when setting up a
practical, usable cluster with a lot of files.
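
One limit that can at least be measured is the inode capacity of the
filesystem backing a brick. The following is only a minimal sketch, not
anything glusterfs itself provides: it assumes a POSIX system with
Python available, the function names are made up for illustration, and
the path passed in would be wherever the brick's export directory lives.

```python
import os


def inode_usage(path):
    """Return (total, free, used) inode counts for the filesystem holding path.

    Uses os.statvfs, which is available on POSIX systems only.
    """
    st = os.statvfs(path)
    total = st.f_files   # total number of inodes on the filesystem
    free = st.f_ffree    # number of free inodes
    return total, free, total - free


def can_hold(path, expected_files):
    """Check whether the filesystem at path has room for expected_files inodes.

    Returns None when the filesystem reports no fixed inode count
    (some filesystems allocate inodes dynamically and report 0 here).
    """
    total, free, _used = inode_usage(path)
    if total == 0:
        return None
    return expected_files <= free


if __name__ == "__main__":
    # "/" is only a placeholder; substitute the brick's export directory.
    total, free, used = inode_usage("/")
    print("inodes: total=%d free=%d used=%d" % (total, free, used))
    print("room for a billion more files:", can_hold("/", 10**9))
```

This only tells you how many files a single backing filesystem can
hold; it says nothing about how glusterfs performs at that scale.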
Petr Kacer wrote:
> Hello,
>
> first of all, THANK YOU for the work you do! :-)
>
> GlusterFS looks very promising and I would like to try it out in
> my cluster setup. However, what is not entirely clear to me (even after
> reading the docs and mailing list archives) is how well (if at all) it
> scales in terms of the total number of stored files. Just thinking about
> the long run... it might not be an issue today but may well be one
> tomorrow.
>
> I know Unify does a great job distributing the data across all bricks.
> So does Stripe, should the files be larger. But let's suppose I want to
> (theoretically) store a lot of files (e.g. a billion) or, better yet,
> suppose the number of stored files grows with time and I have to keep
> them all. What will become the bottleneck?
>
> Adding more bricks is definitely possible, so the total _size_ of files
> is not an issue per se; the glusterfs design seems able to cope with
> that just fine. The same is not true of the total _number_ of files: if
> I got it right (please correct me if not), then each and every brick has
> to have enough space to host the entire directory tree structure and,
> "worse" yet, the namespace brick has to be able to store as many inodes
> as there are files in the whole cluster (including the directory
> structure), making it the possible weak link of such a setup as far as
> the total number of files is concerned.
>
> Is there currently a way around this (other than getting a larger
> drive and/or using a different FS for the namespace brick, which merely
> buys you some time but does not really scale)? Or is the namespace meant
> to be "just a kind of cache" on which some "garbage collection" might
> eventually be performed in the future?
>
> Thanks,
>
> Petr