[Gluster-devel] scalability vs. namespace

Amar S. Tumballi amar at zresearch.com
Fri Dec 14 10:40:01 UTC 2007


Sorry for the delay, folks; let me try to answer this one.

The GlusterFS project started with a vision of imposing no limits of its own in
any aspect of the filesystem, but it inevitably depends on the scalability of
the backend filesystems.

Currently, with the 1.3.x releases of GlusterFS, the hard limit on the number
of files that can be created under unify is the number of files the namespace
brick itself can hold. Our advice is therefore to choose, for the namespace, a
backend filesystem that supports the maximum number of inodes for the given
namespace size.
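For illustration, a minimal client-side volume spec for such a setup might look
like the sketch below (host names, volume names and the scheduler choice are
only placeholders for this example, not a recommendation):

  volume brick1
    type protocol/client
    option transport-type tcp/client
    option remote-host server1        # data brick exported from server1
    option remote-subvolume brick
  end-volume

  volume brick2
    type protocol/client
    option transport-type tcp/client
    option remote-host server2        # data brick exported from server2
    option remote-subvolume brick
  end-volume

  volume brick-ns
    type protocol/client
    option transport-type tcp/client
    option remote-host server1        # namespace brick; its backend fs bounds the file count
    option remote-subvolume brick-ns
  end-volume

  volume unify0
    type cluster/unify
    option namespace brick-ns         # every file name in the volume also gets an entry here
    option scheduler rr               # round-robin, as an example; other schedulers exist
    subvolumes brick1 brick2
  end-volume

Running 'df -i' on the namespace backend shows how many inodes it can hold,
which is effectively the file-count ceiling of the unify volume in 1.3.x.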

But we understand this is a serious limitation when it comes to scaling to
billions of files, so with the 1.4.x version of GlusterFS (no release dates
yet) we want to have a distributed namespace design, with which the current
hard limit no longer matters. Also, as of now the namespace behaves like a
cache (i.e., GlusterFS rebuilds missing namespace entries through its
self-heal), so we believe switching between the current 1.3.x version and the
1.4.x version should be seamless.

Hope this answers the question.

Regards,
Amar


On Dec 8, 2007 1:22 PM, Onyx <lists at bmail.be> wrote:

> I'd like to know the answer to this one too.
> We plan to use GlusterFS for an online backup service that will hold a
> lot of small files. I'd like to see some numbers on the theoretical
> limits on the number of files in a GlusterFS cluster (depending on
> 32/64-bit OS and the backend filesystem used?), and to know whether there
> are any other limiting factors to consider when setting up a
> practical, usable cluster with a lot of files.
>
>
>
> Petr Kacer wrote:
> > Hello,
> >
> >   first of all, THANK YOU for the work you do! :-)
> >
> >   GlusterFS looks very promising and I would like to try and use it in
> > my cluster setup. However, what is not entirely clear to me (even after
> > reading the docs and mailing list archives) is how well (if at all) it
> > scales in terms of the total number of stored files. Just thinking in
> > the long run... it might not be an issue at all today, but it may well
> > be tomorrow.
> >
> >   I know Unify does a great job distributing the data across all bricks.
> > So does Stripe, should the files be larger. But let's suppose I want to
> > (theoretically) store a lot (e.g. a billion) of files or better yet,
> > suppose the number of stored files grows with time and I have to keep
> > them all. What will become the bottleneck?
> >
> >   Adding more bricks is definitely possible, so the total _size_ of files
> > is not an issue per se; the GlusterFS design seems able to cope with
> > that just fine. The same is not true of the total _number_ of files: if
> > I got it right (please correct me if not), each and every brick has
> > to have enough space to host the entire directory tree structure and,
> > "worse" yet, the namespace brick has to be able to store as many inodes
> > as there are files in the whole cluster (including the directory
> > structure), making it the potential weak link of such a setup from the
> > total-number-of-files point of view.
> >
> >   Is there currently a way around this (other than getting a larger
> > drive and/or using a different FS for the namespace brick, which merely
> > buys you some time but does not really scale)? Or is the namespace meant
> > to be "just a kind of cache" on which some "garbage collection" might
> > eventually be performed in the future?
> >
> >   Thanks,
> >
> >      Petr
> >
> >
> >
> >
> >
>
>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at nongnu.org
> http://lists.nongnu.org/mailman/listinfo/gluster-devel
>



-- 
Amar Tumballi
Gluster/GlusterFS Hacker
[bulde on #gluster/irc.gnu.org]
http://www.zresearch.com - Commoditizing Supercomputing and Superstorage!


