[Gluster-devel] Re: glusterFS, many small files and data replication

Erik Osterman e at osterman.com
Wed Mar 7 20:03:25 UTC 2007


We have a similar setup with millions of small XML fragments (5-15 KB each), and the dataset is expected to grow
much larger. Can we continue to expect O(1) lookup performance as the dataset grows? The documentation emphasizes
how well the system scales in storage capacity, but says little about how it scales with the number of inodes
(for lack of a better term).
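
One way to check this empirically is to time metadata operations at increasing file counts. Below is a minimal
Python sketch, not from the original thread: the mount point, file counts, and file naming are all assumptions,
meant only to illustrate the measurement, not our actual layout.

#!/usr/bin/env python3
# Minimal sketch: time stat() calls at growing file counts to see
# whether per-file lookup stays O(1). The mount point is an assumption;
# point it at your own volume.
import os
import random
import time

MOUNT = "/mnt/gluster/bench"  # assumed GlusterFS mount point

def populate(start, end):
    # Create files frag-<start>..frag-<end-1>, ~5 KB each, roughly
    # matching the XML fragments described above.
    payload = b"x" * 5120
    for i in range(start, end):
        with open(os.path.join(MOUNT, "frag-%08d.xml" % i), "wb") as f:
            f.write(payload)

def avg_stat_ms(count, samples=1000):
    # Average time in milliseconds to stat a random existing file.
    begin = time.time()
    for _ in range(samples):
        os.stat(os.path.join(MOUNT, "frag-%08d.xml" % random.randrange(count)))
    return (time.time() - begin) * 1000.0 / samples

if __name__ == "__main__":
    os.makedirs(MOUNT, exist_ok=True)
    created = 0
    for count in (10000, 100000, 1000000):
        populate(created, count)
        created = count
        print("%8d files: %.3f ms per stat" % (count, avg_stat_ms(count)))

If lookups really are O(1), the per-stat time should stay roughly flat across the rows. Note that the kernel's
dentry cache can mask the filesystem's own lookup cost, so drop caches between runs for an honest number.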

Best,

Erik Osterman


Bernhard J. M. Gruen wrote:
 
> Hello list members,
> At the moment we are searching for a storage cluster solution that
> should fulfill the following requirements:
> * 3 cluster nodes, each with 24 SATA disks (750 GB each) in RAID 6
> * data replication (each file stored on at least 2 of the 3 nodes)
> * easy disaster recovery (in case a node fails completely)
> * high-speed concurrent read access to many small files
> (20,000,000-200,000,000 files of 5 kB to 60 kB)
> * files are served by a web server
> * moderate write speed is sufficient
> * the system should keep working even if one (or two) storage nodes
> are completely down
> At the moment we think most of this is already possible with
> glusterFS; only the recovery part is missing (it should arrive with
> 1.4).
> But how does glusterFS perform with that many small files? I could
> imagine that glusterFS is not optimized for this usage, because it
> has no metadata server to help clients find the right server to ask
> for a file.
> Is glusterFS the right system for us?
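
On the metadata question in the quote above: one common way a clustered filesystem avoids a central metadata
server is deterministic placement, where every client hashes the file path to pick the responsible servers, so a
lookup is a local computation rather than a network round-trip. The toy Python sketch below shows the idea,
including the 2-of-3 replication requirement from the quoted list; the node names and replica count are
illustrative assumptions, and this is not GlusterFS's actual scheduler.

# Toy sketch of metadata-server-free placement: every client derives
# a file's home servers from the path alone, so no central lookup is
# needed. NOT GlusterFS's actual algorithm; node names and replica
# count are assumptions.
import hashlib

NODES = ["node1", "node2", "node3"]
REPLICAS = 2  # each file stored on 2 of the 3 nodes

def servers_for(path):
    # Hash the path, pick a starting node, and place replicas on the
    # following nodes in ring order, so losing any single node still
    # leaves one copy reachable.
    digest = hashlib.md5(path.encode("utf-8")).hexdigest()
    first = int(digest, 16) % len(NODES)
    return [NODES[(first + k) % len(NODES)] for k in range(REPLICAS)]

if __name__ == "__main__":
    for path in ("/xml/a/0001.xml", "/xml/a/0002.xml", "/xml/b/0001.xml"):
        print(path, "->", servers_for(path))

The trade-off with purely hash-based layouts is that directory listings and renames become more expensive, which
is part of why small-file workloads stress such designs differently than large-file ones.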
