Josh Harrison hijakk at gmail.com
Wed Feb 24 00:44:41 UTC 2016

I used GlusterFS 3-4 years ago, and I'm now looking at it as a potential
solution for creating an HTTP-accessible internal cache of files for a
project I'm working on. This doesn't need to be super scalable in terms of
the number of simultaneous users, but relatively consistent sub-second
access times per file would be important.

I currently have images stored directly on disk, with an Apache service
sitting on top to provide the files back on demand.
I have a naming structure set up such that the first two characters in a
file name provide a folder path - file "abc123.txt" sits at
This is how I avoid having millions of files in a single directory. The
scheme can be extended to more levels as needed, of course, and is managed
with a simple Apache rewrite rule.
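
For illustration, here is a minimal sketch of this kind of prefix-based
routing. The exact layout is not shown above, so this assumes one directory
level per leading character; the function name shard_path and the root
"/cache" are hypothetical and not part of my actual setup.

```python
from pathlib import PurePosixPath


def shard_path(filename: str, depth: int = 2, root: str = "/cache") -> PurePosixPath:
    """Build a nested path from the leading characters of a file name.

    Assumed layout: one directory level per leading character.
    The root "/cache" is a placeholder, not a real mount point.
    """
    stem = filename.split(".", 1)[0]
    if len(stem) < depth:
        raise ValueError(f"{filename!r} is too short for depth {depth}")
    # Each of the first `depth` characters becomes one directory level.
    levels = list(stem[:depth])
    return PurePosixPath(root, *levels, filename)


if __name__ == "__main__":
    print(shard_path("abc123.txt"))
    print(shard_path("abc123.txt", depth=3))
```

Raising depth is how the scheme would be extended as the file count grows.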

If I need to cache and expose millions to billions of relatively small
files (average file size 50 KB), where am I likely to encounter problems
with GlusterFS?
Are there block size issues?
Inode issues?
Obviously raw disk storage is a constraint, but are there any others I
should be aware of when planning this service?
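
To put rough numbers on the scale, here is a back-of-the-envelope sketch.
Everything in it is an assumption for illustration: 50 KB is taken as
50,000 bytes, file names are assumed to start with lowercase letters or
digits (36 possibilities per character) spread evenly, two prefix levels
are used, and replication overhead is ignored.

```python
def estimate(num_files: int, avg_size_bytes: int = 50_000,
             alphabet: int = 36, depth: int = 2) -> dict:
    """Rough sizing for a prefix-sharded file cache (all inputs assumed)."""
    leaf_dirs = alphabet ** depth  # number of lowest-level directories
    return {
        "raw_bytes": num_files * avg_size_bytes,  # payload only, no replicas
        "leaf_dirs": leaf_dirs,
        "files_per_dir": num_files / leaf_dirs,   # assumes an even spread
        "min_inodes": num_files,                  # at least one per file
    }


if __name__ == "__main__":
    for n in (10**6, 10**9):
        e = estimate(n)
        print(f"{n:>13,} files: {e['raw_bytes'] / 1e12:.2f} TB raw, "
              f"{e['leaf_dirs']} leaf dirs, "
              f"~{e['files_per_dir']:,.0f} files per dir")
```

Under these assumptions a billion files is about 50 TB of raw data and
still leaves several hundred thousand files in each directory, so the
two-character scheme would probably need extra levels at that scale.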

Can I point Apache at a GlusterFS volume mounted over NFS and provide the
same kind of service I'm providing currently?
Is there a better way to do this?

Do I need to do the same kind of file routing I'm doing currently, within
That is to say, will I still need to store data in my gluster volume at

Josh Harrison
