[Gluster-devel] Performance scaling questions

Mon Dec 10 23:18:56 UTC 2007

Hi Martin,
Apologies for the rather late reply. I've tried to answer your questions
inline.

>
> We have an application that has 10 billion small (256-4096 byte)
> files. We do 10 reads for every write. We read and write each file in
> its entirety. Only 1% of the files are hot; i.e. being read and
> written in the same hour.
>
> Some questions:
>
> * Can I have 500 clients all mounting the file system simultaneously?

That should not be a problem.

> * Will my reads be primarily out of memory? Or am I going to be
> limited by spindles?

If your files are re-read, you can gain some serious performance with
io-cache. Otherwise they would be served from the server's VFS block cache.

> How many bricks/ram will I need so that I'm
> mostly reading from memory?

It really depends on the workload, how many files are in the commonly
accessed set, etc. You can start with 1 brick and add more from there :)
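
For a rough starting point, here is a back-of-the-envelope sketch in Python
using only the numbers from your mail (10 billion files, 1% hot, 256-4096
bytes each). The RAM-per-brick figure and the per-file overhead parameter are
hypothetical placeholders, not measurements, so treat the output as an
order-of-magnitude estimate only.

```python
# Rough sizing of the hot working set from the numbers in the question.
TOTAL_FILES = 10_000_000_000   # 10 billion files
HOT_PERCENT = 1                # 1% of the files are hot
MIN_FILE_BYTES = 256
MAX_FILE_BYTES = 4096

# ASSUMPTION: RAM per brick that is actually available for caching.
# This value is made up for illustration; substitute your own hardware.
ASSUMED_CACHE_RAM_PER_BRICK = 16_000_000_000  # 16 GB


def hot_set_bytes(file_bytes: int, overhead_bytes: int = 0) -> int:
    """Bytes needed to keep every hot file in memory.

    overhead_bytes is a hypothetical per-file bookkeeping cost; it defaults
    to 0 because the real figure is not known here.
    """
    hot_files = TOTAL_FILES * HOT_PERCENT // 100
    return hot_files * (file_bytes + overhead_bytes)


def bricks_needed(total_bytes: int, ram_per_brick: int) -> int:
    """Smallest number of bricks whose combined cache RAM covers total_bytes."""
    return -(-total_bytes // ram_per_brick)  # integer ceiling division


if __name__ == "__main__":
    for label, size in (("min", MIN_FILE_BYTES), ("max", MAX_FILE_BYTES)):
        need = hot_set_bytes(size)
        bricks = bricks_needed(need, ASSUMED_CACHE_RAM_PER_BRICK)
        print(f"{label} file size {size} B: "
              f"{need / 1e9:.1f} GB hot set, about {bricks} bricks")
```

If the reads end up being served from the server's block cache rather than
io-cache, the upper figure is probably the safer one to plan around, since
the kernel typically caches in whole pages (commonly 4 KB), so a 256-byte
file is likely to occupy about as much cache as a 4096-byte one.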

> Is the cache write through or will a
> write require a disk access on the next read?

That depends on the translators you use. Currently write-behind is the write
cache translator, which cuts short the write system call but flushes the
data through at the same time.

avati