[Gluster-users] Distributed-replicated vs. striped-replicated: some basic questions

Tue Aug 26 14:46:21 UTC 2014

We have a situation where:

- write performance is important
- the median file size is more than 1 GB

1. Is striping the way to go for this data, to get better write speed?

Currently I have a 2x2 distributed-replicated volume which, for a single file, gives roughly 2x read performance but only about 1x write performance.
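A back-of-envelope model shows why replication can double single-file reads without helping writes. This is an idealized sketch under stated assumptions, not measured Gluster behavior: every brick moves data at a fixed `brick_mbps`, the client link is capped at `client_mbps`, the client sends each written byte to every replica, and reads of one file can be spread over all replicas (as observed above). The function name and the figures are hypothetical.

```python
# Idealized single-file throughput model (assumptions, not measured Gluster behavior):
#   - each brick reads or writes at brick_mbps
#   - the client link moves at most client_mbps in each direction
#   - the client sends every written byte to all `replica` bricks
#   - striping splits one file evenly across `stripe` replica sets
#   - reads of one file can be spread over every replica of every stripe


def single_file_throughput(brick_mbps, client_mbps, replica=1, stripe=1):
    """Return (read_mbps, write_mbps) for one large file under the model above."""
    # Reads: each byte crosses the client link once and may come from any of
    # stripe * replica bricks working in parallel.
    read = min(client_mbps, stripe * replica * brick_mbps)
    # Writes: the client pushes `replica` copies of every byte, while each
    # brick only has to absorb 1/stripe of the file.
    write = min(client_mbps / replica, stripe * brick_mbps)
    return read, write


# Hypothetical figures: 100 MB/s per brick, 1000 MB/s client link.
print(single_file_throughput(100, 1000, replica=2, stripe=1))  # current 2x2 volume
print(single_file_throughput(100, 1000, replica=2, stripe=4))  # replica 2, stripe 4
```

With these hypothetical numbers the current layout gives (200, 100), the 2x read / 1x write pattern described above, and stripe 4 gives (800, 400). Past that point, adding stripes stops helping writes once `client_mbps / replica` (500 MB/s here) becomes the bottleneck.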

2. In this scenario, would a "distributed, 2x replicated, 4x striped" volume give the fastest performance while still tolerating a single-node failure and making it easy to add new nodes? What would striping without distribution do?
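The brick arithmetic for such a volume can be sketched quickly. A volume that distributes over stripe sets of replica pairs needs a brick count that is a multiple of replica × stripe (2 × 4 = 8 here, so at least 16 bricks for two distribute subvolumes), and expanding it later means adding bricks in multiples of that same product. The helper below is hypothetical, and its grouping order (adjacent bricks form a replica set, adjacent replica sets form a stripe set) is an assumption to verify against the documentation for your Gluster version.

```python
# Hypothetical helper: group an ordered brick list the way this sketch ASSUMES
# a distributed-striped-replicated volume does (adjacent bricks -> replica set,
# adjacent replica sets -> stripe set, each stripe set -> one distribute subvolume).


def plan_layout(bricks, replica, stripe):
    """Return a list of distribute subvolumes, each a list of replica sets."""
    per_subvolume = replica * stripe
    if not bricks or len(bricks) % per_subvolume:
        raise ValueError(
            f"{len(bricks)} bricks is not a positive multiple of "
            f"replica * stripe = {per_subvolume}"
        )
    replica_sets = [bricks[i:i + replica] for i in range(0, len(bricks), replica)]
    return [replica_sets[i:i + stripe] for i in range(0, len(replica_sets), stripe)]


# 16 hypothetical bricks, one per server: 2 distribute subvolumes of 4 replica pairs.
bricks = [f"server{n}:/bricks/b1" for n in range(1, 17)]
for number, subvolume in enumerate(plan_layout(bricks, replica=2, stripe=4), start=1):
    print(f"distribute subvolume {number}: {subvolume}")
```

A single-node failure is only survivable if the two bricks of every replica pair sit on different servers, so the order in which bricks are listed matters as much as their count.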

3. Is the striped-replicated volume type still "beta" code? Should I avoid it until the feature is stable?

4. Is erasure coding ever going to be on the table for a production release (eliminating the need for replicas in striped storage)?
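The storage trade-off behind this question is a simple ratio. n-way replication stores each byte n times and survives n - 1 lost copies; an idealized k+m erasure code (Reed-Solomon style, where any k of the k + m fragments rebuild the data) stores (k + m)/k raw bytes per usable byte and survives m lost fragments. The function names below are hypothetical.

```python
from fractions import Fraction


def replication_cost(replica):
    """n-way replication: n raw bytes per usable byte, survives n - 1 lost copies."""
    return Fraction(replica), replica - 1


def erasure_code_cost(k, m):
    """Idealized k+m erasure code: any k of the k + m fragments rebuild the data,
    so it costs (k + m) / k raw bytes per usable byte and survives m lost fragments."""
    return Fraction(k + m, k), m


for label, (overhead, failures) in [
    ("replica 2", replication_cost(2)),
    ("replica 3", replication_cost(3)),
    ("erasure 4+2", erasure_code_cost(4, 2)),
    ("erasure 8+2", erasure_code_cost(8, 2)),
]:
    print(f"{label}: {float(overhead):.2f}x raw storage, tolerates {failures} failure(s)")
```

So replica 2 spends 2x raw capacity to survive one failure, while a 4+2 code spends 1.5x to survive two, at the cost of extra computation and more bricks involved in every read and write.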

5. Is there such a thing as a Gluster "meta-volume", which combines multiple Gluster volumes into a single one and automatically moves directories and/or files to faster or slower volumes based on how often and how they are used?

I can imagine a nice, easily programmable ruleset:

- Directories with lots of small files? Mark them distributed-replicated.
- Giant files that are read very frequently? Striped, for sure.
- A known file name or regex that always means a huge write? Mark it as striped.
- Files that are hardly ever touched? Move them to slow storage.
- ...and so on.

I imagine something like this has already been done by someone.
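A ruleset like this can be sketched as an ordered, first-match-wins list of predicates. Everything here is hypothetical: the `FileInfo` fields, the thresholds, the file-name regex, and the target volume names are illustrative stand-ins, not a Gluster feature or API, and the code only decides where a file should live; actually moving it would be a separate step.

```python
import re
from dataclasses import dataclass

GiB = 1 << 30
MiB = 1 << 20


@dataclass
class FileInfo:
    # Hypothetical metadata a placement daemon might collect per file.
    path: str
    size_bytes: int
    reads_per_day: float
    days_since_access: float
    files_in_dir: int  # number of files in the same directory


# Ordered (description, predicate, target) rules; the first match wins.
# Thresholds, the regex and the target names are all illustrative.
HUGE_WRITE_NAMES = re.compile(r"\.(iso|img|bam)$")
RULES = [
    ("cold data", lambda f: f.days_since_access > 90, "slow"),
    ("known huge writes", lambda f: HUGE_WRITE_NAMES.search(f.path) is not None, "striped"),
    ("hot giant files", lambda f: f.size_bytes >= GiB and f.reads_per_day >= 10, "striped"),
    ("small-file directories", lambda f: f.files_in_dir >= 1000 and f.size_bytes < MiB,
     "distributed-replicated"),
]
DEFAULT_TARGET = "distributed-replicated"


def place(f):
    """Return the name of the volume a file should live on; moving it is a separate step."""
    for _description, matches, target in RULES:
        if matches(f):
            return target
    return DEFAULT_TARGET


print(place(FileInfo("/data/run42.bam", 5 * GiB, 0.5, 1, 3)))      # striped
print(place(FileInfo("/data/old_run.bam", 5 * GiB, 0.0, 400, 3)))  # slow
print(place(FileInfo("/src/module/util.c", 4000, 2.0, 1, 5000)))   # distributed-replicated
```

Putting the cold-data rule first means a huge file nobody has touched in 90 days goes to slow storage even if its name matches the striped pattern; reordering the list changes that priority.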