[Gluster-users] Distributed-replicated vs striped-replicated.. some basic questions
earonesty at expressionanalysis.com
Tue Aug 26 14:46:21 UTC 2014
We have a situation where
- write performance is important
- median file size is more than 1GB
1. Is striping the way to go for this data... to get better write speed?
Currently I have a 2x2 distributed replicated volume, which, for a single file, has 2x read performance, but around 1x performance for writes.
2. In this scenario, would a "distributed 2x replicated 4x striped" volume give the fastest performance while still tolerating a single-node failure and making it easy to add new nodes? What would striping without distributing do?
3. Is this still "beta" code? Should I avoid it until the feature is stable?
4. Is erasure coding ever going to be on the table for a production release (eliminating the need for replicas in striped storage)?
5. Is there such a thing as a gluster "meta-volume"... one which combines multiple gluster volumes into a single one, and allows directories and/or files to be moved automatically into faster/slower volumes based on frequency and type of use?
I can imagine a nice, easily programmable ruleset:
Directories with lots of small files... marked as distributed/replicated. Giant files that get read very frequently? Striped for sure. If there's a known file-name or regex that will always be a huge write? Mark it as striped. Files that are hardly ever touched? Move them to the slow storage... etc. I imagine something like this has already been done by someone.
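As a rough illustration of the kind of ruleset described above, here is a minimal Python sketch. Nothing in it is a Gluster API: the `FileStats` fields, the layout labels, the thresholds, and the `.hugewrite` filename pattern are all hypothetical placeholders chosen only to show the idea of ordered, first-match-wins rules.

```python
import re
from dataclasses import dataclass
from typing import Callable, List

MIB = 1024 ** 2
GIB = 1024 ** 3


@dataclass
class FileStats:
    """Hypothetical per-file statistics a tiering tool might collect."""
    path: str
    size_bytes: int
    reads_per_day: float
    days_since_access: int
    files_in_dir: int  # number of entries in the parent directory


@dataclass
class Rule:
    name: str
    matches: Callable[[FileStats], bool]
    target: str  # hypothetical layout label, not a Gluster volume type string


# Assumed pattern for files known in advance to be huge writes.
HUGE_WRITE_PATTERN = re.compile(r"\.hugewrite$")

# Rules are checked in order; the first match decides the layout.
# All thresholds below are made-up values for illustration.
RULES: List[Rule] = [
    Rule("known huge write",
         lambda f: HUGE_WRITE_PATTERN.search(f.path) is not None,
         "striped"),
    Rule("hardly ever touched",
         lambda f: f.days_since_access > 180,
         "slow-storage"),
    Rule("giant file, read frequently",
         lambda f: f.size_bytes >= GIB and f.reads_per_day >= 10,
         "striped"),
    Rule("directory with lots of small files",
         lambda f: f.files_in_dir >= 1000 and f.size_bytes < MIB,
         "distributed-replicated"),
]

DEFAULT_TARGET = "leave-as-is"


def choose_layout(stats: FileStats) -> str:
    """Return the target of the first matching rule, or the default."""
    for rule in RULES:
        if rule.matches(stats):
            return rule.target
    return DEFAULT_TARGET


if __name__ == "__main__":
    example = FileStats("/data/big.dat", 2 * GIB, 50, 1, 3)
    print(choose_layout(example))
```

Ordering the rules matters in this sketch: a file matching the known-huge-write pattern is marked as striped even if it has not been touched for a long time, because that rule is checked first.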