[Gluster-users] question re. replication strategies

Tue Jul 16 22:31:26 UTC 2013

Hi Folks,

I wonder if some of you might weigh in on replication strategies w/ 
GlusterFS.

I'm about to assemble a small, high-availability storage cluster - 
mostly to support virtual machine images.  I'm doing it using 4 existing 
servers, each with 4 2TB drives (4 nodes x 4 disks each - total of 16 
drives).

Currently, I'm doing something relatively simpleminded:
- combine all 4 drives on each server, using md RAID10
- set up logical volumes with LVM
- mirror pairs of volumes, across servers, using DRBD
- this provides protection against disk failures on each node (any 
single disk, plus dual-disk failures that hit different mirror pairs), 
and against single node failures
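
For reference, a quick back-of-the-envelope sketch of what this layout costs in capacity. It assumes all the RAID10 space is carved into DRBD-mirrored volumes, and it ignores LVM/DRBD metadata overhead; the variable names are just for illustration.

```python
# Rough usable-capacity arithmetic for the current md RAID10 + DRBD layout.
# Assumption: every logical volume is mirrored to exactly one other node.
DRIVE_TB = 2
DRIVES_PER_NODE = 4
NODES = 4

raw_tb = DRIVE_TB * DRIVES_PER_NODE * NODES   # all 16 drives
after_raid10_tb = raw_tb // 2                 # RAID10 keeps two copies of each block
after_drbd_tb = after_raid10_tb // 2          # DRBD keeps a second copy on a peer node

print("raw:", raw_tb, "TB")
print("after RAID10:", after_raid10_tb, "TB")
print("after DRBD mirroring:", after_drbd_tb, "TB")
```

So every block ends up stored four times (two disks on each of two nodes).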

I really don't like the way DRBD limits one to pairwise replication, so 
I'm thinking of replacing it with GlusterFS.  (Lots of manual work to 
migrate stuff from one server to another.  Limited to 1 level of failover.)

My first thought is:
- on each node: pool the drives into a single volume, using ZFS
- replicate across nodes using Gluster - say replicating to 3 nodes w/ 
one spare

My second thought is that this might be overkill.  What about exposing 
all the drives individually, and using Gluster replication to make sure 
that each file (or block) is replicated to at least 3 drives, each on a 
different server?
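
To make the placement I have in mind concrete, here is a minimal sketch that groups the 16 drives into sets of 3 so that no set has two drives on the same server. The host names and brick paths are hypothetical, and this only illustrates the constraint I'm after, not how Gluster itself decides where replicas go.

```python
def replica_sets(servers, disks_per_server, replica):
    """Group drives into replica sets with no two members on the same server."""
    if replica > len(servers):
        raise ValueError("need at least as many servers as replicas")
    # List drives round-robin across servers, so any run of `replica`
    # consecutive entries lands on `replica` different servers.
    bricks = [f"{s}:/bricks/disk{d}"          # hypothetical path naming
              for d in range(disks_per_server)
              for s in servers]
    usable = len(bricks) - len(bricks) % replica
    sets = [bricks[i:i + replica] for i in range(0, usable, replica)]
    leftover = bricks[usable:]                # drives that don't fill a full set
    return sets, leftover

servers = ["node1", "node2", "node3", "node4"]   # hypothetical host names
sets, leftover = replica_sets(servers, disks_per_server=4, replica=3)
for group in sets:
    print(group)
print("left over:", leftover)
```

With 4 servers x 4 drives and 3 copies, 16 isn't a multiple of 3, so this gives 5 sets of 3 drives and leaves one drive unused.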

Which leads me to two questions:
1.  Is it possible to configure Gluster replication to be aware of how 
drives are connected to physical servers, and use that knowledge to 
control where things are replicated? (if not, stop here!)
2.  If yes, any comments/advice on which strategy makes sense (or if 
there's a better approach I'm not considering)?

Thanks very much,

Miles Fidelman

-- 
In theory, there is no difference between theory and practice.
In practice, there is.   .... Yogi Berra