[Gluster-users] New architecture: some advice needed

Mon Dec 22 12:56:00 UTC 2014

> Actually I have 3 supermicro servers with 12 4TB SATA disks each and 2
> SSD (in each server)
> Each server also has one dual port DDR Infiniband card.
> 
> I would like to create a scale-out storage infrastructure (primarily
> used by web servers), totally HA and fault tolerant.
> I was thinking about 1 brick for each SATA disks in Distributed
> Dispersed mode. Replica set to 3 (so, actually, only 12*4TB=48TB would
> be available)
> 
> What do you suggest? Is Distributed Dispersed good for my environment
> or should I go with Distributed Replicated ?
> 
> In replicated mode, I can always access the raw files in case of
> disaster; this would not be possible with dispersed mode, right?
> 
> What are the pros and cons of replicated versus dispersed modes?
> 
> We plan to add up to 10 servers (all with 12*4 SATA disks) in the near
> future ending to 336TB of available and replicated space.
> 
> Any suggestions?

The key tradeoffs here are storage utilization vs. performance.  In
general, erasure codes (disperse) will give better storage utilization
than replication for the same level of protection.  However, this might
not be the case for N=3.  Replica 3 protects against two failures, but
the admin guide section on disperse says:

"_redundancy_ must be greater than 0, and the total number of bricks must
be greater than 2 * _redundancy_"

I interpret this to mean that for two-failure protection you would need
at least five bricks.  With three bricks, disperse can only offer
one-failure protection (two bricks' worth of data plus one of
redundancy).  In this case it's roughly equivalent to RAID-5, with only
a 50% storage penalty vs. 100% for replica 2 offering the same
protection.
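
To make the arithmetic above concrete, here is a rough Python sketch
of the brick-count rule and the storage penalty.  The function names
are mine, purely for illustration, and "penalty" here means extra raw
space per unit of usable space, which is how I'm using the term above.

```python
def min_bricks(redundancy: int) -> int:
    """Smallest brick count satisfying: bricks > 2 * redundancy."""
    if redundancy < 1:
        raise ValueError("redundancy must be greater than 0")
    return 2 * redundancy + 1


def disperse_penalty(bricks: int, redundancy: int) -> float:
    """Extra raw space per unit of usable space for a disperse set."""
    if redundancy < 1 or bricks <= 2 * redundancy:
        raise ValueError("need redundancy > 0 and bricks > 2 * redundancy")
    return redundancy / (bricks - redundancy)


def replica_penalty(copies: int) -> float:
    """Extra raw space per unit of usable space for replica N."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return float(copies - 1)


if __name__ == "__main__":
    print(min_bricks(2))            # bricks needed for two-failure protection
    print(disperse_penalty(3, 1))   # three bricks, one-failure protection
    print(replica_penalty(2))       # replica 2, one-failure protection
```

For example, disperse_penalty(3, 1) gives 0.5 (the 50% above) and
replica_penalty(2) gives 1.0 (the 100%).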

The other issue is performance.  With disperse, all writes *and reads*
must be done to all bricks, and at a stripe size equal to 512 bytes
times the number of data bricks (total bricks minus those used for
redundancy).  This means more data transfer, especially for reads, and
also more write contention than with replication.  Also, since this is
new code, some optimizations that already exist for replication do not
yet exist for disperse even though they're applicable.
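
As a rough illustration of the stripe size rule, here is a small
Python sketch.  The names are hypothetical, and the rounding model
(a request starting at offset 0 being rounded up to whole stripes) is
my simplifying assumption, not a description of the actual code path.

```python
CHUNK_BYTES = 512  # per-data-brick unit from the stripe size rule above


def stripe_size(bricks: int, redundancy: int) -> int:
    """Stripe size in bytes: 512 * (bricks - redundancy)."""
    if redundancy < 1 or bricks <= 2 * redundancy:
        raise ValueError("need redundancy > 0 and bricks > 2 * redundancy")
    return CHUNK_BYTES * (bricks - redundancy)


def bytes_covered(request_bytes: int, bricks: int, redundancy: int) -> int:
    """User data covered when a request at offset 0 is rounded up to
    whole stripes (assumed model, for illustration only)."""
    if request_bytes < 0:
        raise ValueError("request_bytes must be non-negative")
    stripe = stripe_size(bricks, redundancy)
    stripes = -(-request_bytes // stripe)  # ceiling division
    return stripes * stripe


if __name__ == "__main__":
    print(stripe_size(3, 1))         # 3 bricks, redundancy 1
    print(bytes_covered(100, 3, 1))  # a 100-byte request on that layout
```

Under that assumption, even a 100-byte request on a 3-brick,
redundancy-1 set covers a full 1024-byte stripe, which is where the
extra data transfer comes from.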

Adding Xavier, who's the real expert on disperse, in case I got
something wrong here.