[Gluster-users] Exorbitant cost to achieve redundancy??

Tue Feb 14 17:02:29 UTC 2012

The replica design of GlusterFS looks expensive, but compared to some 
vendor solutions it is often still cheaper to run N+N servers and disks 
in a GlusterFS replica volume with a count of 2 than to buy the same 
usable amount of storage from other highly available solutions.  It 
sure would be nice if GlusterFS supported N+1 or N+2 servers/bricks, 
similar to RAID5/6, instead of just N+N or N+N+N, but I don't know if 
such a design is feasible.  Even if it were, I'm not confident the 
performance would be acceptable.  Higher performance with replica is 
something I really want, but it doesn't appear to be a major goal 
right now.
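
To put rough numbers on the N+N versus N+1/N+2 comparison, here is a 
minimal sketch of the usable-capacity arithmetic.  The helper name 
usable_fraction and the count of 8 data servers are assumptions chosen 
for illustration; nothing here reflects an actual GlusterFS feature.

```python
def usable_fraction(data_units: int, redundancy_units: int) -> float:
    """Fraction of raw capacity left for data, given equal-sized units."""
    return data_units / (data_units + redundancy_units)


n = 8  # hypothetical number of data servers/bricks

schemes = {
    "N+N (replica 2)": usable_fraction(n, n),        # one full copy
    "N+N+N (replica 3)": usable_fraction(n, 2 * n),  # two full copies
    "N+1 (RAID5-like)": usable_fraction(n, 1),       # one parity unit
    "N+2 (RAID6-like)": usable_fraction(n, 2),       # two parity units
}

for name, fraction in schemes.items():
    print(f"{name:20s} usable capacity: {fraction:.0%}")
```

The gap widens as N grows, which is why the "wasted space" argument 
gets louder on larger clusters.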

I'm currently fighting to get GlusterFS replica into an HPC environment, 
but the "wasting half the space" argument is hard to counter when 
there's a tight budget.  There really is no waste at all: the space is 
being used for full server redundancy (IMHO you need server redundancy, 
not just disk redundancy) and, in some use cases, increased performance 
(in other use cases replica is slower).

Does anyone think that an N+1 style server redundancy could ever be 
implemented and be reliable?
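
For reference, the RAID5-style idea behind N+1 can be sketched with 
plain XOR parity: one extra block lets you rebuild any single lost 
block.  This is only an illustration of the parity technique, not a 
GlusterFS design; the brick contents and the name xor_blocks are made 
up for the example, and it ignores everything that makes this hard 
across servers (consistency, partial writes, rebuild traffic).

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three hypothetical data bricks holding equal-sized blocks.
data_bricks = [b"AAAA", b"BBBB", b"CCCC"]

# The single "+1" brick stores the XOR of all data blocks.
parity = xor_blocks(data_bricks)

# Simulate losing brick 1, then rebuild it from the survivors plus parity.
lost = 1
survivors = [blk for i, blk in enumerate(data_bricks) if i != lost]
rebuilt = xor_blocks(survivors + [parity])

print(rebuilt == data_bricks[lost])
```

Every write has to update the parity block as well, and a rebuild has 
to read from all surviving bricks, which is part of why I doubt the 
performance would be acceptable.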

Jeff White - Linux/Unix Systems Engineer
University of Pittsburgh - CSSD

On 02/14/2012 04:35 AM, Arnold Krille wrote:
> Hi,
>
> On Monday 13 February 2012 16:15:16 Jeff Wiegley wrote:
>> In other words... GlusterFS TRIPLES all my storage costs to provide
>> 2 brick fault tolerance?
>> How do I get redundancy in GlusterFS while getting reasonable
>> storage costs where I am not wasting 50% of my investment or
>> more in providing copies to obtain redundancy?
> Show me any kind of redundancy without multiplying the efforts!
>
> Take a simple raid1 with two disks: How do you achieve fault-tolerance against
> one failing drive without storing the data on a second disk?
> When you need tolerance against two failing disks (at the same time), you have
> to have at least three disks containing the data.
>
> For bigger setups there are raid-levels that work with more than two disks and
> are tolerant against one or two failed drives, but then you "lose" one or two
> disks in your array for checksums. And these have a lot of disadvantages too.
>
> As cheap as disk space has become in recent years (except for the last 4
> months since the flood), most admins just use raid1 and are done with it.
> (Yes, I am an advocate of baarf, though not an "official member" :)
>
> Now the problem with raid inside one machine is that you still have the
> single points of failure of motherboard, cpu, memory, psu(*), controller and
> network (to a point). With systems like glusterfs, moosefs, drbd and others
> your raid spans multiple machines, removing these spofs while preserving
> the advantage of local disk-reads. When you use the fs that way...
>
> And on a side-note: I don't know what you get per hour, but taking low it-wages
> in germany it probably takes less than one man-week of data-recovery to
> amortize the "investment" of doubling disks and machines for redundancy.
> And when the data is lost due to missing redundancy, it's not only one person's
> work that is lost... The additional hardware pays off faster than your
> boss/client can think about the expenses.
>
> Have fun,
>
> Arnold
>
> (*) Yes, you can have multiple psus in one machine. And that's nice too when you
> switch your machine from one ups to another. But power is still distributed by
> one power-distribution-board. Which is why I still count the psu as a spof.