[Gluster-users] Production cluster planning

Joe Julian joe at julianfamily.org
Wed Oct 26 21:38:49 UTC 2016


On 10/26/2016 02:12 PM, Gandalf Corvotempesta wrote:
> 2016-10-26 23:07 GMT+02:00 Joe Julian <joe at julianfamily.org>:
>> And yes, they can fail, but 20TB is small enough to heal pretty quickly.
> 20TB small enough to build quickly? On which network? Gluster doesn't
> have a dedicated cluster network, if the cluster is being heavily
> accessed, the healing will slow down everything else (or everything
> else will slow down the healing)

Quickly = MTTR is within tolerances to continue to meet SLA. It's just math.
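
A rough sketch of that math, using the standard steady-state formula availability = MTBF / (MTBF + MTTR). The MTBF, MTTR and target figures below are made-up placeholders, not measurements from any real cluster; plug in your own.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: uptime / (uptime + repair time)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def max_mttr_hours(mtbf_hours: float, target: float) -> float:
    """Largest MTTR that still meets the target availability.

    Solves mtbf / (mtbf + mttr) >= target for mttr.
    """
    return mtbf_hours * (1.0 - target) / target


# Hypothetical inputs (assumptions for illustration only).
mtbf = 50_000.0  # hours between failures of one brick
mttr = 12.0      # hours to replace the disk and finish the heal
sla = 0.999      # availability target for that one brick

print(f"brick availability: {availability(mtbf, mttr):.6f}")
print(f"MTTR budget for a {sla} target: {max_mttr_hours(mtbf, sla):.1f} hours")
```

If the measured heal time fits inside that MTTR budget, the heal is "quick" in the sense used here.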

As for a dedicated heal network, split-horizon DNS handles that just
fine. Clients resolve a server's hostname to the "eth1" (for example) 
address and the servers themselves resolve the same hostname to the 
"eth0" address. We played with bonding but decided against the complexity.
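
To make the idea concrete, here is a toy model of split-horizon resolution: the same hostname returns a different address depending on who is asking. The hostname, the addresses (taken from the documentation ranges) and the view names are all hypothetical. In a real deployment this mapping would live in the DNS server's per-client views or in each server's hosts file, not in application code.

```python
# Hypothetical name and addresses, for illustration only.
VIEWS = {
    # What clients see: the server's "eth1" address.
    "client": {"gluster1.example.com": "192.0.2.11"},
    # What the servers see: the same server's "eth0" address.
    "server": {"gluster1.example.com": "198.51.100.11"},
}


def resolve(hostname: str, view: str) -> str:
    """Return the address a resolver in the given view would hand back."""
    return VIEWS[view][hostname]


print("client sees:", resolve("gluster1.example.com", "client"))
print("server sees:", resolve("gluster1.example.com", "server"))
```

Because server-to-server traffic such as healing follows the "server" answer, it ends up on a different interface than client traffic.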

>
> Anyway, you can heal quickly, but I still prefer to have data safe on
> each node. If you start with 3 servers at once, probably each disk is
> coming from the same batch, thus a massive disk failure is easy to
> get.

There's preference and there's engineering to meet requirements. If your 
SLA is 5 nines and you engineer 6 nines, you may realize that the 
difference between a 99.99993% uptime and a 99.99997% uptime isn't worth 
the added expense of doing replication /and/ RAID-1.
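
To put those two percentages in perspective, a small sketch that converts an availability figure into expected downtime per year (assuming a 365-day year). The helper name is just for illustration.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def downtime_seconds_per_year(availability_percent: float) -> float:
    """Expected downtime per year for a given availability percentage."""
    return (1.0 - availability_percent / 100.0) * SECONDS_PER_YEAR


for pct in (99.999, 99.99993, 99.99997):
    print(f"{pct}% -> {downtime_seconds_per_year(pct):.1f} seconds/year")
```

The gap between 99.99993% and 99.99997% works out to roughly 13 seconds a year, against a five-nines allowance of a bit over 5 minutes.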

> If you loose only 2 disks, one for each server, from the same replica
> group, you are game over. With RAID6, you have to loose 5 disks from
> the same replica group.

I never loose my drives. They're always firmly attached. :P

With 300 drives, 60 bricks, replica 3 (across 3 racks), I have six
nines of availability for any one replica subvolume. If you really want to
fudge the numbers, the reliability for any given file is not worth 
calculating in that volume. The odds of all three bricks failing for any 
1 file among 20 distribute subvolumes is statistically infinitesimal.
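
A sketch of how numbers like these combine, using the usual parallel and series availability formulas. The 99% per-brick figure is an assumption chosen for illustration, not a measured value, and the model assumes brick failures are independent (which is why the replicas sit in separate racks).

```python
def replica_availability(brick_availability: float, replicas: int) -> float:
    """Parallel: a replica subvolume is down only if every brick in it is down."""
    return 1.0 - (1.0 - brick_availability) ** replicas


def all_subvolumes_availability(subvol_availability: float, count: int) -> float:
    """Series: probability that every distribute subvolume is up at once."""
    return subvol_availability ** count


brick = 0.99               # assumed per-brick availability
bricks, replicas = 60, 3   # layout from the post
subvols = bricks // replicas

one = replica_availability(brick, replicas)
print(f"distribute subvolumes: {subvols}")
print(f"one replica-3 subvolume: {one:.6f}")
print(f"all {subvols} subvolumes up at once: "
      f"{all_subvolumes_availability(one, subvols):.6f}")
```

Any single file lives on one replica subvolume, so its availability is the per-subvolume figure, not the whole-volume one.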

>
> In my environment, I can create 4 RAID-0 on each server (3 disks on
> each RAID0), or 2 RAID-6 with 6 disks each, or 1 RAID-6 with 12 disks
> or 1 RAID-7 with 12 disks (RAID-7 with fewer than 12 disks is
> nonsense)
> I don't know which one is better.

Just do the reliability calculations and engineer a storage system to 
meet (exceed) your obligations within the available budget. 
http://www.eventhelix.com/realtimemantra/faulthandling/system_reliability_availability.htm
