[Gluster-users] Need help to design a data storage

Thu Sep 1 11:17:31 UTC 2016

Hi,

On 09/08/16 20:43, Gandalf Corvotempesta wrote:
> On 9 Aug 2016 19:57, "Ashish Pandey" <aspandey at redhat.com
> <mailto:aspandey at redhat.com>> wrote:
>> Yes, redundant data spread across multiple servers. In my example I
>> mentioned 6 different nodes, each with one brick.
>> The point is that for 4+2 you can lose any 2 bricks. It could be because
>> of node failure or brick failure.
>> 1 - 6 bricks on 6 different nodes - any 2 nodes may go down - EC wins
>>
>> However, if you have only 2 nodes and 3 bricks on each node, then yes,
>> in this case even if one node goes down, EC will fail because that will
>> cause 3 bricks to be down.
>> In this case replica 3 would win.
>
> 6 nodes with 1 brick each is an unrealistic case.
> A much more common case is multiple nodes with multiple bricks, something
> like 9 nodes with 12 bricks each (for example, a 2U Supermicro server
> with 12 disks).
>
> In this case, EC replicas could be placed on a single server.

Not really. The disperse sets, like the replica sets, are defined when 
the volume is created. You must make sure that every disperse set is 
made of bricks from different servers. If this condition is satisfied 
while creating the volume, there won't be two fragments of the same file 
on two bricks of the same server.
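
As a minimal sketch of one way to get such an ordering: the code below assumes that the bricks listed at volume creation are grouped into disperse sets in consecutive order, and it lists them round-robin across servers. The host names and brick paths are hypothetical, and the 9 x 12 layout with sets of 6 (4+2) is only taken from the example in this thread.

```python
def brick_order(servers, bricks_per_server, set_size):
    """Return bricks in round-robin order across servers, so that every
    consecutive group of `set_size` bricks comes from distinct servers."""
    if set_size > len(servers):
        raise ValueError("disperse set size exceeds the number of servers")
    total = len(servers) * bricks_per_server
    if total % set_size:
        raise ValueError("total bricks must be a multiple of the set size")
    # Outer loop over disks, inner loop over servers: any window of
    # set_size <= len(servers) consecutive entries hits distinct servers.
    return [
        f"{server}:/bricks/disk{disk}"  # hypothetical brick path
        for disk in range(bricks_per_server)
        for server in servers
    ]


servers = [f"server{i}" for i in range(1, 10)]  # hypothetical host names
bricks = brick_order(servers, 12, 6)

# Consecutive groups of 6 bricks, i.e. the assumed disperse sets.
disperse_sets = [bricks[i:i + 6] for i in range(0, len(bricks), 6)]
```

The resulting list would be passed, in this order, as the brick arguments when creating the volume.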

>
> And with 9*12 bricks you still have 2 single disks (or one server if
> both are placed on the same hardware) as failure domains.
> Yes, you'll get 9*(12-2) usable bricks and not (9*12)/3 but you risk
> data loss for sure.

It's true that the probability of failure of a distributed-replicated
volume is smaller than that of a distributed-dispersed one. However, if
you are considering big volumes with redundancy 2 or higher, replica gets
prohibitively expensive and wastes a lot of bandwidth.

You can reduce the probability of local disk failure by creating bricks
on top of RAID5 or RAID6 if you want. It will use more disks, but far
fewer than a replica.

>
> Just a question: with EC, which is the right calculation method among these 3:
>
> a)  (#servers*#bricks)-#replicas
>
> Or
>
> b) #servers*(#bricks - #replicas)
>
> Or
>
> c) (#servers-#replicas)*#bricks
>
> In case A I'll use 2 disks as replica for the whole volume (exactly like
> a raid6)
>
> In case B I'll use 2 disks from each server as replica
>
> In case C I'll use 2 whole servers as replica (this is the most secure
> as I can lose 2 whole servers)

In fact none of these is completely correct. The redundancy level is per 
disperse set, not for the whole volume.

S: number of servers
D: number of disks per server
N: Disperse set size
R: Disperse redundancy

Usable disks = S * D * (1 - R / N)
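
To make the formula concrete, here is a small sketch that applies it to the 9 servers x 12 disks example from this thread, with 4+2 disperse sets (N = 6, R = 2). The function name is just for illustration; it counts whole disperse sets, which gives the same result as S * D * (1 - R / N) when S * D is a multiple of N.

```python
def usable_disks(servers, disks_per_server, set_size, redundancy):
    """Usable disks = S * D * (1 - R / N), computed per disperse set."""
    if not 0 < redundancy < set_size:
        raise ValueError("redundancy must be between 0 and the set size")
    total = servers * disks_per_server
    if total % set_size:
        raise ValueError("S * D must be a multiple of the disperse set size")
    sets = total // set_size
    # Each set of N bricks stores N - R bricks' worth of data.
    return sets * (set_size - redundancy)


dispersed = usable_disks(9, 12, 6, 2)  # 108 * (1 - 2/6) = 72
replica3 = (9 * 12) // 3               # replica 3 for comparison: 36
print(dispersed, replica3)
```

So with the same 108 disks, 4+2 disperse sets leave 72 disks' worth of usable space, against 36 for replica 3.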
