[Gluster-users] GlusterFS and failure domains?

Jonathan Barber jonathan.barber at gmail.com
Tue Jul 8 15:46:08 UTC 2014


On 6 July 2014 19:17, Vijay Bellur <vbellur at redhat.com> wrote:

> On 07/01/2014 05:13 PM, Jonathan Barber wrote:
>
>> Hello all,
>>
>> I'm investigating GlusterFS+Swift for use in a "large" (starting at
>> ~150TB) scale-out file system for storing and serving photographic images.
>>
>> Currently I'm thinking of using servers with JBODs, and it's clear how
>> to use Gluster's replication sets to give resiliency at the server
>> level. However, I'd like to have multiple bricks per server (one brick
>> per drive controller), and at that point managing the replication sets
>> starts to look more complicated. Also, when it comes to expanding the
>> solution in the future, I reckon I will be adding bricks of different
>> sizes, with different numbers of bricks per server - further
>> complicating management.
>>
>> So, I was wondering if there is support for (or plans for) failure
>> domains (like Oracle ASM's failure groups) which would allow you to
>> describe groups of bricks within which replicas can't be co-located?
>> (e.g. all bricks from the same server are placed in the same failure
>> domain, meaning that no two replicas of the same data may sit within
>> that group of bricks).
>>
>>
> A warning is already displayed in the CLI when an attempt is made to put
> bricks from the same server into the same replica set:
>
> [root@deepthought lo]# gluster volume create myvol replica 2
> deepthought:/d/brick1 deepthought:/d/brick2
> Multiple bricks of a replicate volume are present on the same server. This
> setup is not optimal.
> Do you still want to continue creating the volume?  (y/n) n
> Volume create failed
>

Yes, I'd seen that warning. It isn't reported when adding bricks to an
existing volume though, e.g.:

# gluster volume add-brick myvol $HOSTNAME:/d/brick{1,2}
volume add-brick: success
#

(with Gluster 3.5.1 from the gluster-epel repo)
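
As far as I understand it (happy to be corrected), the only way to control
placement today is the order of the bricks on the command line: for a
replica 2 volume, each consecutive pair of bricks becomes a replica set,
both at create time and with add-brick. So growing the volume "safely"
means always listing bricks from different servers next to each other,
e.g. (hostnames made up):

# gluster volume add-brick myvol server3:/d/brick1 server4:/d/brick1
# gluster volume add-brick myvol server3:/d/brick2 server4:/d/brick2

which is exactly the kind of manual bookkeeping I'd like to avoid as the
number of bricks per server grows.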


>
> What other policies would you be interested in associating with failure
> domains?
>

I was also thinking about failure domains that span hosts (perhaps because
some of the machines in a volume share a single point of failure, such as a
top-of-rack switch, the same UPS, or the same room). That would also make
it practical to have one brick per drive and do without RAID in the servers
- if we have cross-server replication, I don't think the additional
redundancy from RAID is necessary.
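
As far as I can tell, the closest I can get today is to build that layout
by hand with careful brick ordering (the hostnames and paths here are just
for illustration):

# gluster volume create photos replica 2 \
    rack1-srv1:/bricks/disk1 rack2-srv1:/bricks/disk1 \
    rack1-srv1:/bricks/disk2 rack2-srv1:/bricks/disk2 \
    rack1-srv2:/bricks/disk1 rack2-srv2:/bricks/disk1

so that every replica pair spans the two racks and each brick is a single
physical disk. But that only works while the servers stay symmetrical, and
it has to be redone by hand every time bricks are added.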

It'd also be nice to be able to just say "I want 2 replicas, but I don't
care which bricks they are on as long as they aren't in the same failure
domain". That would let you have odd numbers of servers without having to
manually slice up the storage and place the replicas yourself.
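
In my head that looks something like this (completely made-up syntax, just
to illustrate the idea - I'm not suggesting these commands exist):

# gluster failure-domain create rack1 srv1 srv2
# gluster failure-domain create rack2 srv3 srv4 srv5
# gluster volume create photos replica 2 failure-domain-aware \
    srv1:/bricks/disk1 srv2:/bricks/disk1 srv3:/bricks/disk1 ...

where Gluster would pick the replica pairs itself, under the constraint
that the two copies of any file never live in the same failure domain.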

Obviously, I have no idea about the internals of Gluster so I don't know
how complicated this is to achieve.

Cheers

> Regards,
> Vijay


-- 
Jonathan Barber <jonathan.barber at gmail.com>