[Gluster-devel] a union to two stripes to fourteen mirrors...
Kevan Benson
kbenson at a-1networks.com
Tue Nov 20 18:21:02 UTC 2007
I think it's more of a RAID 10 or 0+1. Since AFR is being used, all
data is stored at least twice (for every file that AFR covers). Only
AFRing 10 of 50 bricks would mean that 25% (10 of 40) of the bricks
were redundant, with the data they contained available on another
brick, whether that was a stripe of a file or a full file. The other
75% of the bricks would be single points of failure.
RAID 5 actually uses parity, and GlusterFS doesn't have any method for
doing parity currently that I know of (although that would be a nice
translator, or a nice addition to the stripe translator).
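For comparison, a client-side RAID-10-style layout (stripe over AFR
pairs) would look roughly like the sketch below. The brick names and
addresses are made up for illustration and this is untested; the point
is just that each stripe subvolume is itself an AFR mirror, so any
single brick in a pair can fail without losing data:

# four hypothetical bricks, exported the same way as the
# protocol/client volumes in the config further down in this thread
volume b1
type protocol/client
option transport-type tcp/client
option remote-host 10.0.1.1
option remote-subvolume brick
end-volume
volume b2
type protocol/client
option transport-type tcp/client
option remote-host 10.0.1.2
option remote-subvolume brick
end-volume
volume b3
type protocol/client
option transport-type tcp/client
option remote-host 10.0.1.3
option remote-subvolume brick
end-volume
volume b4
type protocol/client
option transport-type tcp/client
option remote-host 10.0.1.4
option remote-subvolume brick
end-volume
# mirror the bricks in pairs (each pair stores its data twice)
volume mirror1
type cluster/afr
subvolumes b1 b2
end-volume
volume mirror2
type cluster/afr
subvolumes b3 b4
end-volume
# stripe across the mirrors, so every stripe chunk lands on a mirrored pair
volume stripe0
type cluster/stripe
option block-size *:1MB
subvolumes mirror1 mirror2
end-volume
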
Onyx wrote:
> Wow, very interesting concept!
> Never thought of it...
> Kind of like a raid5 over a network, right?
>
> Just thinking out loud now, not sure if this is correct, but...
> - In your setup, any single brick can fail like with raid5
> - If you afr 2 times (3 copies), any 2 bricks can fail like with raid6
> - If you afr n times, any n bricks can fail.
>
> So you can setup a cluster with 50 bricks, afr 10 times, have a
> redundancy of 10 bricks, and usable storage space of 40 bricks....
> A complex but very interesting concept!
> ....
> ....AND... We could setup some detection system and other small
> intelligence in the cluster to start a spare brick with the
> configuration of the failed brick. BAM, hotspare brick alive, and
> starting to auto-heal!
>
> Man Glusterfs is flexible!
>
> Can someone confirm if my thinking is not way-off here?
>
>
> This makes me think of another young cluster filesystem....
>
>
> Jerker Nyberg wrote:
>>
>> Hi,
>>
>> I'm trying out different configurations of GlusterFS. I have 7 nodes,
>> each with two 320 GB disks, where 300 GB on each disk is for the
>> distributed file system.
>>
>> Each node is called N. Every file system is mirrored on the server
>> side to the other disk on the next node, wrapped around so that the
>> last node mirrors its disk to the first. The notation below is
>> invented; the real config is included at the end of this mail.
>>
>> Pseudodefinitions:
>>
>> fs(1) = a file system on the first disk
>> fs(2) = a file system on the second disk
>> n(I, fs(J)) = the fs J on node I
>> afr(N .. M) = mirror the volumes
>> stripe(N .. M) = stripe the volumes
>>
>> Server:
>>
>> Forw(N) = afr(fs(1), n(N+1, fs(2)))
>> Back(N) = afr(fs(2), n(N-1, fs(1)))
>>
>> Client:
>>
>> FStr(N .. M) = stripe(n(N, Forw(N)) .. n(N+1, Forw(N+1)) .. n(M, Forw(M)))
>> BStr(N .. M) = stripe(n(N, Back(N)) .. n(N+1, Back(N+1)) .. n(M, Back(M)))
>> mount /glusterfs = union(FStr(1 .. 7), BStr(1 .. 7))
>>
>>
>>
>> The goal was to get good performance but also redundancy. But this
>> setup will not give me that, will it? The stripes will not work when a
>> part of one is gone, and the union will not magically find the other
>> part of a file on the other stripe? And where should I put the union
>> namespace for good performance?
>>
>> But my major question is this: I tried a single stripe (not using
>> union on the client, just striping over the servers, which in turn
>> mirrored). When rsync'ing data onto it from a single server things
>> worked fine, but when I put some load on it from the other nodes
>> (dd'ing some large files in and out) the glusterfsd's on the first
>> server died... Do you want me to look into this more and try to
>> reproduce and narrow down the problem, or is this kind of setup in
>> general not a good idea?
>>
>> Regards
>> Jerker Nyberg.
>>
>> ### client config
>>
>> # remote slices
>> volume brick2
>> type protocol/client
>> option transport-type tcp/client
>> option remote-host 10.0.0.2
>> option remote-subvolume brick
>> end-volume
>> volume brick3
>> type protocol/client
>> option transport-type tcp/client
>> option remote-host 10.0.0.3
>> option remote-subvolume brick
>> end-volume
>> volume brick4
>> type protocol/client
>> option transport-type tcp/client
>> option remote-host 10.0.0.4
>> option remote-subvolume brick
>> end-volume
>> volume brick5
>> type protocol/client
>> option transport-type tcp/client # for TCP/IP transport
>> option remote-host 10.0.0.5
>> option remote-subvolume brick
>> end-volume
>> volume brick6
>> type protocol/client
>> option transport-type tcp/client
>> option remote-host 10.0.0.6
>> option remote-subvolume brick
>> end-volume
>> volume brick7
>> type protocol/client
>> option transport-type tcp/client
>> option remote-host 10.0.0.7
>> option remote-subvolume brick
>> end-volume
>> volume brick8
>> type protocol/client
>> option transport-type tcp/client
>> option remote-host 10.0.0.8
>> option remote-subvolume brick
>> end-volume
>> volume stripe
>> type cluster/stripe
>> subvolumes brick2 brick3 brick4 brick5 brick6 brick7 brick8
>> option block-size *:32KB
>> end-volume
>> ### Add iothreads
>> volume iothreads
>> type performance/io-threads
>> option thread-count 32 # default is 1
>> option cache-size 64MB #64MB
>> subvolumes stripe
>> end-volume
>> ### Add readahead feature
>> volume readahead
>> type performance/read-ahead
>> option page-size 256kB # unit in bytes
>> # option page-count 20 # cache per file = (page-count x
>> page-size)
>> option page-count 10 # cache per file = (page-count x page-size)
>> subvolumes iothreads
>> end-volume
>> ### Add IO-Cache feature
>> volume iocache
>> type performance/io-cache
>> option page-size 256KB
>> # option page-size 100MB
>> option page-count 10
>> subvolumes readahead
>> end-volume
>> ### Add writeback feature
>> volume writeback
>> type performance/write-behind
>> option aggregate-size 1MB
>> option flush-behind off
>> subvolumes iocache
>> end-volume
>>
>> ### server config for the 10.0.0.2
>>
>> # posix
>> volume ba
>> type storage/posix
>> option directory /hda/glusterfs-a
>> end-volume
>> volume bc
>> type storage/posix
>> option directory /hdc/glusterfs-c
>> end-volume
>> # remote mirror
>> volume mc
>> type protocol/client
>> option transport-type tcp/client
>> option remote-host 10.0.0.3 # the next node
>> option remote-subvolume bc
>> end-volume
>> # join
>> volume afr
>> type cluster/afr
>> subvolumes ba mc
>> end-volume
>> # lock
>> volume pl
>> type features/posix-locks
>> subvolumes afr
>> end-volume
>> # threads
>> volume brick
>> type performance/io-threads
>> option thread-count 16 # default is 1
>> option cache-size 128MB #64MB
>> subvolumes pl
>> end-volume
>> # export
>> volume server
>> type protocol/server
>> option transport-type tcp/server
>> subvolumes brick, bc
>> option auth.ip.brick.allow *
>> option auth.ip.bc.allow *
>> end-volume
>>
>>
>>
>> # glusterfs --version
>> glusterfs 1.3.8 built on Nov 16 2007
>> Copyright (c) 2006, 2007 Z RESEARCH Inc. <http://www.zresearch.com>
>> GlusterFS comes with ABSOLUTELY NO WARRANTY.
>> You may redistribute copies of GlusterFS under the terms of the GNU
>> General Public License.
>>
>>
>>
>>
>
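As for the union namespace question: I believe the union here would be
the cluster/unify translator, and unify wants a dedicated namespace
volume that every lookup goes through, so keeping the namespace on a
small, fast brick is probably what matters most for performance. A
minimal, untested sketch (the names ns, fstripe and bstripe are
placeholders for a namespace export and your FStr/BStr stripes):

# hypothetical namespace brick exported from one of the servers
volume ns
type protocol/client
option transport-type tcp/client
option remote-host 10.0.0.2
option remote-subvolume brick-ns
end-volume
# unify the two stripes through the namespace
volume union
type cluster/unify
option namespace ns
option scheduler rr
subvolumes fstripe bstripe
end-volume
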
--
-Kevan Benson
-A-1 Networks