[Gluster-devel] a union to two stripes to fourteen mirrors...
Onyx
lists at bmail.be
Tue Nov 20 15:03:43 UTC 2007
Wow, very interesting concept!
Never thought of it...
Kind of like a raid5 over a network, right?
Just thinking out loud now, not sure if this is correct, but...
- In your setup, any single brick can fail like with raid5
- If you afr 2 times (3 copies), any 2 bricks can fail like with raid6
- If you afr n times, any n bricks can fail.
So you can set up a cluster with 50 bricks, afr 10 times, have a
redundancy of 10 bricks, and usable storage space of 40 bricks....
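Just to make sure I mean the same thing by "afr 2 times (3 copies)": as far
as I understand it, that would be a cluster/afr volume with three
subvolumes, so every file ends up on all three bricks. A rough sketch (the
brick names are made up):

volume mirror3
type cluster/afr
subvolumes brick1 brick2 brick3
end-volume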
A complex but very interesting concept!
....
....AND... We could set up some detection system and other small
intelligence in the cluster to start a spare brick with the
configuration of the failed brick. BAM, hotspare brick alive, and
starting to auto-heal!
Man, GlusterFS is flexible!
Can someone confirm if my thinking is not way-off here?
This makes me think of another young cluster filesystem....
Jerker Nyberg wrote:
>
> Hi,
>
> I'm trying out different configurations of GlusterFS. I have 7 nodes
> each with two 320 GB disks where 300 GB on each disk is for the
> distributed file system.
>
> Each node is called N. Every file system is mirrored on the server side
> to the other disk on the next node, wrapped around so that the last node
> mirrors its disk to the first. The notation below is invented; the real
> config is included at the end of this mail.
>
> Pseudodefinitions:
>
> fs(1) = a file system on the first disk
> fs(2) = a file system on the second disk
> n(I, fs(J)) = the fs J on node I
> afr(N .. M) = mirror the volumes
> stripe(N .. M) = stripe the volumes
>
> Server:
>
> Forw(N) = afr(n(N, fs(1)), n(N+1, fs(2)))
> Back(N) = afr(n(N, fs(2)), n(N-1, fs(1)))
>
> Client:
>
> FStr(N .. M) = stripe(n(N, Forw(N)) .. n(N+1, Forw(N+1)) .. n(M, Forw(M)))
> BStr(N .. M) = stripe(n(N, Back(N)) .. n(N+1, Back(N+1)) .. n(M, Back(M)))
> mount /glusterfs = union(FStr(1 .. 7), BStr(1 .. 7))
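(If I read the pseudodefinitions right, with seven nodes the indices wrap
around modulo 7, so for example:

Forw(1) = afr(n(1, fs(1)), n(2, fs(2)))
Forw(2) = afr(n(2, fs(1)), n(3, fs(2)))
...
Forw(7) = afr(n(7, fs(1)), n(1, fs(2)))

and Back(2) = afr(n(2, fs(2)), n(1, fs(1))) covers the same pair of disks as
Forw(1), just defined on the other node.)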
>
>
>
> The goal was to get good performance but also redundancy. But this
> setup will not, will it? The stripes will not work when a part of one
> is gone, and the union will not magically find the other part of a
> file on the other stripe? And where to put the union namespace for
> good performance?
>
> But my major question is this: I tried a single stripe (not
> using union on the client, just striping over the servers, which in turn
> mirrored). When rsync'ing data into it on a single server things
> worked fine, but when I put some load on it from the other nodes
> (dd'ing some large files in and out) the glusterfsd's on the first
> server died... Do you want me to look into this more and try to
> reproduce and narrow down the problem, or is this kind of setup in
> general not a good idea?
>
> Regards
> Jerker Nyberg.
>
> ### client config
>
> # remote slices
> volume brick2
> type protocol/client
> option transport-type tcp/client
> option remote-host 10.0.0.2
> option remote-subvolume brick
> end-volume
> volume brick3
> type protocol/client
> option transport-type tcp/client
> option remote-host 10.0.0.3
> option remote-subvolume brick
> end-volume
> volume brick4
> type protocol/client
> option transport-type tcp/client
> option remote-host 10.0.0.4
> option remote-subvolume brick
> end-volume
> volume brick5
> type protocol/client
> option transport-type tcp/client # for TCP/IP transport
> option remote-host 10.0.0.5
> option remote-subvolume brick
> end-volume
> volume brick6
> type protocol/client
> option transport-type tcp/client
> option remote-host 10.0.0.6
> option remote-subvolume brick
> end-volume
> volume brick7
> type protocol/client
> option transport-type tcp/client
> option remote-host 10.0.0.7
> option remote-subvolume brick
> end-volume
> volume brick8
> type protocol/client
> option transport-type tcp/client
> option remote-host 10.0.0.8
> option remote-subvolume brick
> end-volume
> volume stripe
> type cluster/stripe
> subvolumes brick2 brick3 brick4 brick5 brick6 brick7 brick8
> option block-size *:32KB
> end-volume
> ### Add iothreads
> volume iothreads
> type performance/io-threads
> option thread-count 32 # default is 1
> option cache-size 64MB #64MB
> subvolumes stripe
> end-volume
> ### Add readahead feature
> volume readahead
> type performance/read-ahead
> option page-size 256kB # unit in bytes
> # option page-count 20 # cache per file = (page-count x
> page-size)
> option page-count 10 # cache per file = (page-count x page-size)
> subvolumes iothreads
> end-volume
> ### Add IO-Cache feature
> volume iocache
> type performance/io-cache
> option page-size 256KB
> # option page-size 100MB
> option page-count 10
> subvolumes readahead
> end-volume
> ### Add writeback feature
> volume writeback
> type performance/write-behind
> option aggregate-size 1MB
> option flush-behind off
> subvolumes iocache
> end-volume
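(For reference, a client spec like the one above would be mounted with
something like

glusterfs -f /etc/glusterfs/glusterfs-client.vol /glusterfs

if I remember the 1.3 command line correctly; the spec-file path is just an
example.)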
>
> ### server config for the 10.0.0.2
>
> # posix
> volume ba
> type storage/posix
> option directory /hda/glusterfs-a
> end-volume
> volume bc
> type storage/posix
> option directory /hdc/glusterfs-c
> end-volume
> # remote mirror
> volume mc
> type protocol/client
> option transport-type tcp/client
> option remote-host 10.0.0.3 # the next node
> option remote-subvolume bc
> end-volume
> # join
> volume afr
> type cluster/afr
> subvolumes ba mc
> end-volume
> # lock
> volume pl
> type features/posix-locks
> subvolumes afr
> end-volume
> # threads
> volume brick
> type performance/io-threads
> option thread-count 16 # default is 1
> option cache-size 128MB #64MB
> subvolumes pl
> end-volume
> # export
> volume server
> type protocol/server
> option transport-type tcp/server
> subvolumes brick, bc
> option auth.ip.brick.allow *
> option auth.ip.bc.allow *
> end-volume
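(And the matching server would be started with something like
"glusterfsd -f /etc/glusterfs/glusterfs-server.vol", again with a made-up
path. One small thing I noticed: every subvolumes line I have seen uses
plain spaces, as in "subvolumes brick bc"; I am not sure the spec parser is
happy with the comma in yours.)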
>
>
>
> # glusterfs --version
> glusterfs 1.3.8 built on Nov 16 2007
> Copyright (c) 2006, 2007 Z RESEARCH Inc. <http://www.zresearch.com>
> GlusterFS comes with ABSOLUTELY NO WARRANTY.
> You may redistribute copies of GlusterFS under the terms of the GNU
> General Public License.
>
>
>
>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at nongnu.org
> http://lists.nongnu.org/mailman/listinfo/gluster-devel