[Gluster-devel] a union to two stripes to fourteen mirrors...

Onyx lists at bmail.be
Tue Nov 20 15:03:43 UTC 2007


Wow, very interesting concept!
Never thought of it...
Kind of like a raid5 over a network, right?

Just thinking out loud now, not sure if this is correct, but...
- In your setup, any single brick can fail like with raid5
- If you afr 2 times (3 copies), any 2 bricks can fail like with raid6
- If you afr n times, any n bricks can fail.

So you could set up a cluster with 50 bricks, afr 10 times, have a 
redundancy of 10 bricks, and usable storage space of 40 bricks....
A complex but very interesting concept!
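If I understand the afr translator right, the "2 times (3 copies)" case
is just an afr volume with three subvolumes on the client side. A rough,
untested sketch, borrowing the brick names from Jerker's config below
(the volume name is made up):

volume mirror3
  type cluster/afr
  # three copies of every file: any two of the three bricks can be lost
  subvolumes brick2 brick3 brick4
end-volume

You would then stripe or unify over several such mirror sets, the same
way Jerker stripes over his afr'd bricks.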
....
....AND... We could set up some detection system and other small 
intelligence in the cluster to start a spare brick with the 
configuration of the failed brick. BAM, hot-spare brick alive, and 
starting to auto-heal!
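
I imagine the spare would just be started with a server spec copied from
the dead brick, pointing at an empty local directory, and afr's self-heal
would then repopulate it as files get accessed (I think that is how the
auto-heal works). A rough sketch, with the path and volume names made up:

# stand-in for a failed brick, started on the spare machine
volume brick
  type storage/posix
  option directory /spare/glusterfs-a   # empty dir, to be filled by self-heal
end-volume
volume server
  type protocol/server
  option transport-type tcp/server
  subvolumes brick
  option auth.ip.brick.allow *
end-volume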

Man, GlusterFS is flexible!

Can someone confirm if my thinking is not way-off here?


This makes me think of another young cluster filesystem....


Jerker Nyberg wrote:
>
> Hi,
>
> I'm trying out different configurations of GlusterFS. I have 7 nodes, 
> each with two 320 GB disks, where 300 GB on each disk is for the 
> distributed file system.
>
> Each node is called N. Every file system is on the server side 
> mirrored to the other disk on the next node, wrapped around so that 
> the last node mirrors its disk to the first. The pseudo-notation below 
> is something I invented; the real config is included at the end of 
> this mail.
>
> Pseudodefinitions:
>
> fs(1) = a file system on the first disk
> fs(2) = a file system on the second disk
> n(I, fs(J)) = the fs J on node I
> afr(N .. M) = mirror the volumes
> stripe(N .. M) = stripe the volumes
>
> Server:
>
> Forw(N) = afr(fs(1), n(N+1, fs(2)))
> Back(N) = afr(fs(2), n(N-1, fs(1)))
>
> Client:
>
> FStr(N .. M) = stripe(n(N, Forw(N)) .. n(N+1, Forw(N+1)) .. n(M, Forw(M)))
> BStr(N .. M) = stripe(n(N, Back(N)) .. n(N+1, Back(N+1)) .. n(M, Back(M)))
> mount /glusterfs = union(FStr(1 .. 7), BStr(1 .. 7))
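>
> To make the wrap-around concrete, expanding the notation for the first 
> and last node:
>
>   n(1, Forw(1)) = afr(n(1, fs(1)), n(2, fs(2)))
>   n(7, Forw(7)) = afr(n(7, fs(1)), n(1, fs(2)))   <-- node 7 mirrors to node 1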
>
>
>
> The goal was to get good performance but also redundancy. But this 
> setup will not give me that, will it? The stripes will not work when 
> a part of them is gone, and the union will not magically find the 
> other part of a file on the other stripe? And where to put the union 
> namespace for good performance?
>
> But my major question is this: I tried a single stripe (not using 
> union on the client, just striping over the servers, which in turn 
> mirror). When rsync'ing data onto it on a single server things 
> worked fine, but when I put some load on it from the other nodes 
> (dd'ing some large files in and out) the glusterfsd's on the first 
> server died... Do you want me to look into this more and try to 
> reproduce and narrow down the problem, or is this kind of setup in 
> general not a good idea?
>
> Regards
> Jerker Nyberg.
>
> ### client config
>
> # remote slices
> volume brick2
>   type protocol/client
>   option transport-type tcp/client
>   option remote-host 10.0.0.2
>   option remote-subvolume brick
> end-volume
> volume brick3
>   type protocol/client
>   option transport-type tcp/client
>   option remote-host 10.0.0.3
>   option remote-subvolume brick
> end-volume
> volume brick4
>   type protocol/client
>   option transport-type tcp/client
>   option remote-host 10.0.0.4
>   option remote-subvolume brick
> end-volume
> volume brick5
>   type protocol/client
>   option transport-type tcp/client     # for TCP/IP transport
>   option remote-host 10.0.0.5
>   option remote-subvolume brick
> end-volume
> volume brick6
>   type protocol/client
>   option transport-type tcp/client
>   option remote-host 10.0.0.6
>   option remote-subvolume brick
> end-volume
> volume brick7
>   type protocol/client
>   option transport-type tcp/client
>   option remote-host 10.0.0.7
>   option remote-subvolume brick
> end-volume
> volume brick8
>   type protocol/client
>   option transport-type tcp/client
>   option remote-host 10.0.0.8
>   option remote-subvolume brick
> end-volume
> volume stripe
>   type cluster/stripe
>   subvolumes brick2 brick3 brick4 brick5 brick6 brick7 brick8
>   option block-size *:32KB
> end-volume
> ### Add iothreads
> volume iothreads
>    type performance/io-threads
>    option thread-count 32  # default is 1
>    option cache-size 64MB
>    subvolumes stripe
> end-volume
> ### Add readahead feature
> volume readahead
>   type performance/read-ahead
>   option page-size 256kB
> #  option page-count 20       # cache per file  = (page-count x 
> page-size)
>   option page-count 10       # cache per file  = (page-count x page-size)
>   subvolumes iothreads
> end-volume
> ### Add IO-Cache feature
> volume iocache
>   type performance/io-cache
>   option page-size 256KB
> #  option page-size 100MB
>   option page-count 10
>   subvolumes readahead
> end-volume
> ### Add writeback feature
> volume writeback
>   type performance/write-behind
>   option aggregate-size 1MB
>   option flush-behind off
>   subvolumes iocache
> end-volume
>
> ### server config for the 10.0.0.2
>
> # posix
> volume ba
>   type storage/posix
>   option directory /hda/glusterfs-a
> end-volume
> volume bc
>   type storage/posix
>   option directory /hdc/glusterfs-c
> end-volume
> # remote mirror
> volume mc
>   type protocol/client
>   option transport-type tcp/client
>   option remote-host 10.0.0.3 # the next node
>   option remote-subvolume bc
> end-volume
> # join
> volume afr
>         type cluster/afr
>         subvolumes ba mc
> end-volume
> # lock
> volume pl
>   type features/posix-locks
>   subvolumes afr
> end-volume
> # threads
> volume brick
>    type performance/io-threads
>    option thread-count 16  # default is 1
>    option cache-size 128MB
>    subvolumes pl
> end-volume
> # export
> volume server
>   type protocol/server
>   option transport-type tcp/server
>   subvolumes brick bc
>   option auth.ip.brick.allow *
>   option auth.ip.bc.allow *
> end-volume
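>
> The server specs on the other nodes follow the same pattern, shifted 
> by one node each time; on the last node (10.0.0.8) the mirror target 
> wraps back around to the first, roughly:
>
> volume mc
>   type protocol/client
>   option transport-type tcp/client
>   option remote-host 10.0.0.2 # wrap around to the first node
>   option remote-subvolume bc
> end-volume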
>
>
>
> # glusterfs --version
> glusterfs 1.3.8 built on Nov 16 2007
> Copyright (c) 2006, 2007 Z RESEARCH Inc. <http://www.zresearch.com>
> GlusterFS comes with ABSOLUTELY NO WARRANTY.
> You may redistribute copies of GlusterFS under the terms of the GNU 
> General Public License.
>
>
>
>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at nongnu.org
> http://lists.nongnu.org/mailman/listinfo/gluster-devel




