[Gluster-devel] a union to two stripes to fourteen mirrors...

Kevan Benson kbenson at a-1networks.com
Tue Nov 20 18:21:02 UTC 2007


I think it's more of a RAID 10 or 0+1.  Since AFR is being used, all 
data is being stored twice (at a minimum, for all redundant files). 
Only AFRing 10 of 50 bricks would mean that 25% of the usable bricks 
(10 of 40) were redundant, and the data they contained would be 
available on another brick, whether it was a stripe of a file or a full 
file.  The other 75% of the bricks would be single points of failure.

RAID 5 actually uses parity, and GlusterFS doesn't have any method for 
doing parity currently that I know of (although that would be a nice 
translator, or nice addition to the stripe translator).
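
For illustration, the RAID 10 analogy on the client side would look 
something like the sketch below (untested, hosts and volume names made 
up for the example; the idea is AFR pairs first, then a stripe over the 
pairs):

# four remote bricks (hosts made up for the example)
volume brick1
  type protocol/client
  option transport-type tcp/client
  option remote-host 10.0.0.1
  option remote-subvolume brick
end-volume
volume brick2
  type protocol/client
  option transport-type tcp/client
  option remote-host 10.0.0.2
  option remote-subvolume brick
end-volume
volume brick3
  type protocol/client
  option transport-type tcp/client
  option remote-host 10.0.0.3
  option remote-subvolume brick
end-volume
volume brick4
  type protocol/client
  option transport-type tcp/client
  option remote-host 10.0.0.4
  option remote-subvolume brick
end-volume
# mirror pairs first (the "1" in RAID 10)
volume mirror1
  type cluster/afr
  subvolumes brick1 brick2
end-volume
volume mirror2
  type cluster/afr
  subvolumes brick3 brick4
end-volume
# then stripe across the mirrors (the "0" in RAID 10)
volume stripe0
  type cluster/stripe
  subvolumes mirror1 mirror2
  option block-size *:32KB
end-volume

With that layout either brick of a pair can die and the stripe keeps 
working; lose both bricks of the same pair and the whole stripe is 
unavailable, which is exactly the RAID 10 trade-off.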

Onyx wrote:
> Wow, very interesting concept!
> Never thought of it...
> Kind of like a raid5 over a network, right?
> 
> Just thinking out loud now, not sure if this is correct, but...
> - In your setup, any single brick can fail like with raid5
> - If you afr 2 times (3 copies), any 2 bricks can fail like with raid6
> - If you afr n times, any n bricks can fail.
> 
> So you can setup a cluster with 50 bricks, afr 10 times, have a 
> redundancy of 10 bricks, and usable storage space of 40 bricks....
> A complex but very interesting concept!
> ....
> ....AND... We could setup some detection system and other small 
> intelligence in the cluster to start a spare brick with the 
> configuration of the failed brick. BAM, hotspare brick alive, and 
> starting to auto-heal!
> 
> Man Glusterfs is flexible!
> 
> Can someone confirm if my thinking is not way-off here?
> 
> 
> This makes me think of another young cluster filesystem....
> 
> 
> Jerker Nyberg wrote:
>>
>> Hi,
>>
>> I'm trying out different configurations of GlusterFS. I have 7 nodes 
>> each with two 320 GB disks, where 300 GB on each disk is for the 
>> distributed file system.
>>
>> Each node is called N. Every file system is mirrored on the server 
>> side to the other disk on the next node, wrapped around so that the 
>> last node mirrors its disk to the first. The definitions below are 
>> invented pseudo-notation; the real config is included at the end of 
>> this mail.
>>
>> Pseudodefinitions:
>>
>> fs(1) = a file system on the first disk
>> fs(2) = a file system on the second disk
>> n(I, fs(J)) = the fs J on node I
>> afr(N .. M) = mirror the volumes
>> stripe(N .. M) = stripe the volumes
>>
>> Server:
>>
>> Forw(N) = afr(fs(1), n(N+1, fs(2)))
>> Back(N) = afr(fs(2), n(N-1, fs(1)))
>>
>> Client:
>>
>> FStr(N .. M) = stripe(n(N, Forw(N)) .. n(N+1, Forw(N+1)) .. n(M, Forw(M)))
>> BStr(N .. M) = stripe(n(N, Back(N)) .. n(N+1, Back(N+1)) .. n(M, Back(M)))
>> mount /glusterfs = union(FStr(1 .. 7), BStr(1..7))
>>
>>
>>
>> The goal was to get good performance but also redundancy. But this 
>> setup will not give me that, will it? The stripes will not work when 
>> part of them is gone, and the union will not magically find the 
>> other part of a file on the other stripe? And where should I put the 
>> union namespace for good performance?
>>
>> But my major question is this: I tried just a single stripe (not 
>> using union on the client, just striping over the servers, which in 
>> turn mirrored). When rsync'ing data onto it on a single server 
>> things worked fine, but when I put some load on it from the other 
>> nodes (dd'ing some large files in and out), the glusterfsd's on the 
>> first server died... Do you want me to look into this more and try 
>> to reproduce and narrow down the problem, or is this kind of setup 
>> in general not a good idea?
>>
>> Regards
>> Jerker Nyberg.
>>
>> ### client config
>>
>> # remote slices
>> volume brick2
>>   type protocol/client
>>   option transport-type tcp/client
>>   option remote-host 10.0.0.2
>>   option remote-subvolume brick
>> end-volume
>> volume brick3
>>   type protocol/client
>>   option transport-type tcp/client
>>   option remote-host 10.0.0.3
>>   option remote-subvolume brick
>> end-volume
>> volume brick4
>>   type protocol/client
>>   option transport-type tcp/client
>>   option remote-host 10.0.0.4
>>   option remote-subvolume brick
>> end-volume
>> volume brick5
>>   type protocol/client
>>   option transport-type tcp/client     # for TCP/IP transport
>>   option remote-host 10.0.0.5
>>   option remote-subvolume brick
>> end-volume
>> volume brick6
>>   type protocol/client
>>   option transport-type tcp/client
>>   option remote-host 10.0.0.6
>>   option remote-subvolume brick
>> end-volume
>> volume brick7
>>   type protocol/client
>>   option transport-type tcp/client
>>   option remote-host 10.0.0.7
>>   option remote-subvolume brick
>> end-volume
>> volume brick8
>>   type protocol/client
>>   option transport-type tcp/client
>>   option remote-host 10.0.0.8
>>   option remote-subvolume brick
>> end-volume
>> volume stripe
>>   type cluster/stripe
>>   subvolumes brick2 brick3 brick4 brick5 brick6 brick7 brick8
>>   option block-size *:32KB
>> end-volume
>> ### Add iothreads
>> volume iothreads
>>    type performance/io-threads
>>    option thread-count 32  # default is 1
>>    option cache-size 64MB #64MB
>>    subvolumes stripe
>> end-volume
>> ### Add readahead feature
>> volume readahead
>>   type performance/read-ahead
>>   option page-size 256kB     # unit in bytes
>> #  option page-count 20       # cache per file = (page-count x page-size)
>>   option page-count 10       # cache per file = (page-count x page-size)
>>   subvolumes iothreads
>> end-volume
>> ### Add IO-Cache feature
>> volume iocache
>>   type performance/io-cache
>>   option page-size 256KB
>> #  option page-size 100MB
>>   option page-count 10
>>   subvolumes readahead
>> end-volume
>> ### Add writeback feature
>> volume writeback
>>   type performance/write-behind
>>   option aggregate-size 1MB
>>   option flush-behind off
>>   subvolumes iocache
>> end-volume
>>
>> ### server config for the 10.0.0.2
>>
>> # posix
>> volume ba
>>   type storage/posix
>>   option directory /hda/glusterfs-a
>> end-volume
>> volume bc
>>   type storage/posix
>>   option directory /hdc/glusterfs-c
>> end-volume
>> # remote mirror
>> volume mc
>>   type protocol/client
>>   option transport-type tcp/client
>>   option remote-host 10.0.0.3 # the next node
>>   option remote-subvolume bc
>> end-volume
>> # join
>> volume afr
>>         type cluster/afr
>>         subvolumes ba mc
>> end-volume
>> # lock
>> volume pl
>>   type features/posix-locks
>>   subvolumes afr
>> end-volume
>> # threads
>> volume brick
>>    type performance/io-threads
>>    option thread-count 16  # default is 1
>>    option cache-size 128MB #64MB
>>    subvolumes pl
>> end-volume
>> # export
>> volume server
>>   type protocol/server
>>   option transport-type tcp/server
>>   subvolumes brick, bc
>>   option auth.ip.brick.allow *
>>   option auth.ip.bc.allow *
>> end-volume
>>
>>
>>
>> # glusterfs --version
>> glusterfs 1.3.8 built on Nov 16 2007
>> Copyright (c) 2006, 2007 Z RESEARCH Inc. <http://www.zresearch.com>
>> GlusterFS comes with ABSOLUTELY NO WARRANTY.
>> You may redistribute copies of GlusterFS under the terms of the GNU 
>> General Public License.
>>
>>
>>
>>


-- 

-Kevan Benson
-A-1 Networks




