[Gluster-devel] dynamic replication

Anand Avati avati at zresearch.com
Sun Apr 15 10:03:26 UTC 2007


Folks,
  here is an idea from George on IRC (#gluster) about a mirror-like
feature where the replication count varies. Initially you have, say,
4-way replication, and when you are almost out of disk space, you fall
back to 2-way replication. More details are in the attached IRC log.

regards,
avati



-- 
ultimate_answer_t
deep_thought (void)
{ 
  sleep (years2secs (7500000)); 
  return 42;
}
-------------- next part --------------
-:- Desine [n=Desine at 60-240-210-33.tpgi.com.au] has joined #gluster
<paratai> DDDDDDeeeeesssiiiiiiinnneeeeeee
<Desine> wats up
<Desine> :)
<Desine> I have an idea for gluster.
<Desine> say you have 5 servers configured with gluster
<Desine> each server say has 100gb storage available so that's a total of
          500gb
<Desine> ok now here is the tricky bit
<Desine> I want to have all servers backing up until 500gb
<Desine> so that they all have the same data
<Desine> actually 100gb
<Desine> so replicate up to 100gb
<DynWind> you want the same data replicated on all 5 servers?
<Desine> then when it reaches 100gb the servers go into individual mode
<Desine> ok let me explain with 4 servers it is easier
<Desine> 4 servers 400gb total
<Desine> so each server replicates up to 100gb
<Desine> then when it reaches 100gb
<Desine> they go in dual mode
<Desine> so now 2 server replicate to 200gb
<Desine> then when it reaches 200gb
<Desine> they divide
<Desine> and each stores 100gb
<Desine> understand the concept?
<Desine> 4 servers with equal data can store only 100gb max
<Desine> if you divide them and have 2 groups
<Desine> then you can store 200gb on each group
<Desine> so 2 machines work as one
<Desine> 2 servers
<DynWind> ok, got it now
<Desine> as the data goes up they divide
<Desine> until you got 4
<Desine> each storing 100gb unique data
<Desine> :)
<Desine> what do you think?
<DynWind> so until usage reaches 100gb, you want 4-way replication, and after
          that you want only 2-way replication
<Desine> yeah
<Desine> but this should be automated
<Desine> :)
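The capacity arithmetic Desine describes can be written down directly: with N identical servers of C GB each and every file stored on r of them, the usable space is N * C / r, so 4 servers of 100 GB give 100 GB at 4-way, 200 GB at 2-way and 400 GB with no replication. The C sketch below is not GlusterFS code; the function names and the halving policy are hypothetical. It picks the highest replica count that still fits the current usage, assuming the server count is a power of two so that halving always splits the servers into equal groups (which is why the 4-server example is easier than the 5-server one).

```c
#include <assert.h>

/* Usable space (GB) of `servers` identical servers of `per_server_gb`
 * each, when every file is stored on `replicas` of them: each copy
 * consumes space on one server, so the raw total is divided by r. */
unsigned usable_gb(unsigned servers, unsigned per_server_gb,
                   unsigned replicas)
{
    assert(replicas > 0 && replicas <= servers);
    return servers * per_server_gb / replicas;
}

/* Hypothetical policy: start at full mirroring (replicas == servers)
 * and halve the replica count until the usable space holds `used_gb`.
 * Assumes `servers` is a power of two so every step divides evenly.
 * Returns 0 if even a single copy of everything does not fit. */
unsigned pick_replicas(unsigned servers, unsigned per_server_gb,
                       unsigned used_gb)
{
    assert(servers > 0);
    for (unsigned r = servers; r >= 1; r /= 2) {
        if (usable_gb(servers, per_server_gb, r) >= used_gb)
            return r;
    }
    return 0;
}
```

For example, pick_replicas(4, 100, 150) returns 2: 150 GB no longer fits at 4-way (100 GB usable) but fits at 2-way (200 GB usable).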
<Desine> ok
<DynWind> thinking about how you might do this
<Desine> i think it's tricky but i think it's a cool feature :)
<Desine> one way to do this is
<DynWind> right now it can't be done automatically
<Desine> yes
<DynWind> you'd need to majorly reorganize files when you 'divide'
<Desine> well you can destroy data on 2 machines then start copying 100gb over
<Desine> but before it reaches 100gb :) so you get like 90gb say
<DynWind> yeah
<Desine> so when it hits 90gb you activate double mode
<Desine> it would not be 100gb
<Desine> so you have 40gb to play with :)
<Desine> as a cache
<Desine> to start transferring
<Desine> then you get the client to only read from 2 machines while this is
          happening
<Desine> :)
<Desine> until the system is divided
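Desine's refinement is to start the split before the disks are completely full, so the free space still left on every server acts as a buffer while data is being rearranged. A minimal sketch of that trigger, assuming a fixed percentage threshold; the function names and the integer-percent representation are illustrative and not part of GlusterFS:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical trigger: begin dividing the mirror once a server's usage
 * reaches `threshold_pct` percent of its capacity, instead of waiting
 * for it to be completely full. Integer arithmetic: the comparison is
 * used/capacity >= pct/100, cross-multiplied. */
bool should_start_split(unsigned used_gb, unsigned capacity_gb,
                        unsigned threshold_pct)
{
    assert(capacity_gb > 0 && threshold_pct <= 100);
    return used_gb * 100 >= capacity_gb * threshold_pct;
}

/* Free space summed over all servers at the moment the split starts:
 * the buffer that absorbs new writes while data is being rearranged. */
unsigned headroom_gb(unsigned servers, unsigned used_gb,
                     unsigned capacity_gb)
{
    assert(used_gb <= capacity_gb);
    return servers * (capacity_gb - used_gb);
}
```

With the numbers from the log, should_start_split(90, 100, 90) is true once 90 GB are used, and headroom_gb(4, 90, 100) gives the 4 x 10 GB = 40 GB "to play with".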
<DynWind> we'll be writing some infrastructure code for the 1.4 release, which
          will be used to write things like fsck and glusterfs-defrag
<DynWind> using that framework you'll be able to do this, with a few scripts
          (hopefully)
<DynWind> though doing this on-the-fly would be too much work
<Desine> it would take time
* dtor is back from lunch
<dtor> hey Desine
<Desine> he dtor
<Desine> hey
<Desine> what do you think of my idea?
<dtor> Desine idea sounds good, thinking how to implement it
<dtor> it should be possible without 'moving' files around at all
<Desine> sounds good
<Desine> :)
<dtor> say you start off with 4 servers, S1, S2, S3, S4
<dtor> and you have 4 files, A, B, C D
<dtor> initially:
<dtor> S1: A B C D
<Desine> yep
<dtor> S2: A B C D
<dtor> S3: A B C D
<dtor> now all four servers are almost full (90-100%)
<dtor> and the fifth file E is about to come
<Desine> yep
<dtor> now you delete files in a pattern from all 4 servers
<dtor> S1: A B
<dtor> S2: A B
<dtor> S3: C D
<dtor> S4: C D
<dtor> and E goes to S1+S2
<dtor> and F goes to S3+S4
<Desine> sounds
<Desine> very smart
<dtor> just by deleting files in a pattern we can 'spread thin'
<Desine> you pretty much nailed it
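dtor's observation is that going from 4-way to 2-way replication needs only deletions: every file already exists on every server, so each file is assigned to one pair of servers and removed from the other pair. The sketch below models placement as a server-by-file boolean matrix (S1..S4 are servers 0..3, A, B, C, ... are files 0, 1, 2, ...). The assignment rules are assumptions chosen to reproduce the log: existing files are split into contiguous blocks (A, B to S1+S2; C, D to S3+S4), and each new file goes to the group currently holding the fewest files, ties going to the lower-numbered group, so E lands on S1+S2 and F on S3+S4. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define NSERVERS 4 /* S1..S4 are servers 0..3 */
#define MAXFILES 8 /* A, B, C, ... are files 0, 1, 2, ... */

typedef struct {
    bool has[NSERVERS][MAXFILES]; /* has[s][f]: server s holds file f */
    int nfiles;
} placement;

/* With `groups` equal groups, server s belongs to group s / size. */
static int group_of(int server, int groups)
{
    return server / (NSERVERS / groups);
}

/* Initial state: every server holds every file (full mirroring). */
void init_full_mirror(placement *p, int nfiles)
{
    assert(nfiles >= 0 && nfiles <= MAXFILES);
    p->nfiles = nfiles;
    for (int s = 0; s < NSERVERS; s++)
        for (int f = 0; f < MAXFILES; f++)
            p->has[s][f] = f < nfiles;
}

/* "Spread thin": give each existing file to exactly one group, in
 * contiguous blocks, and delete it from every server outside that
 * group. Only deletions happen; nothing is copied. */
void spread_thin(placement *p, int groups)
{
    assert(groups > 0 && NSERVERS % groups == 0);
    int per_group = (p->nfiles + groups - 1) / groups;
    for (int f = 0; f < p->nfiles; f++) {
        int owner = f / per_group;
        for (int s = 0; s < NSERVERS; s++)
            if (group_of(s, groups) != owner)
                p->has[s][f] = false;
    }
}

/* Store a new file on every server of the group holding the fewest
 * files (ties go to the lower group). Returns the chosen group. */
int add_file(placement *p, int groups)
{
    assert(groups > 0 && NSERVERS % groups == 0);
    assert(p->nfiles < MAXFILES);
    int size = NSERVERS / groups, best = 0, best_count = -1;
    for (int g = 0; g < groups; g++) {
        int count = 0;
        for (int f = 0; f < p->nfiles; f++)
            count += p->has[g * size][f]; /* servers in a group match */
        if (best_count < 0 || count < best_count) {
            best = g;
            best_count = count;
        }
    }
    for (int s = best * size; s < (best + 1) * size; s++)
        p->has[s][p->nfiles] = true;
    p->nfiles++;
    return best;
}
```

Calling init_full_mirror(&p, 4), then spread_thin(&p, 2), then add_file twice reproduces the layout in the log: A B on S1 and S2, C D on S3 and S4, E on S1+S2 and F on S3+S4.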
<dtor> another extension to the idea is to delete on-demand
<dtor> i mean, when E is about to come, you need not delete A,B on S1,S2 and C,D
          on S3,S4. you can just delete A on S1 and S2
<dtor> and make place for E
<dtor> and when F comes delete C,D on S3,S4 making way for it
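The on-demand variant defers deletions until a new file actually needs the space. One safety rule, assumed here rather than stated in the log, is that a copy may only be dropped if another group still holds the file, so no file ever loses its last replica. The sketch tracks space per group, assuming both servers of a pair always hold identical content, and picks victims oldest-first; dtor's example frees C and D on S3,S4 instead, and any victim order works as long as the safety rule holds. Names and structures are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define NGROUPS 2  /* group 0 = S1+S2, group 1 = S3+S4 */
#define MAXFILES 8

/* Assumes both servers of a group always hold identical content, so
 * space is tracked per group. */
typedef struct {
    unsigned capacity_gb;         /* capacity of each server */
    unsigned size_gb[MAXFILES];   /* size of file f */
    bool held[NGROUPS][MAXFILES]; /* held[g][f]: group g has file f */
    int nfiles;
} pool;

static unsigned used_gb(const pool *p, int g)
{
    unsigned sum = 0;
    for (int f = 0; f < p->nfiles; f++)
        if (p->held[g][f])
            sum += p->size_gb[f];
    return sum;
}

/* A copy in group g may be dropped only if some other group has f. */
static bool held_elsewhere(const pool *p, int g, int f)
{
    for (int h = 0; h < NGROUPS; h++)
        if (h != g && p->held[h][f])
            return true;
    return false;
}

/* Store a new file of `size` GB in group g, deleting redundant copies
 * from g (oldest first) only until it fits. Returns the new file's
 * index, or -1 without deleting anything if it cannot fit at all. */
int store_on_demand(pool *p, int g, unsigned size)
{
    assert(g >= 0 && g < NGROUPS && p->nfiles < MAXFILES);
    unsigned reclaimable = 0; /* never exceeds used_gb(p, g) */
    for (int f = 0; f < p->nfiles; f++)
        if (p->held[g][f] && held_elsewhere(p, g, f))
            reclaimable += p->size_gb[f];
    if (used_gb(p, g) - reclaimable + size > p->capacity_gb)
        return -1;

    for (int f = 0; f < p->nfiles && used_gb(p, g) + size > p->capacity_gb; f++)
        if (p->held[g][f] && held_elsewhere(p, g, f))
            p->held[g][f] = false; /* the other group keeps its copy */

    int nf = p->nfiles++;
    p->size_gb[nf] = size;
    for (int h = 0; h < NGROUPS; h++)
        p->held[h][nf] = (h == g);
    return nf;
}
```

Starting from four 25 GB files on two full 100 GB pairs, storing E in group 0 drops only A there; storing F in group 1 then cannot drop A (its last copy) and drops B instead.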

