[Gluster-users] GlusterFS 3.1 on Amazon EC2 Challenge

Gart Davis gdavis at spoonflower.com
Tue Oct 26 12:23:39 UTC 2010

This is -very- helpful.

So, if I understand you properly, I should focus on scaling -inside-
my EBS devices first.

What I should really do is create a gluster volume that starts with
-lots- of 125 GB EBS devices (in my case, 32, to achieve 2 TB of usable
replicated storage).  I should rsync -a to this volume to ensure a
roughly even distribution/replication of files.  As the fullest EBS
devices reach 80%, use snapshot/restore techniques to replace them
with 250 GB EBS devices; next time 500 GB, next time 1 TB.  Then start
over again with 512 125 GB EBS devices and another rsync -a, and so on.

Because Gluster is a zero metadata system, this should in theory scale
to the horizon, with a quick scriptable upgrade every doubling, and
one painful multi-day transition using rsync -a every 10x.

Does this make sense?  What are the gotchas with this approach?

Thanks for your insights on this!


On Mon, Oct 25, 2010 at 7:25 PM, Barry Jaspan <barry.jaspan at acquia.com> wrote:
> Gart,
> I was speaking generally in my message because I did not know anything about
> your actual situation (maybe because I did not read carefully). From this
> message, I understand your goal to be: You have a "source EBS volume" that
> you would like to replace with a gluster filesystem containing the same
> data. Based on this, my personal recommendation (which carries no official
> weight whatsoever) is:
> 1. On your gluster fileservers, mount whatever bricks you want. It sounds
> like you want cluster/distribute over two cluster/replicate volumes over two
> 1TB EBS volumes each, so put two 1TB bricks on each server and export them.
> 2. From the machine holding the source EBS volume, mount the gluster bricks
> created in step 1 under a volfile that arranges them under
> cluster/distribute and cluster/replicate as you wish.
> 3. rsync -a /source-ebs /mnt/gfs
> 4. Switch your production service to use /mnt/gfs.
> 5. rsync -a /source-ebs /mnt/gfs again to catch any stragglers. The actual
> details of when/how to run rsync, whether to take down production, etc.
> depend on your service, of course.
> On Mon, Oct 25, 2010 at 2:13 PM, Gart Davis <gdavis at spoonflower.com> wrote:
>> My principal concerns with this relate to Barry's 3rd bullet: Gluster
>> does not rebalance evenly, and so this solution will eventually bounce
>> off the roof and lock up.
> We had a replicate volume. We added distribute on top of it, added a
> subvolume (which was another replicate volume), and used gluster's
> "rebalance" script which consists of removing certain extended attributes,
> renaming files, and copying them back into place. The end result was that
> not very much data got moved to the new volume. Also, that approach to
> rebalancing has inherent race conditions. The best you can do to add more
> storage space to an existing volume is to set your min-free-disk low enough
> (perhaps 80%) so that, each time a new file is added that would otherwise go
> to the old full brick, gluster instead creates a link file on the old brick
> pointing to the new brick and puts the real data on the new brick. This
> imposes extra link-following overhead, but I believe it works.
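For concreteness, here is a hedged sketch of what that looks like in a 3.x-style client volfile. The volume and subvolume names are placeholders; note that min-free-disk names the free space to preserve, so keeping 20% free corresponds to the ~80%-full threshold discussed above.

```text
# Hypothetical client volfile fragment (names are placeholders)
volume dist
  type cluster/distribute
  # keep at least 20% free on each subvolume; once a brick drops below
  # that (i.e. is ~80% full), new files land on another subvolume and a
  # link file is left behind on the full brick
  option min-free-disk 20%
  subvolumes repl-0 repl-1
end-volume
```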
>> Forgive my naivete Barry, when you say 'just use larger replicate
>> volumes instead of distribute', what does that mean?
> After our fiasco trying to switch from a single replicate volume to
> distribute over two replicates (having all the problems I just described),
> we just went back to a single replicate volume, and increased our EBS volume
> sizes. They were only 100GB, and we made them 500GB. This worked because EBS
> allows it. If/when we need the bricks to be bigger than 1TB... well I hope
> gluster has improved its capabilities by that point.  If not, we might use
> LVM or whatever on the glusterfs server to make multiple EBS volumes look
> like >1TB bricks.
> Barry
>>  Are you running
>> multiple 1 TB EBS bricks in a single 'replica 2' volume under a single
>> file server?  My recipe is largely riffing off Josh's tutorial.
>> You've clearly found a recipe that you're happy to entrust production
>> data to... how would you change this?
>> Thanks!
>> Gart
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
> --
> Barry Jaspan
> Senior Architect | Acquia
> barry.jaspan at acquia.com | (c) 617.905.2208 | (w) 978.296.5231
> "Get a free, hosted Drupal 7 site: http://www.drupalgardens.com"
