[Gluster-users] Using gluster for a CDN

Wed May 6 09:25:04 UTC 2009

>
> IIRC, a lot of the 2.0 release candidates had data corruption problems
> with the replicate translator. I'd suggest checking the mailing list
> archive to see if these issues are fixed.
>

Ok, thanks.

>> I'm currently building a CDN in order to serve static files all over
>> the world for my services.
>> The DNS and load-balancing part is complete, but I'm currently stuck
>> at the point of synchronizing and spreading the files.
>
> rsync+cron? If the files don't change too often, that might be enough.

Yes, this could be a simple solution for some projects, but I actually
want the file to be available in many places as soon as it has been
uploaded in one place.

> drbd ( http://www.drbd.org/ )? Might be suitable if used as backup
> only (each set of disks is read/written at one location and replicated
> to off site backup).
>

Yes of course, but this is not really what I'm looking for.

>
> AFAIK, 'replicate' is - by design - sensitive to network latency: it
> talks to all mirrors to find out which one has the latest version of a
> file.
> IMHO, that would not work well in your case if every access takes
> 500ms round trip time (rough estimate for "half around the world", not
> based on any tests) before even opening the file.
> http://www.gluster.org/docs/index.php/Understanding_AFR_Translator
>

Well, the fact that the replicate part could take 500ms is perfectly fine.
The file gets uploaded at one place, and then gets instantly
replicated at many places... it might take time, but it's fine.
I don't have a lot of write operations.

But what I would like is for read operations to use the server closest
to the client (by configuration of course, not by magic).

> Using it as a "staging area" for updates might work:
> 1. write changes to replicated gluster volume (from any location)
> 2. sync from gluster volume to local disk (the one that feeds the web servers)
>

This is pretty much what I have in mind, but I would like to avoid the
'sync' part and use gluster directly, like:
1) write changes to replicated gluster volume (from any location).
2) read from closest gluster volume (from datacenter).

This architecture is perfectly feasible with an SQL database, for
example, in a master/slave configuration:
1) write data to the master
2) replicate to all slaves all around the world
3) read from the closest slave.
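
To make the three steps concrete, here is a minimal Python sketch of the
pattern I mean. It is only an illustration, not anything gluster or a
database provides: the Node and ReplicatedStore classes, the region
names and the "nearest" map are all hypothetical, and replication is
done synchronously in memory just to keep the example short.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A storage server (hypothetical); files maps path -> content."""
    name: str
    region: str
    files: dict = field(default_factory=dict)


class ReplicatedStore:
    def __init__(self, master, replicas, nearest):
        self.master = master
        self.replicas = {r.name: r for r in replicas}
        # Static map: client region -> replica name.
        # This is the "by configuration, not by magic" part.
        self.nearest = nearest

    def write(self, path, data):
        # 1) write data to the master
        self.master.files[path] = data
        # 2) replicate to all replicas (a real setup would do this
        #    asynchronously over the WAN; here it is a plain loop)
        for replica in self.replicas.values():
            replica.files[path] = data

    def read(self, path, client_region):
        # 3) read from the replica configured as closest to the client
        replica = self.replicas[self.nearest[client_region]]
        return replica.name, replica.files[path]


store = ReplicatedStore(
    master=Node("master", "eu"),
    replicas=[Node("eu-1", "eu"), Node("us-1", "us"), Node("asia-1", "asia")],
    nearest={"eu": "eu-1", "us": "us-1", "asia": "asia-1"},
)
store.write("/img/logo.png", b"PNG...")
served_by, content = store.read("/img/logo.png", "us")
print(served_by)
```

Writes pay the cost of reaching every server, but a read only ever
touches the one server named in the configuration for that region.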

See what I mean?

Thanks a lot,

Ugo

-- 
"Above, the stars are spiralling
and Heaven, Earth are roaming in a spin"