[Gluster-users] rsync for WAN replication (active/active)

Jonathan Barber jonathan.barber at gmail.com
Thu Mar 17 13:23:39 UTC 2011


On 17 March 2011 00:39, Mohit Anchlia <mohitanchlia at gmail.com> wrote:
> I've had several discussions with different sets of people about using
> rsync and everyone thinks it's ok to use rsync (2 way) for WAN
> replication in active/active data centers as long as it's done using
> a file system mounted on the client. I am sending this out to this user
> list in case anyone sees any problems or a better solution. Please
> advise.

By 2-way rsync, do you mean periodically running rsync in both datacenters?

If so, you need to be careful about the exact arguments you give
rsync, otherwise you could end up losing data.

Questions to ask:
1) How are you going to trigger an rsync run?
2) If it's inotify (or similar) based, how are you going to stop the
files written by a sync from triggering an update at the other site?
3) If it's cron, how do you prevent partially transferred files from
clobbering the other site? e.g. site A starts to sync to site B and
begins transferring a file FOO; site B then starts to sync to site A,
notices that FOO differs between the two sites, and transfers its
partial copy back to site A...
4) How to deal with deletions? If a file isn't present on one site, is
that because it's been deleted, or not been created?
5) How long will it take to scan the filesystem to build a list of
files to sync? If you have lots of small files, this could be a
non-trivial amount of time.
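
To illustrate question 4: two directory listings alone cannot tell a
deletion from a file that has not been copied yet. One possible way out
is to remember which paths existed on both sites after the previous run.
The Python sketch below is only an illustration under that assumption;
the function name, its arguments and the returned strings are all
hypothetical, and nothing here is a built-in rsync feature.

```python
# Hypothetical helper: the "last_synced" snapshot is an assumption,
# not something rsync keeps for you.
def plan_for_missing(path, site_a_now, site_b_now, last_synced):
    """Return an action for a path that may exist on only one site.

    site_a_now / site_b_now: sets of paths currently present at each site.
    last_synced: set of paths present on both sites after the previous run.
    """
    in_a = path in site_a_now
    in_b = path in site_b_now
    if in_a == in_b:
        # Present on both sites or absent on both: nothing to decide here.
        return "nothing"
    if path in last_synced:
        # It existed everywhere last time, so the side that lacks it
        # must have deleted it; propagate the deletion to the other side.
        return "delete from A" if in_a else "delete from B"
    # Never seen on both sites before: the side that has it created it.
    return "copy A -> B" if in_a else "copy B -> A"


# FOO was on both sites after the last run and is now gone from B.
print(plan_for_missing("FOO", {"FOO"}, set(), {"FOO"}))
# BAR has only ever existed on A.
print(plan_for_missing("BAR", {"BAR"}, set(), set()))
```

Even with the snapshot, a file deleted on one site and modified on the
other between runs is still a conflict that something has to resolve.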

I imagine there are more, but these are the first ones I thought of
when I was thinking about how to do this. Of course, it depends on the
shape of your data as to whether you have to worry about some of these
points. But 1-3 were the ones that worried me. You could create a
locking mechanism (first check whether rsync is running on the remote
node, and don't run if it is), but it starts to look increasingly
complicated.
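
A rough Python sketch of the locking idea mentioned above, to show where
the complexity creeps in. Everything named here is an assumption made
for illustration: the lock file path, and the remote_is_syncing and
do_sync callables, which the caller would have to supply (for example a
check over ssh and a wrapper around the rsync command).

```python
import os
from contextlib import contextmanager


@contextmanager
def sync_lock(path):
    """Yield True if we created the lock file, False if it already exists."""
    try:
        # O_CREAT | O_EXCL fails if the file exists, so only one local
        # process can take the lock at a time.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        yield False
        return
    try:
        os.write(fd, str(os.getpid()).encode())
        yield True
    finally:
        os.close(fd)
        os.unlink(path)


def run_sync(lock_path, remote_is_syncing, do_sync):
    """Run do_sync() only if neither site appears to be syncing.

    remote_is_syncing and do_sync are hypothetical caller-supplied callables.
    """
    with sync_lock(lock_path) as got_lock:
        if not got_lock:
            return "skipped: local run in progress"
        if remote_is_syncing():
            return "skipped: remote run in progress"
        do_sync()
        return "synced"
```

This still leaves gaps: both sites can check at the same moment and each
see the other as idle, and a run that is killed outright leaves a stale
lock file behind that someone has to clean up.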

In the end I decided to use GlusterFS with replication and bricks in
both sites, because performance wasn't as important to me as not
having to hack up a sync protocol without application/FS support.
Also, my WAN link is very reliable and reasonably low latency.

Regards
-- 
Jonathan Barber <jonathan.barber at gmail.com>


