[Gluster-users] rsync for WAN replication (active/active)

Mohit Anchlia mohitanchlia at gmail.com
Thu Mar 17 17:08:15 UTC 2011


Thanks! I was going to trigger it through cron, say every 10 minutes, if
rsync is not already running.
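
A rough sketch of the "only if rsync is not already running" check, in
Python. It is untested against my setup; the name run_exclusively and the
lock file path are just placeholders I made up. The idea is to take a
non-blocking exclusive lock on a file and skip the run if the lock is held.

```python
import fcntl
import subprocess


def run_exclusively(lock_path, command):
    """Run command only if no other run currently holds the lock.

    Returns the command's exit code, or None if the lock was already held.
    """
    # Open (or create) the lock file; the lock belongs to this open file.
    lock_file = open(lock_path, "w")
    try:
        try:
            # Non-blocking exclusive lock: fails at once if already held.
            fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return None
        return subprocess.call(command)
    finally:
        # Closing the file releases the lock.
        lock_file.close()
```

Cron would call something like run_exclusively("/var/run/wan-rsync.lock",
[...rsync command...]) every 10 minutes. Note this only guards against two
runs on the same machine; it does not know whether the other site is syncing.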

Regarding point 3) I thought of it also! I think this problem cannot
be solved even when using bricks. If someone is editing the same file at
both sites at the same time, only one copy will win (always). The only way
we can avoid this is through the application making sure that a customer
accessing the file can't go to 2 sites simultaneously. But I agree this
scenario is the most complicated of all.

I was planning to use the --temp-dir option (I have not tested it). Also, I
think rsync first copies each file to a temporary file and then moves it
into place.
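
To show what I mean by "copy to a temporary file, then move", here is a small
Python sketch of the general pattern. It is not rsync's actual code, and
copy_atomically is a name I made up. The point is that the rename is a single
step, so a reader of the destination sees either the old file or the complete
new one.

```python
import os
import shutil
import tempfile


def copy_atomically(source, destination):
    """Copy source to destination so readers never see a partial file."""
    dest_dir = os.path.dirname(os.path.abspath(destination))
    # Create the temporary file in the destination directory so the final
    # rename stays on one filesystem.
    fd, temp_path = tempfile.mkstemp(dir=dest_dir, prefix=".sync-")
    try:
        with os.fdopen(fd, "wb") as temp_file, open(source, "rb") as src:
            shutil.copyfileobj(src, temp_file)
            temp_file.flush()
            os.fsync(temp_file.fileno())
        # os.replace swaps the finished file into place in one step.
        os.replace(temp_path, destination)
    except BaseException:
        # Leave no stray temporary file behind on failure.
        if os.path.exists(temp_path):
            os.unlink(temp_path)
        raise
```

As far as I understand, this only holds when the temporary file and the
destination are on the same filesystem, so a --temp-dir on a different
partition would probably lose the benefit.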

In our case rsync will not handle deletes. If we want to delete any
files it will be done manually.
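
Roughly, the command I have in mind could be built like this. This is only a
sketch: build_rsync_command is a made-up helper, the paths and host name in
the example are placeholders, and --update (skip files that are newer on the
receiver) is my own assumption rather than something we have settled on. No
--delete option is ever passed, so rsync will not remove files on the other
site.

```python
def build_rsync_command(source_dir, destination, temp_dir=None):
    """Build an rsync argument list that never deletes on the receiver."""
    command = [
        "rsync",
        "--archive",  # recurse and preserve times, permissions, etc.
        "--update",   # skip files that are newer on the receiver
    ]
    if temp_dir is not None:
        # Directory on the receiver where rsync keeps its temporary files.
        command.append("--temp-dir=" + temp_dir)
    # Trailing slash: copy the contents of source_dir, not the directory.
    command.append(source_dir.rstrip("/") + "/")
    command.append(destination)
    return command
```

For example, build_rsync_command("/data/export", "siteb:/data/export/",
temp_dir="/data/.rsync-tmp") gives the argument list to hand to cron's
wrapper script.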

Thanks again and it will be great to get more suggestions!!

On Thu, Mar 17, 2011 at 6:23 AM, Jonathan Barber
<jonathan.barber at gmail.com> wrote:
> On 17 March 2011 00:39, Mohit Anchlia <mohitanchlia at gmail.com> wrote:
>> I've had several discussions with different sets of people about using
>> rsync and everyone thinks it's ok to use rsync (2 way) for WAN
>> replication in active/active data centers as long as it's done using
>> a file system mounted on the client. I am sending this out to this user
>> list in case anyone sees any problems or a better solution. Please
>> advise.
>
> By 2-way rsync, do you mean periodically running rsync in both datacenters?
>
> If so, there are quite a few issues to do with the exact arguments you
> want to give rsync, otherwise you could end up losing data.
>
> Questions to ask:
> 1) How are you going to trigger an rsync run?
> 2) If it's inotify (or similar) based, how are you going to stop the
> other site from triggering an update?
> 3) If it's cron, how do you prevent partially transferred files from
> clobbering the other site? e.g. site A starts to sync to site B and
> starts to transfer a file FOO, site B then starts to sync to site A
> and notices the file FOO is different on site B to site A, so
> transfers it to site A...
> 4) How to deal with deletions? If a file isn't present on one site, is
> that because it's been deleted, or not been created?
> 5) How long will it take to scan the filesystem to build a list of
> files to sync? If you have lots of small files this could be a
> non-trivial amount of time.
>
> I imagine there are more, but these are the first ones I thought of
> when I was thinking about how to do this. Of course, it depends on the
> shape of your data as to whether you have to worry about some of these
> points. But 1-3 were worrying for me - of course you could create a
> locking mechanism (first check if rsync is running on the remote node,
> and don't run if it is) - but it starts to look increasingly
> complicated.
>
> In the end I decided to use GlusterFS with replication and bricks in
> both sites, because performance wasn't as important to me as not
> having to hack up a sync protocol without application/FS support.
> Also, my WAN link is very reliable and reasonably low latency.
>
> Regards
> --
> Jonathan Barber <jonathan.barber at gmail.com>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
>


