[Gluster-users] rsync for WAN replication (active/active)

phil cryer phil at cryer.us
Thu Mar 24 18:31:47 UTC 2011


On Thu, Mar 24, 2011 at 12:31 PM, Mohit Anchlia <mohitanchlia at gmail.com> wrote:
> Thanks for pointing that out. I think rsync also has an option to sync
> based on time, MD5 hash and other attributes, if I am not wrong. If we
> can preserve timestamps and only sync the latest version of a file, then
> I think we should be ok? What do you think? I can't think of any other
> option other than looking at some other DFS systems. We definitely don't
> want to add the remote site as a brick because of the latency that we have.
>
> On Thu, Mar 24, 2011 at 5:31 AM, Jonathan Barber
> <jonathan.barber at gmail.com> wrote:
>> On 17 March 2011 17:08, Mohit Anchlia <mohitanchlia at gmail.com> wrote:
>>> Thanks! I was going to trigger it through cron, say every 10 minutes,
>>> if rsync is not already running.
>>>
>>> Regarding point 3) I thought of it also! I think this problem cannot
>>> be solved even when using bricks. If someone is editing 2 files at the
>>> same time only one will win (always). The only way we can avoid this is
>>> for the application to make sure that a customer accessing the file
>>> can't go to 2 sites simultaneously. But I agree this scenario is the
>>> most complicated of all.
>>
>> This is a different issue; with gluster, locking solves it (obviously
>> the application has to know how to handle locks). Also, and I don't
>> know if gluster supports this, some systems support byte-range file
>> locks, so both sites can write to the same file at the same time.
>>
>> The scenario I was trying to describe was a race condition between the
>> rsync processes clobbering your files. I don't think this race
>> condition is removed by using the --temp-dir option (although it
>> probably decreases the window by a large amount). But if you don't run
>> the sync process whilst the remote site is sync'ing to you, then it's
>> not a problem.
>>
>>> I was planning to use the --temp-dir option (I have not tested it). Also,
>>> I think rsync first copies each file to a temporary file and then moves
>>> it into place.
>>
>> I just thought of another problem, which is that in the worst case you
>> might require twice the amount of storage to sync your data (1x for
>> the old data, 1x for the new data).
>>
>>> In our case rsync will not handle deletes. If we want to delete any
>>> files it will be done manually.
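
A minimal Python sketch of the cron-driven rsync run discussed above. It
is an illustration only, not something tested against a real deployment.
The function names, paths and lock file are hypothetical; the rsync
options used (-a, --update, --temp-dir) are standard ones. --update skips
files that are newer on the receiving side, and no --delete is passed, so
deletes stay manual as described.

```python
import fcntl
import subprocess


def build_rsync_cmd(src, dest, temp_dir):
    """Return an rsync argv that preserves mtimes and never deletes."""
    return [
        "rsync",
        "-a",                      # archive mode: keeps times, perms, etc.
        "--update",                # skip files that are newer on the receiver
        "--temp-dir=" + temp_dir,  # stage in-flight files outside the tree
        src,
        dest,
    ]


def try_lock(lock_path):
    """Take an exclusive, non-blocking lock; return the file or None."""
    handle = open(lock_path, "w")
    try:
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle


def sync_once(src, dest, temp_dir, lock_path):
    """Run one sync pass unless an earlier pass still holds the lock."""
    lock = try_lock(lock_path)
    if lock is None:
        return None  # previous run still active; skip this cron tick
    try:
        return subprocess.call(build_rsync_cmd(src, dest, temp_dir))
    finally:
        lock.close()  # closing the file releases the flock
```

Note that the lock only stops overlapping runs on the local box. It does
nothing about the remote site sync'ing to you at the same moment, which is
the race described above.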

Nice thread. I've heard this come up a few times with regard to
Gluster, and it relates to a project I'm working on. Basically I use a
server/client setup built on rsync, with inotify kicking off a sync
once changes are seen. One box acts as the server and all the others
are clients. This way, when a client has new or changed files, those
changes are sync'd to the server; when files are removed on a client,
those removals are likewise sync'd only to the server. A separate cron
job on each client syncs with the server to learn about missing files
it needs to delete from its own store.
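
To make the cron step concrete, here is a rough Python sketch of "learn
which files are missing on the server and delete them locally". It is not
lipsync's actual code. It assumes, purely for illustration, that the
server's tree is reachable as a local path (for example through a mount),
and the function names are made up. In practice an rsync pull with
--delete does the equivalent comparison over the wire.

```python
import os


def list_files(root):
    """Return the set of file paths under root, relative to root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            found.add(os.path.relpath(full, root))
    return found


def stale_paths(local_root, server_root):
    """Files present locally but absent from the server's copy."""
    return sorted(list_files(local_root) - list_files(server_root))


def prune(local_root, server_root, dry_run=True):
    """Delete (or only report, when dry_run) files the server lacks."""
    stale = stale_paths(local_root, server_root)
    if not dry_run:
        for rel in stale:
            os.remove(os.path.join(local_root, rel))
    return stale
```

One caveat with this approach: a file created locally but not yet pushed
to the server looks exactly like a stale file, so the prune step should
only run after local changes have been pushed.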

It's definitely a work in progress, but the more people I talk to, the
more I think this is needed. I will have it running on my gluster
cluster soon to sync it with another (non-gluster) cluster in another
country. If you're interested, or if you have a better idea :) the
project is hosted here: https://github.com/philcryer/lipsync

Thanks

P
-- 
http://philcryer.com


