[Gluster-users] Replicate Over VPN

Brock Nanson brock at nanson.net
Tue Mar 18 20:05:15 UTC 2014


Thanks, guys, for your responses!  I get the digest, so I'm going to
cut/paste the juicier bits into one message... And a warning: if some of
my comments suggest I really don't know what I'm doing, well, that could
very well be right.  I'm definitely a ways down the learning curve; IT is
not my real job or background.



> ---------- Forwarded message ----------
> From: Alex Chekholko <chekh at stanford.edu>
> To: gluster-users at gluster.org
> Cc:
> Date: Mon, 17 Mar 2014 11:23:15 -0700
> Subject: Re: [Gluster-users] Replicate Over VPN
>
>
> On 03/13/2014 04:50 PM, Brock Nanson wrote:
>
>>
>>  ...
>
>> 2) I've seen it suggested that the write function isn't considered
>> complete until it's complete on all bricks in the volume. My write
>> speeds would seem to confirm this.
>>
>
> Yes, the write will return when all replicas are written.  AKA synchronous
> replication.  Usually "replication" means "synchronous replication".
>

OK, so the replication is bit by bit, in real time across all the
replicas.  'Synchronous' meaning the write doesn't return until every
replica has the data, rather than 'common clock' as I'd first read it.


>
>  Is this correct and is there any way
>> to cache the data and allow it to trickle over the link in the
>> background?
>>
>
> You're talking about asynchronous replication.  Which GlusterFS calls
> "geo-replication".
>

Understood... so in reality this means one direction only, at least until
the nut of replicating in both directions can be cracked.  'Asynchronous'
might be a bit of a misdirection though, because it suggests (to me at
least) communication in *both* directions, just not on the same clock.
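
To check my own understanding, here's a toy sketch of the difference as I
now picture it (my own illustration in Python, not gluster internals): the
synchronous write blocks until both 'bricks' hold the data; the
asynchronous one returns immediately and lets a background worker trickle
the change across.

import queue
import threading
import time

local_brick, remote_brick = {}, {}   # stand-ins for the two replicas

def write_sync(name, data):
    # Synchronous replication (what gluster replicate does): the call
    # does not return until every replica holds the data.
    local_brick[name] = data
    remote_brick[name] = data        # imagine this hop crossing the VPN
    return "ok"                      # caller was blocked the whole time

pending = queue.Queue()

def write_async(name, data):
    # Asynchronous replication (roughly what geo-replication does, and
    # only in one direction): return as soon as the local copy lands.
    local_brick[name] = data
    pending.put((name, data))        # queued for later shipment
    return "ok"                      # caller sees LAN-speed writes

def shipper():
    # Background worker trickling queued changes over the slow link.
    while True:
        name, data = pending.get()
        time.sleep(0.5)              # stand-in for the WAN round trip
        remote_brick[name] = data

threading.Thread(target=shipper, daemon=True).start()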


> ...
>>
>> Geo-replication would seem to be the ideal solution, except for the fact
>> that it apparently only works in one direction (although it was
>> evidently hoped it would be upgraded in 3.4.0 to go in both directions I
>> understand).
>>
>
> So if you allow replication to be delayed, and you allow writes on both
> sides, how would you deal with the same file simultaneously being written
> on both sides.  Which would win in the end?
>

This is the big question of course, and I think the answer requires more
knowledge than I have relating to how the replication process occurs.  In
my unsophisticated way, I would assume that under the hood, gluster would
sound something like this whenever a new file is written to Node A:

1) Samba wants to write a file, I'm awake!
2) Hey Node B, wake up, we're about to start writing some bits
synchronously.  File is called 'junk.txt'.
3) OK, we've both opened that file for writing...
4) Samba, start your transmission.
5) 'write, write, write', in Node A/B perfect harmony
6) Close that file and make sure the file listing is updated.

This bit-level understanding is something I don't have.  At some point,
the directory listing would be updated to show the new or updated file.
When does that happen?  Before or after the file is written?

So to answer your question about which file would win if written
simultaneously, I need to understand whether simply having the file opened
for writing is enough to take control of it.  That is, can Node A tell
Node B that junk.txt is going to be written, thus preventing Node B from
accepting a local write request?  If so, gluster would only need to send
enough information from Node A to Node B to indicate the write was coming
and that the file is off limits until further notice.  The write could
occur as fast as possible on the local node, and dribble across the VPN to
the other as fast as the link allows.  So #5 above would be 'write, write,
write as fast as each node reasonably can, but not necessarily in
harmony'.  And if communication broke during the process, the heal
function would be called upon to sort it out once communication is
restored.
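
To make my hand-waving concrete, here's a toy sketch of the hand-shake I'm
imagining (pure speculation on my part, written in Python; NOT how gluster
actually works under the hood):

import threading

locks = {}                       # stand-in for a cluster-wide lock table
table_guard = threading.Lock()

def claim(path, node):
    # Steps 2 and 3: tell the other node the file is off limits.
    with table_guard:
        owner = locks.get(path)
        if owner is not None and owner != node:
            raise IOError(path + " is being written on node " + owner)
        locks[path] = node

def release(path):
    # Step 6: writing (and shipping) done, so lift the embargo.
    with table_guard:
        locks.pop(path, None)

def write(path, data, node="A"):
    claim(path, node)                # a tiny message, cheap over the VPN
    try:
        with open(path, "wb") as f:  # steps 4 and 5: full LAN speed locally
            f.write(data)
        # ...the bytes would dribble to the other node in the background;
        # if the link dropped here, self-heal would sort it out later.
    finally:
        release(path)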


>
>> So are there any configuration tricks (write-behind, compression etc)
>> that might help me out?  Is there a way to fool geo-replication into
>> working in both directions, recognizing my application isn't seeing
>> serious read/write activity and some reasonable amount of risk is
>> acceptable?
>>
>>
> You're basically talking about running rsyncs in both directions.  How
> will you handle any file conflicts?
>

Yes, I suppose in a way I am, but not based on a cron job... ideally it
would be full-time synchronization, like gluster does, but without the
requirement of perfect Synchronicity (wasn't that a Police album?).

Assuming my kindergarten understanding above holds, file conflicts would
presumably only exist if the VPN link went down, preventing the 'open the
file for writing' command from completing on both ends.  If the link went
down part way through a dribbling write to Node B, the healing process
would presumably have a go at fixing the problem after the link is
reinstated.  If someone wrote to the remote copy during the outage, the
typical heal issues would come into play.


>
> --
> Alex Chekholko chekh at stanford.edu
>
>
> ---------- Forwarded message ----------
> From: Alex Chekholko <chekh at stanford.edu>
> To: gluster-users at gluster.org
> Cc:
> Date: Mon, 17 Mar 2014 15:11:37 -0700
> Subject: Re: [Gluster-users] Replicate Over VPN
> Replying back to list.
>
> I don't know of a currently available clustered filesystem that allows
> bi-directional asynchronous replication.  Even in your case where you can
> have manual curation, what would you want to happen when two humans modify
> the same files at the same time in your two geographic locations?  And
> don't tell us it will never happen.
>

Heh, yes, Murphy is a complete bast*rd, so in spite of the odds associated
with sharing over a million files among 30 people, it would have to happen
eventually.  However, the key here is that what we do is reproducible and
not as sensitive as, say, banking data.  If someone hits 'save' and
something pukes, the worst case is they've wasted 10 or 15 minutes of
work... which they can do again.  I absolutely understand why gluster is
fanatical about keeping everything bit-for-bit identical and correct.  It
has to be, for virtually all implementations, and it wouldn't be
considered ready for the real world if it weren't.  I just need a lazy
mode and a tick box saying I acknowledge and accept all the risks!


> Synchronous replication works a bit differently everywhere, so you'll just
> want to double-check which is most compatible with your workflow.
>
> In glusterfs, the client talks to all replicas and returns when each
> replica has confirmed it has written the data.
>
> In ceph, the client talks to the master replica, and then that master
> replica forwards the writes to all the other replicas, and then confirms to
> the client that all the replicas are written.
>
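
If I'm reading that right, the practical difference is just who fans the
write out.  Here's a toy sketch of the two paths as I picture them (my own
illustration in Python, not either project's actual code):

replicas = [{}, {}, {}]              # three stand-in replicas

def gluster_style_write(name, data):
    # Client talks to every replica itself; the write returns once
    # all of them have confirmed.
    for brick in replicas:
        brick[name] = data

def ceph_style_write(name, data):
    # Client talks only to the primary; the primary forwards the
    # write to the other replicas, then confirms back to the client.
    primary, *others = replicas
    primary[name] = data
    for replica in others:
        replica[name] = data
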
>
> For your async use case, how often does the shared data change?  Perhaps
> something like a plain rsync every night would be sufficient?  Or a ZFS
> send/receive if that's faster than rsync?
>

As noted above, the number of users isn't that large.  The data is changing
regularly, but we're only talking about a small number of files.  A
workstation user may be in the same file all day long, doing regular saves
and perhaps opening another file or two for reference once in a while.

The reality is, if I knew when a write was about to happen I could drop
the VPN connection, allow the write to finish on the local machine without
VPN delay (a second or two), then bring the connection up again and let
the heal process look after things.  The user wouldn't see the long delay
of the bit-by-bit save to the other node, and the synchronization would
happen in the background.  The few seconds of 'downtime' during a write
would be acceptable to me because the odds of the heal process finding new
files on both ends are incredibly small (in my use case).  Rsync is
something I've used in the past... but it requires too much supervision to
ensure it does what you really expect it to do.  Or so I've found.  It's
better suited to backups IMHO.
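
Back of the napkin, the stunt would look something like this (entirely
hypothetical; vpn-down.sh and vpn-up.sh are placeholders for whatever
actually manages the tunnel):

import subprocess

def write_with_link_down(path, data):
    # Placeholder script: however the tunnel actually gets dropped.
    subprocess.run(["vpn-down.sh"], check=True)
    try:
        with open(path, "wb") as f:          # local write, no WAN delay
            f.write(data)
    finally:
        # Placeholder script: bring the tunnel back up.
        subprocess.run(["vpn-up.sh"], check=True)
        # Once the link is back, gluster's self-heal should notice the
        # out-of-sync file and copy it across to the other node.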


>
> On 03/17/2014 02:58 PM, Carlos Capriotti wrote:
>
>> Being a little bit familiar with Brock's work environment, I think I can
>> clarify on this: they have a human, manual system of avoiding those
>> conflicts. Only one person/geographical group will use the files at a
>> given time.
>>
>> All that matters then is being able to automate the replication/exchange
>> process, so, in this case, the question still needing an answer would
>> be, "is there a way to make geo-rep work both ways ?"
>>
>> Sorry for taking point here, Brock. I thought this would speed up the
>> discussion a tad.
>>
>
No problem Carlos, your help is appreciated!


>
>>
>> On Mon, Mar 17, 2014 at 7:23 PM, Alex Chekholko <chekh at stanford.edu
>> <mailto:chekh at stanford.edu>> wrote:
>>
>>
>>     You're basically talking about running rsyncs in both directions.
>>       How will you handle any file conflicts?
>>
>>
>>     --
>>     Alex Chekholko chekh at stanford.edu <mailto:chekh at stanford.edu>
>>
>>
>
>
> ---------- Forwarded message ----------
> From: Marcus Bointon <marcus at synchromedia.co.uk>
> To: gluster-users <gluster-users at gluster.org>
> Cc:
> Date: Tue, 18 Mar 2014 00:03:03 +0100
> Subject: Re: [Gluster-users] Replicate Over VPN
> On 17 Mar 2014, at 23:11, Alex Chekholko <chekh at stanford.edu> wrote:
>
> > For your async use case, how often does the shared data change?  Perhaps
> something like a plain rsync every night would be sufficient?  Or a ZFS
> send/receive if that's faster than rsync?
>
> (This should really have been in reply to Brock, but I lost his post
> somewhere)
>
> There are some fairly simple solutions for this that may be workable,
> especially if writes are somewhat constrained. If all reads and writes by a
> single client go to the same back-end server, perhaps because of cookie or
> IP-based stickiness, they can cope with longish latency propagating to
> other servers, read-what-you-just-wrote will always succeed, and
> simultaneous writes to the same file are very unlikely. A classic use case
> would be user-uploaded image files for a web server cluster.
>
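
(Interleaving a thought here: if I follow the stickiness idea, it's really
just a deterministic mapping from client to back end.  A toy sketch, my
own, with placeholder hostnames:)

import hashlib

BACKENDS = ["server-a.example.com", "server-b.example.com"]  # placeholders

def backend_for(client_ip):
    # Hash the client address so the same client always lands on the
    # same back end; read-what-you-just-wrote then holds even while
    # the other servers lag behind.
    digest = hashlib.md5(client_ip.encode()).digest()
    return BACKENDS[digest[0] % len(BACKENDS)]

# Every request from 10.0.0.42 goes to the same server:
assert backend_for("10.0.0.42") == backend_for("10.0.0.42")
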
> Bidirectional rsync has serious issues with deletions. Other systems worth
> looking at include:
> csync2: http://oss.linbit.com/csync2/
> Unison: http://www.cis.upenn.edu/~bcpierce/unison/
> Bsync: https://github.com/dooblem/bsync


I'm actually looking at Unison at the moment, as suggested by another
mentor off-list.  I'm not convinced it's the way to go yet though; as you
noted, rsync-style syncing makes me nervous in my use case.

If you want to have a good laugh, ponder what I'm considering now... ;-)
Some form of ownCloud *with* gluster...  My thinking (without the benefit
of testing the details yet) is to put an ownCloud server on each of the
gluster node boxes, sharing out the gluster-mounted volume.  Sitting next
to each gluster node would be another box, this one running the ownCloud
client (I've seen suggestions that someone has put together a command-line
implementation of the client for headless servers) and Samba.  So a user
would read/write/browse on the Samba server and drives, with typical
gigabit LAN performance.  OwnCloud would sync changes back to the gluster
volume and would live with the speed issues of the synchronous replication
over the VPN.  The other gluster node at the far end of the VPN would then
share the changes via ownCloud over to the Samba box sitting next to it.
Perhaps this could all be done with ownCloud alone, but I really like the
way gluster maintains data integrity across the VPN... the problems I have
with it are really BECAUSE it's doing a good job!

I don't know how big a mess the concurrent file use issue would create...
or if this could even function as pondered.  But I'm thinking it would
isolate the workstations from the delay issues of the VPN speed.  And it
might be worth testing for sh1t5 and giggles.  (Yeah, I really am that
desperate to find a reasonably functional solution!)

Thanks guys!

Brock