[Gluster-devel] AFR Replication

Fri Apr 18 14:52:05 UTC 2008

On Fri, 18 Apr 2008, Christopher Hawkins wrote:

> See:   http://www.gluster.org/docs/index.php/Understanding_AFR_Translator
>
> At the bottom of the page are examples of how to initiate the sync. To clarify
> this point and some of your other questions in the split-brain thread:
>
> Automatic re-sync after one server has been down is not available yet, but it
> is coming in the next release with an HA translator. For now you can either do
> a full re-sync manually using the method described at the link above, or let
> the cluster re-sync itself over time, because accessing a file for a read or
> write causes that file to be synced.

I'm aware of the "read 1 byte/line from a file to sync it" approach. The 
problem I am seeing, however, is that I cannot see files that were created 
on node2 before node1 was started. If I cannot see them, I cannot read 
them to sync them.

I assume this is because my underlying FS (Reiser4) is having issues with 
xattrs.
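The "read 1 byte from each file" re-sync can be scripted as below. This is a minimal sketch that assumes, as the quoted advice says, that simply opening and reading a file through the GlusterFS mount is enough to make AFR sync it. The mount point `/mnt/glusterfs` and the function name `touch_all_files` are hypothetical. Note that it walks the directory listing, so, exactly as described above, it cannot reach files that do not show up in that listing.

```python
import os
import sys


def touch_all_files(mount_point):
    """Read the first byte of every regular file under mount_point.

    Assumption: on a GlusterFS mount using AFR, accessing a file is what
    triggers AFR to sync that file between its subvolumes.
    Returns the number of files that were read.
    """
    count = 0
    for dirpath, _dirnames, filenames in os.walk(mount_point):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                with open(path, "rb") as f:
                    f.read(1)  # one byte is enough to access the file
                count += 1
            except OSError as exc:
                print(f"skipped {path}: {exc}", file=sys.stderr)
    return count


if __name__ == "__main__":
    # /mnt/glusterfs is a hypothetical mount point; pass the real one as argv[1].
    target = sys.argv[1] if len(sys.argv) > 1 else "/mnt/glusterfs"
    print(touch_all_files(target), "files read")
```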

> You don't have to AFR from the client side, but you can. You can also do it
> on the server side, or even both. Part of the beauty of glusterfs is the
> simple building blocks - you can set it up any number of ways. Personally I
> don't think n-fold increases in client bandwidth for mirroring is all that
> bad. How many "mirrors" do you really need??  :-)

Fair. How do I configure server-server mirroring?

> Each server AFRs to the other servers, then unifies the AFR'd volumes, then
> exports them. The clients mount the export from any given server using
> round-robin DNS or something similar (this will probably become unnecessary
> once the HA translator is available).

You mean, have servers AFR as clients, then re-export the AFR-ed volume 
again? GlusterFS on top of GlusterFS?

> That way each client needs only 1x the bandwidth (but the servers need Nx,
> where N is the number of AFR copies). So if you only need to keep 2 copies
> of your data, you never need more than 2x the bandwidth. And no matter what
> cluster filesystem you use, I can't think of a way to get 2x the files
> without 2x the writes.

Sure, I accept that. I was just asking whether there was a way to make the 
additional writes server-side, because servers are few and clients are 
many, so n* the server bandwidth will generally be smaller than 
server*client bandwidth.
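The bandwidth trade-off in this exchange can be checked with a little arithmetic. The sketch below is a simplified model that counts only file-data bytes for writes (no reads, metadata, or protocol overhead); the function and field names are hypothetical. With client-side AFR every client sends each write to all `replicas` servers; with server-side AFR the client sends it once and the receiving server forwards `replicas - 1` copies.

```python
from dataclasses import dataclass


@dataclass
class WriteTraffic:
    per_client_uplink: int  # bytes each client must send
    inter_server: int       # bytes forwarded between servers
    total: int              # all file-data bytes on the wire


def client_side_afr(clients: int, replicas: int, bytes_written: int) -> WriteTraffic:
    """Each client writes every byte to all `replicas` servers itself."""
    per_client = replicas * bytes_written
    return WriteTraffic(per_client, 0, clients * per_client)


def server_side_afr(clients: int, replicas: int, bytes_written: int) -> WriteTraffic:
    """Each client writes once; the receiving server forwards replicas-1 copies."""
    per_client = bytes_written
    inter_server = clients * (replicas - 1) * bytes_written
    return WriteTraffic(per_client, inter_server, clients * per_client + inter_server)


if __name__ == "__main__":
    # Example: 50 clients, 2 copies, 100 MB written per client.
    print(client_side_afr(50, 2, 100))
    print(server_side_afr(50, 2, 100))
```

The totals come out the same (2x the writes for 2x the files, as stated above); what changes is where the traffic lands. Server-side AFR keeps each client's uplink at 1x and moves the extra copies onto the fewer, typically better-connected server links.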

> There is no fencing and no quorum - the cluster is essentially stateless,
> which is really great because if you build it right then you can't really
> have a situation where split brain is possible (ok, VERY, very unlikely).

I can see that it's less of an issue than block-level split-brain, because 
this would at most lead to the odd file getting corrupted, whereas 
block-level split-brain would destroy the entire FS very quickly.

> All clients connect on the same port, so if you AFR on the client side, say,
> then it's tough to imagine how one client would be able to write to a server
> while another client would think it was down, and yet would still have
> access to another server on the same network and could write to it. Of
> course if you don't consider these issues at build time, it is possible to
> set yourself up for disaster in certain situations. But that's the case with
> anything cluster related... All in all I think it's a tremendous filesystem
> tool.

I agree, but I don't think split-brain conditions are as rare or as 
preventable as you are implying. Whenever there is more than one server, 
split-brain is possible. For example, suppose you want 2 mirrored servers 
and each is also a client of the mirrored pair. If the connection between 
the servers fails, each server can still see its own mirrored copy and 
keeps working, which causes a split-brain. File systems like GFS implement 
quorum and fencing to prevent this situation.

So, although a split-brain is less terminal than with GFS (file corruption 
rather than file system corruption), it is still possible.
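The two-server scenario above, and what a quorum rule changes, can be illustrated with a toy model. This is a generic majority-quorum sketch with hypothetical names, not GlusterFS or GFS code, and it does not model fencing. Each server is also a client of the mirror and writes to whichever replicas it can reach. Without quorum, both sides of a partition keep accepting writes and the copies diverge. With a strict-majority rule, neither side of a 2-node split has a majority, so both refuse writes: that avoids split-brain but stops both nodes, which is why two-node setups generally need some kind of tie-breaker.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    name: str
    files: dict = field(default_factory=dict)  # path -> contents


def write(cluster, reachable, path, data, require_quorum):
    """Write `data` to every replica in `reachable` (a set of server names).

    With require_quorum, refuse the write unless the writer can see a
    strict majority of the cluster. Returns True if the write was applied.
    """
    if require_quorum and len(reachable) * 2 <= len(cluster):
        return False
    for server in cluster:
        if server.name in reachable:
            server.files[path] = data
    return True


def diverged(cluster, path):
    """True if the replicas disagree on the contents of `path`."""
    return len({s.files.get(path) for s in cluster}) > 1


def partition_scenario(require_quorum):
    """Two mirrored servers whose interconnect fails after one good write."""
    cluster = [Server("node1"), Server("node2")]
    write(cluster, {"node1", "node2"}, "/f", "v0", require_quorum)
    # The link between the servers fails: each sees only itself, but each
    # is also a client of the mirror and carries on writing locally.
    wrote_1 = write(cluster, {"node1"}, "/f", "from node1", require_quorum)
    wrote_2 = write(cluster, {"node2"}, "/f", "from node2", require_quorum)
    return wrote_1, wrote_2, diverged(cluster, "/f")


if __name__ == "__main__":
    print("no quorum:  ", partition_scenario(require_quorum=False))
    print("with quorum:", partition_scenario(require_quorum=True))
```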

Gordan