[Gluster-users] simple AFR setup, one server crashes, entire cluster becomes unusable ?

Mon Dec 8 21:32:15 UTC 2008

At 06:17 AM 12/8/2008, Daniel Maher wrote:
>Stas Oskin wrote:
>
> > Based on my limited knowledge of GlusterFS, the most reliable and
> > recommended way (in wiki) is client-side AFR, where the clients are aware of
> > the servers' status, and replicate the files accordingly.
>
>I've reviewed the AFR-related sections of the documentation on the wiki...
>http://www.gluster.org/docs/index.php/GlusterFS_Translators_v1.3#Automatic_File_Replication_Translator_.28AFR.29
>http://www.gluster.org/docs/index.php/Understanding_AFR_Translator
>
>Nowhere in those sections is it stated, either directly or implicitly,
>that client-side AFR is more reliable than server-side AFR.  I'm not
>saying that the statement is incorrect, but rather that the
>documentation noted above doesn't seem to suggest that this is the case.

The issue isn't reliability, it's availability.

If a client only talks to one server and that server goes down, then
the client has nothing to 'fail over' to.  However, if the client
talks to both servers, then if one goes down it'll keep talking to the
other one.
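
To make the availability point concrete, here is a minimal Python sketch.
It is a toy model, not GlusterFS code: the Server and Client classes and
the "try each known server in order" policy are assumptions made up for
illustration.

```python
class Server:
    """Toy stand-in for a storage server (hypothetical, not a GlusterFS API)."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.files = {}

    def read(self, path):
        if not self.up:
            raise ConnectionError(f"{self.name} is down")
        return self.files[path]


class Client:
    """Toy client that tries each server it knows about, in order."""

    def __init__(self, servers):
        self.servers = servers

    def read(self, path):
        for server in self.servers:
            try:
                return server.read(path)
            except ConnectionError:
                continue  # fall through to the next known server, if any
        raise ConnectionError("no reachable server")


s1, s2 = Server("server1"), Server("server2")
for s in (s1, s2):
    s.files["/a.txt"] = "hello"  # assume both servers already hold the file

single = Client([s1])      # only talks to server1
dual = Client([s1, s2])    # talks to both servers

s1.up = False              # server1 crashes
```

With server1 down, dual.read("/a.txt") still succeeds via server2, while
single.read("/a.txt") raises ConnectionError because it has nothing to
fail over to.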

There are costs and benefits to each approach.
Server-side AFR is handy to ensure that the filesystems are in sync,
so no matter which server a client connects to, it'll have the correct data.
With client-side AFR you open yourself up to more configuration problems.
For example:
if client 1 only knows about server 1, it will update files happily
and no AFR takes place.
If client 2 is doing client-side AFR between server 1 and server 2,
then it keeps both servers in sync, and occasionally, when it accesses
a file that client 1 updated on server 1, client 2 takes on the
responsibility of replicating that file to server 2.
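
The mixed-client scenario above can be sketched in a few lines of Python.
This is a simplified model under stated assumptions: each server is a plain
dict, and the function names (plain_write, afr_write, afr_read) are
hypothetical. In particular, the repair step simply treats the first replica
that has the file as authoritative, which is a simplification and not how
AFR actually decides which copy is newest.

```python
server1, server2 = {}, {}  # each dict stands in for one server's export


def plain_write(server, path, data):
    """Client 1: knows a single server, so nothing gets replicated."""
    server[path] = data


def afr_write(replicas, path, data):
    """Client 2: client-side replication, writes to every replica."""
    for replica in replicas:
        replica[path] = data


def afr_read(replicas, path):
    """Client 2: read a file and repair replicas that lack or differ on it.

    Assumption: the first replica holding the file is treated as the source.
    """
    source = next(r for r in replicas if path in r)
    for replica in replicas:
        if replica.get(path) != source[path]:
            replica[path] = source[path]  # copy the file to the stale replica
    return source[path]


replicas = [server1, server2]

afr_write(replicas, "/b.txt", "from client 2")    # lands on both servers
plain_write(server1, "/a.txt", "from client 1")   # lands on server1 only

out_of_sync = "/a.txt" not in server2   # True until client 2 touches the file
afr_read(replicas, "/a.txt")            # client 2 accesses it and repairs server2
```

Until client 2 happens to access /a.txt, server2 has no copy of it, which is
exactly the window in which losing server1 would lose the file.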

I really think a better approach would be to always have server-side
AFR, and then when a gluster client connects to a server, it's given
the AFR config, so that it has a 'failover pool' that it can use in
case its connection to its primary server gets interrupted.

Hopefully this will make it into a future version of gluster, because 
I think it will really simplify administration and increase availability.
There could be an option to make the client responsible for the 
replication, but the control and config should be centralized at the 
server, to eliminate cases where some clients are replicating to 
certain servers and not others.
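
As a rough illustration of the idea, here is a Python sketch of a client
that learns its failover pool from the server it connects to. Everything in
it is hypothetical: the handshake method, the "afr_pool" config key, and the
classes are invented for this example and do not describe any existing
gluster feature.

```python
class Server:
    """Toy server holding a centrally managed cluster config (hypothetical)."""

    def __init__(self, name, cluster_config):
        self.name = name
        self.up = True
        self.cluster_config = cluster_config

    def handshake(self):
        """Hand the connecting client the list of servers in the AFR pool."""
        if not self.up:
            raise ConnectionError(f"{self.name} is down")
        return list(self.cluster_config["afr_pool"])


class Client:
    """Toy client configured with one server; the pool comes from that server."""

    def __init__(self, primary, directory):
        self.directory = directory        # name -> Server lookup
        self.pool = primary.handshake()   # failover pool supplied by the server
        self.current = primary.name

    def active_server(self):
        """Return the current server, or fail over to another pool member."""
        candidates = [self.current] + [n for n in self.pool if n != self.current]
        for name in candidates:
            server = self.directory[name]
            if server.up:
                self.current = name
                return server
        raise ConnectionError("no server in the failover pool is reachable")


config = {"afr_pool": ["server1", "server2"]}   # defined once, on the server side
servers = {name: Server(name, config) for name in config["afr_pool"]}

client = Client(servers["server1"], servers)    # client only names server1
servers["server1"].up = False                   # primary goes away
fallback = client.active_server()               # client moves to server2
```

The client is only ever told about server1, yet it ends up on server2,
because the pool membership lives in one place on the server side.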

my .02