[Gluster-devel] question on AFR behavior when master is down

Gerry Reno greno at verizon.net
Fri Jul 20 20:54:50 UTC 2007


Gerry Reno wrote:
> Every so often it is necessary to bring machines down for some type of 
> maintenance.  If the machine is part of a glusterfs AFR replication 
> setup, what will happen in the following scenarios?
>
> The master (brick1) is brought down, files are added, changed, and deleted 
> on glusterfs, and the master is brought back up.  Does the master (brick1) 
> resume its master role?  If so, does it sync correctly, applying the adds, 
> changes, and deletes to its brick for files that were modified while it 
> was down?
>
> A slave (brick3) is brought down, files are added, changed, and deleted 
> on glusterfs, and the slave is brought back up.  Since a slave was removed 
> from the middle of the order, I assume the replication specified in the 
> config may shift onto other bricks during the down time.  Does this all 
> get straightened out when the slave returns to the cluster?  In other 
> words, some files may replicate over the first three bricks, so while 
> brick3 is down brick4 would effectively become the third brick in the 
> cluster.  Would it then receive the replication intended for brick3?  
> Again, when brick3 restarts, does this all get straightened out?  That is, 
> does brick4 get cleaned up, with the unintended files that were replicated 
> to it while brick3 was down removed?
>
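
For reference, the kind of four-brick setup in the scenarios above would be
described by a client-side volume spec roughly like the sketch below.  The
names are placeholders (brick1..brick4 standing for protocol/client volumes
pointing at the four servers) and the option syntax may not match your
glusterfs version exactly:

  volume afr0
    type cluster/afr
    # brick1..brick4 are placeholder protocol/client subvolumes,
    # one per server, declared earlier in the spec file
    subvolumes brick1 brick2 brick3 brick4
    # replicate every file onto three of the bricks
    option replicate *:3
  end-volume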

Following up on this topic, what I would like to see in AFR is support for 
two 'master' bricks that always contain the same files.  As long as 
glusterfs has one of these masters available it is happy and the cluster 
operates normally.  If one master is taken down, then when it comes back up 
glusterfs simply syncs it with the other master.  This would provide high 
availability without changing the behavior of the cluster.  From the 
standpoint of slaves, once a slave is declared it should be counted in all 
operations whether it is up or down.  In other words, if slave brick3 is 
taken down, do not push files to brick4 just because it is now the third 
'reachable' brick.  Just skip brick3 until it comes back up and then resync 
it.  If you really intend to take a brick down permanently, then you modify 
the config files to reflect that fact.
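
To make this concrete, a purely hypothetical volume spec for the behavior I 
am after might look like the sketch below.  The 'option masters' line does 
not exist in glusterfs today; it is only there to illustrate the proposal:

  volume afr0
    type cluster/afr
    subvolumes brick1 brick2 brick3 brick4
    # hypothetical option: brick1 and brick2 always hold every file,
    # the cluster operates normally as long as one of them is up, and
    # a returning master is resynced from the surviving one
    option masters brick1 brick2
    # slaves keep their declared positions even while down: a replica
    # meant for brick3 is skipped and resynced when brick3 returns,
    # never pushed to brick4 in its place
    option replicate *:3
  end-volume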

my 2c,
Gerry
