[Gluster-devel] question on AFR behavior when master is down
Gerry Reno
greno at verizon.net
Fri Jul 20 20:54:50 UTC 2007
Gerry Reno wrote:
> Every so often it is necessary to bring machines down for some type of
> maintenance. If the machine is part of a glusterfs AFR replication
> setup what will happen in the following scenarios?:
>
> master (brick1) is brought down, files are added, changed and deleted
> on glusterfs, master is brought back up. Does the master(brick1)
> resume its master role? If so, does it sync and correctly
> add/chg/del files to its brick that were modified while it was down?
>
> slave (brick3) is brought down, files are added, changed and deleted
> on glusterfs, slave is brought back up. Since the cluster had a slave
> removed from the middle of the order, I assume the replication specified
> in the config may change on other bricks during the down time. Does
> this all get straightened out when this slave returns to the cluster?
> In other words, I may replicate over the first three bricks for some
> files, so while brick3 is down brick4 would actually become the
> third brick in the cluster. Would it be receiving the
> replication intended for brick3? Again, when brick3 restarts does
> this all get straightened out? In other words, does brick4 get
> cleaned up and unintended files removed that were replicated to it
> when brick3 was down?
>
Following up on this topic, what I would like to see with regard to AFR
is the ability to have two 'master' bricks that always contain the same
files. As long as glusterfs has one of these masters available, it is
happy and the cluster operates normally. If one master is taken down,
then when it comes back up glusterfs just syncs it with the other
master. This would provide high availability without changing the
behavior of the cluster. From the standpoint of slaves, once a slave is
declared it should be considered in any operation whether it is up or
down. In other words, if slave brick3 is taken down, do not push files
to brick4 just because it is the third 'reachable' brick. Just skip
brick3 for now until it comes back up, and then resync it. If you really
intend to take a brick down permanently, then you must modify the config
files to reflect that fact.
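
To make the slave case concrete, here is a rough Python sketch of the two placement policies I am contrasting. It is not glusterfs code and says nothing about how AFR is actually implemented; the function names, the brick list and the replica count of 3 are all made up for illustration.

```python
def targets_by_reachable(bricks, up, replicas):
    """Policy I am worried about: use the first `replicas` bricks that are reachable."""
    return [b for b in bricks if b in up][:replicas]


def targets_by_declared(bricks, up, replicas):
    """Policy I am proposing: the replica set is fixed by the config order.

    Down bricks are skipped for now and remembered so they can be
    resynced when they come back; no other brick takes their place.
    """
    declared = bricks[:replicas]
    write_now = [b for b in declared if b in up]
    resync_later = [b for b in declared if b not in up]
    return write_now, resync_later


# Hypothetical four-brick cluster, replicating over the first three bricks.
bricks = ["brick1", "brick2", "brick3", "brick4"]
up = {"brick1", "brick2", "brick4"}  # brick3 is down for maintenance

print(targets_by_reachable(bricks, up, 3))  # brick4 gets files meant for brick3
print(targets_by_declared(bricks, up, 3))   # brick3 is skipped and queued for resync
```

With the second policy brick4 never receives files it was not configured to hold, so there is nothing to clean up when brick3 returns.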
my 2c,
Gerry