[Gluster-devel] question on AFR behavior when master is down

Mon Jul 23 09:04:43 UTC 2007

Gerry Reno wrote:
> Gerry Reno wrote:
>> master (brick1) is brought down, files are added, changed and deleted
>> on glusterfs, master is brought back up.  Does the master(brick1)
>> resume its master role?  If so, does it sync and correctly
>> add/chg/del files to its brick that were modified while it was down?
>>

IIRC, there is no such thing as a "master" or "slave" brick in the AFR
translator.

If you set up a 2-replica AFR (with 2 bricks: brick1 and brick2), the
client will create a copy of the accessed file on each brick.

If a brick goes down (let's say brick1), the client will fail to
access the copy held on that brick and will keep working with the
other one.

When brick1 comes back up, its files will be synced on the next open().
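
To make the sequence above concrete, here is a toy model in Python. It
is only a sketch of the behaviour as I understand it, not GlusterFS
code: the names Brick and ToyAFR and the per-file version counter are
assumptions made up for the illustration, and deletions are not
modelled.

```python
# Toy model only. Brick, ToyAFR and the version counter are hypothetical
# names used for illustration; this is not how GlusterFS is implemented.

class Brick:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.files = {}  # path -> (version, data)


class ToyAFR:
    def __init__(self, bricks):
        self.bricks = bricks

    def _live(self):
        return [b for b in self.bricks if b.up]

    def write(self, path, data):
        # Write to every reachable brick; unreachable bricks are skipped.
        live = self._live()
        if not live:
            raise IOError("no brick reachable")
        version = 1 + max(b.files.get(path, (0, None))[0] for b in live)
        for b in live:
            b.files[path] = (version, data)

    def open(self, path):
        # On open, copy the newest version to any reachable brick
        # that is missing the file or holds a stale copy.
        live = self._live()
        holders = [b for b in live if path in b.files]
        if not holders:
            raise FileNotFoundError(path)
        newest = max(holders, key=lambda b: b.files[path][0])
        for b in live:
            if b.files.get(path) != newest.files[path]:
                b.files[path] = newest.files[path]
        return newest.files[path][1]


brick1, brick2 = Brick("brick1"), Brick("brick2")
afr = ToyAFR([brick1, brick2])

afr.write("/a", "v1")         # both bricks get the file
brick1.up = False             # brick1 goes down
afr.write("/a", "v2")         # only brick2 is updated
brick1.up = True              # brick1 returns with a stale copy
stale = brick1.files["/a"]
data = afr.open("/a")         # the open() brings brick1 up to date
```

Note that in this model neither brick has a special role: whichever
copy is newest is the source for the sync.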

>> slave (brick3) is brought down, files are added, changed and deleted
>> on glusterfs, slave is brought back up.  Since the cluster had a slave
>> removed from the middle of the order the replication specified in the
>> config may change on other bricks I assume during the down time.  Does
>> this all get straightened out when this slave returns to the cluster? 
>> In other words I may replicate over the first three bricks for some
>> files therefore while brick3 is down brick4 would actually become the
>> third brick in the cluster then.  Would it be receiving the
>> replication intended for brick3?  Again, when brick3 restarts does
>> this all get straightened out.  In other words, does brick4 get
>> cleaned up and unintended files removed that were replicated to it
>> when brick3 was down?
>>

In the AFR translator configuration, you specify the number of replicas
you need and define the bricks on which this replication should be
performed.

So if you ask for 3 replicas in an AFR volume made of 4 bricks and the
3rd brick is down, I would assume that the 3rd replica simply fails on
the 3rd brick and is not put on the 4th one instead.

Could someone from the dev team please correct me if I'm wrong on that?
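
The behaviour I am assuming can be sketched in a few lines of Python.
Again, this is an illustration of my assumption only, not confirmed
AFR behaviour; replica_targets is a hypothetical helper, not a
GlusterFS function.

```python
# Sketch of the assumed behaviour: replica targets are fixed by the
# configuration order, and a brick that is down is skipped rather than
# replaced by the next reachable brick. Hypothetical helper.

def replica_targets(configured, up, replicas):
    """Return (written, skipped) for one file.

    configured: brick names in configuration order
    up:         set of brick names currently reachable
    replicas:   number of replicas requested
    """
    targets = configured[:replicas]  # fixed, independent of what is up
    written = [b for b in targets if b in up]
    skipped = [b for b in targets if b not in up]
    return written, skipped


configured = ["brick1", "brick2", "brick3", "brick4"]
up = {"brick1", "brick2", "brick4"}  # brick3 is down

written, skipped = replica_targets(configured, up, 3)
# written is ["brick1", "brick2"]; skipped is ["brick3"];
# brick4 is never a target, so it has nothing to clean up later.
```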

> 
> Following up on this topic, what I think I would like to see with regard
> to AFR is that AFR would be able to have two 'master' bricks that would
> always contain the same files.  As long as glusterfs would have one of
> these masters available it is happy and the cluster would operate
> normally.  If one master gets taken down when it comes back up glusterfs
> just syncs it with the other master.  This would provide high
> availability without changing the behavior of the cluster.  From the
> standpoint of slaves, once a slave is declared it should be considered
> in any operations whether it is up or down.  In other words if slave
> brick 3 is taken down, do not push files to brick4 just because it is
> the third 'reachable' brick.  Just skip brick 3 for now until it comes
> back up and then resync it.  If you really intend to take a brick down
> permanently then you must modify the config files to reflect that fact.
> 

Again, I'm fairly confident that AFR does not use a master/slave
model. As for achieving high availability, there are plenty of
options currently being proposed and discussed on this mailing list.

Regards,

slelievre at tbs-internet.com           Services to ISP
TBS-internet                   http://www.TBS-internet.com