[Gluster-users] HA replica

Sun Feb 14 02:21:55 UTC 2016

On 02/13/2016 01:02 AM, Mike Stump wrote:
> On Feb 12, 2016, at 8:34 AM, Ravishankar N <ravishankar at redhat.com> wrote:
>> Consistency, availability, tolerance to network partitions. You get to pick any two.
> I wanted the first two.  I did not get them.  By default, we get split brain.  This means no consistency.
Consistency means the client always gets back the same data it wrote to
the volume. For replication, if, say, a write succeeds on only one brick,
then further reads will be served from that healthy brick and not
accidentally from the stale one. It also means that if one client updates
a file, other clients see the same update when they access it.

> To cure that, we choose quorums.  But when the first of a replica 2 pair goes away, you then lose write access.  Without write, we lose availability.  So, if you think it is possible, let me know how to reconfigure my array and I will tell you if it worked.  If you could update the docs to explain how you get the first two, that would be nice.  If you could update the docs to state that the array goes into a partial read-only state if a replica pair goes away, that would be nice.
As Bishoy said in another thread, quorum does not really make sense in a
2-way replica because there is no notion of a majority. If you use a 3-way
replica with client-quorum enabled, then you have more availability than
with a 2-way replica. If preventing split-brains is your major concern and
you do not want to use 3x replication, you can try arbiter volumes
(https://gluster.readthedocs.org/en/release-3.7.0/Features/afr-arbiter-volumes/).
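
To make the majority argument concrete, here is a minimal sketch in
Python. It models a plain strict-majority rule only; it is not Gluster's
actual client-quorum code, and the function names are made up for
illustration.

```python
def has_quorum(bricks_up: int, replica_count: int) -> bool:
    """Strict majority: more than half of the replicas must be reachable."""
    return 2 * bricks_up > replica_count


def tolerated_failures(replica_count: int) -> int:
    """Largest number of bricks that can be down while a majority remains."""
    return (replica_count - 1) // 2


if __name__ == "__main__":
    for replica_count in (2, 3):
        print(
            f"replica {replica_count}: "
            f"can lose {tolerated_failures(replica_count)} brick(s) "
            f"and still have a majority"
        )
```

Under this rule a 2-way replica cannot lose any brick and still have a
majority, while a 3-way replica can lose one.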
>
> I’m fine with running in a degraded state when a server goes away.  When it comes back, I want it to suck down all the new changes from the authoritative replica pair known to the quorum and then once it has all the data, then it can be marked as not-degraded and resume normal operation.
>
> I want each node to notice a down server, and when it is part of a 51% partition, I want the remaining replica members of that server to become a degraded replica N-1 set.  When the server comes back up, I want it to repair back into a replica N state.
AFR does all this, but in a distributed synchronous replication system,
no matter what the replication factor is, at some point *preventing*
split-brains means failing further writes if the current write would
make the only true copy not true anymore. This fencing stays in place
until the other copies are in sync (i.e. healed). That *will* mean a
loss of availability (for writes) for the duration of the heal.
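
The fencing idea can be sketched as a toy model in Python. This is only
an illustration of the reasoning above, not how AFR is implemented; the
classes, fields and the WriteFenced exception are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


class WriteFenced(Exception):
    """Raised when a write is refused to avoid a split-brain."""


@dataclass
class Brick:
    name: str
    up: bool = True
    in_sync: bool = True  # True if this brick holds a true (latest) copy


@dataclass
class ReplicaSet:
    bricks: List[Brick]

    def _good_source_available(self) -> bool:
        # A reachable brick that holds a true copy.
        return any(b.up and b.in_sync for b in self.bricks)

    def write(self) -> None:
        # Refuse the write if no reachable brick holds a true copy:
        # accepting it would make the only true copy stale as well.
        if not self._good_source_available():
            raise WriteFenced("no reachable brick holds a true copy")
        # Bricks that are down miss this write and need a heal later.
        for b in self.bricks:
            if not b.up:
                b.in_sync = False

    def heal(self) -> None:
        # Stale bricks that are reachable catch up from a good source.
        if self._good_source_available():
            for b in self.bricks:
                if b.up:
                    b.in_sync = True


if __name__ == "__main__":
    a, b = Brick("a"), Brick("b")
    volume = ReplicaSet([a, b])
    b.up = False
    volume.write()             # succeeds on a only; b is now stale
    a.up, b.up = False, True   # only the stale brick is reachable
    try:
        volume.write()
    except WriteFenced as err:
        print("write fenced:", err)
    a.up = True
    volume.heal()              # b catches up from a
    volume.write()             # writes are allowed again
```

In this model the window between the refused write and the completed
heal is exactly the loss of write availability described above.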

About the docs, could you list the links for client and server quorum
where you found the details to be inadequate? I can't seem to find
anything myself on readthedocs. :(
In any case, I'm planning a detailed write-up on arbiter volumes,
split-brains, and client and server quorums, which can serve as a ready
reckoner.

HTH,
Ravi