[Gluster-users] HA replica
joe at julianfamily.org
Thu Feb 18 04:03:19 UTC 2016
On 02/12/2016 12:08 PM, Mike Stump wrote:
> Ok. I’m a new user, I want to make an array with 10 machines. I want
> to be able to suffer the loss of any one machine. I don’t mind
> wasting 50% of the disk space to do this. I don’t want to suffer split
> brain. I want the array to support both read and write access to data.
> How do I achieve that?
What is your acceptable annual downtime (typically outlined in an SLA or
OLA)? That's a bit of information you should have when you're
engineering a system.
Split-brain happens when your replication has been partitioned and
writes have occurred in such a way that no valid copy can be discerned.
For the sake of example, we're going to use a very simple file entitled
"file.txt" with the contents of "The quick brown fox jumped over the
lazy yellow dog." It exists on a replicated volume with no protection on
a network where a server and client are in the west wing, and the
replica server and another client are in the east wing. Somewhere in the
middle, someone pulls the plug on the router. The west client can see
the west server and the east client can see the east server.
The west client updates file.txt changing the word "brown" to "red". The
east client updates the same file.txt and changes the word "brown" to
The router recovers and the two servers try to synchronize any files
that were changed. They both had changes to file.txt. Which one was right?
There's no way to determine that from the information given. That's
split-brain.
How can you combat split brain?
One solution is quorum. Have enough replicas that comparisons can be
made. If two servers are in the west and only one in the east and they
have the ability to determine quorum, the east server will not allow
writes during the network split. It can tell that it's not safe because
if they all three voted on which change was right, the two in the west
would win and data would be lost. The two in the west see that one
server is lost, but they still have quorum. They allow the data to
remain available, knowing that the out-of-quorum server is safe from
Gluster has the ability to have a minimally participating quorum
participant called an arbiter. Let's make the west client an arbiter.
The net split happens. Only two replicas exist, one in the west and the
other in the east. The arbiter can see the west server but not the east. The
east server can see neither the west server nor the arbiter. The east
loses quorum but the west, seeing the arbiter, does still have quorum
and remains available with the safe understanding that the east server,
not having quorum, will not accept writes.
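The voting described above can be sketched in a few lines of Python. This is only an illustration of a strict-majority rule, not Gluster's actual quorum implementation; the function name and the server labels are made up for the example.

```python
def writable_sides(partition):
    """Given a network split, report which sides may keep accepting writes.

    `partition` maps a side name to the list of quorum voters (data
    servers or arbiters) reachable on that side. A side keeps quorum only
    if it holds a strict majority of all voters.
    """
    total = sum(len(members) for members in partition.values())
    return {side: 2 * len(members) > total
            for side, members in partition.items()}


# Three full replicas: two in the west, one in the east.
three_replicas = {"west": ["server-w1", "server-w2"], "east": ["server-e1"]}

# Two data replicas plus an arbiter that sits in the west.
with_arbiter = {"west": ["server-w", "arbiter"], "east": ["server-e"]}

# Only two voters: an even split leaves no majority on either side.
two_voters = {"west": ["server-w"], "east": ["server-e"]}

for name, layout in [("three replicas", three_replicas),
                     ("with arbiter", with_arbiter),
                     ("two voters", two_voters)]:
    print(name, writable_sides(layout))
```

The last case shows why a third voter matters: with only two voters, an even split gives neither side a majority, so the only choices are to block writes everywhere or to risk split-brain.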
So with your 10 servers you could have a "replica 3 arbiter 1" volume
with one of the replicas in each set being an arbiter. The arbiter only
uses space for file names and metadata, not actual data. If I were doing
it, I would probably do it like so:
gluster volume create myvol replica 3 arbiter 1 server1:/brick1 \
    server2:/brick1 server3:/arbiter \
    server3:/brick1 server4:/brick1 server5:/arbiter etc.
Notice how there's both a data directory (/brick1) and an arbiter
directory (/arbiter) on servers 3, 5, 7... which allows the data "waste"
that you're asking for while /mostly/ allowing the availability you
seek. I say mostly because if your network partitions, something's got
to give or you will lose data. There's absolutely no way for
disconnected systems to coordinate binary changes to each other with
Perhaps, one day, we will have quantum tunneling networks with
superimposed particles able to teleport data without the need of
networks, but that's not today. When that /is/ available, I expect
rainbows and unicorns to be available as well.
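For what it's worth, the brick list in the command above follows a pattern that is easy to generate rather than type out. Here is a rough Python sketch. The helper name is hypothetical, and wrapping the last arbiter back around to server1 is my assumption, since the command above stops at "etc." before reaching the last set.

```python
def chained_arbiter_bricks(n_servers, data_dir="/brick1", arbiter_dir="/arbiter"):
    """Build a brick list of (data, data, arbiter) triples.

    Servers are paired up for data, and each pair's arbiter lives on the
    next server in the chain. The final pair wraps around to server1
    (an assumption, not something stated in the thread).
    """
    if n_servers < 4 or n_servers % 2:
        raise ValueError("this sketch expects an even number of servers, at least 4")
    bricks = []
    for first in range(1, n_servers, 2):
        second = first + 1
        arbiter = second % n_servers + 1  # next server, wrapping to 1
        bricks += [f"server{first}:{data_dir}",
                   f"server{second}:{data_dir}",
                   f"server{arbiter}:{arbiter_dir}"]
    return bricks


bricks = chained_arbiter_bricks(10)
command = "gluster volume create myvol replica 3 arbiter 1 " + " ".join(bricks)
print(command)
```

With 10 servers this gives five replica sets, each spanning three different machines, so losing any single machine still leaves two of the three voters in every set.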