[Gluster-devel] Avoid split-brains on replica-2

Christopher Pereira kripper at imatronix.cl
Sun May 10 19:16:10 UTC 2015


Hi,

Using a replica-2 gluster volume with oVirt is currently not supported 
and causes split-brains, especially due to sanlock.

Replica-1 works fine (since split-brains are impossible there) and
replica-3 *should* work, or at least reduce the probability of
split-brains (perhaps depending on race conditions?).
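One common reason replica-3 helps is a strict-majority quorum rule. The sketch below is a general illustration under that assumption, not Gluster's exact client-quorum logic, and the function names are hypothetical. It shows that a 1/1 split of replica-2 leaves no side with a majority, while a 2/1 split of replica-3 leaves exactly one side able to write:

```python
def has_majority(reachable, total):
    """Strict-majority rule (assumed): a side of a network partition may
    accept writes only if it reaches more than half of the replicas."""
    return reachable > total // 2


def sides_that_may_write(partition, total):
    """partition lists how many replicas each side of a split can reach;
    returns the indexes of the sides allowed to keep writing."""
    return [i for i, side in enumerate(partition) if has_majority(side, total)]


# replica-2 split 1/1: no side has a majority, so without a quorum rule
# both sides write (split-brain), and with one both sides must stop.
print(sides_that_may_write([1, 1], 2))  # []
# replica-3 split 2/1: exactly one side keeps a majority and keeps writing.
print(sides_that_may_write([2, 1], 3))  # [0]
```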
Besides, geo-replication allows replicating a replica-1 volume in order
to achieve results similar to replica-2.
But since geo-rep uses rsync, I guess it's less efficient than using
"replica-n", where I assume blocks are marked as dirty to be replicated.
Does geo-rep do the same?
How do replica-n and geo-rep compare in a continuous replication scenario?
How safe is it to use replica-n or geo-rep for VM images? Will the
replicated VM images be mostly consistent, comparable to a sudden
power-off of a bare-metal machine?

My guess is that replica-n is safer than geo-rep since it replicates
writes synchronously in real time, while geo-rep seems to do an initial
scan using rsync, but I'm not sure how it continues replicating after
that initial sync.

Anyway, I would like to ask, discuss or propose the following idea:
- Have an option to tell gluster to write to only one brick
(making split-brains impossible), which would then replicate to the
other bricks.
- A local brick (if one exists) should be selected as the "write
authority brick".

This would increase overall write performance, which is currently
limited by the slowest node because writes are replicated synchronously
to all replicas (=> writes do not scale for replica volumes).
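A toy latency model of this argument, assuming a write is acknowledged only when every replica has acknowledged it in today's synchronous mode, and only by the authority brick in the proposed (hypothetical) mode. The latency numbers are made up for illustration, not measurements:

```python
# Toy model only: per-brick write latencies (ms) are invented inputs.

def sync_replica_write_latency(brick_latencies_ms):
    """Synchronous replication: the client waits for every replica,
    so the slowest brick dominates."""
    return max(brick_latencies_ms)


def authority_brick_write_latency(brick_latencies_ms, authority_index):
    """Proposed mode (hypothetical): the client waits only for the
    authority brick; other replicas catch up asynchronously."""
    return brick_latencies_ms[authority_index]


latencies = [0.5, 12.0]  # e.g. a local brick and a remote brick
print(sync_replica_write_latency(latencies))        # 12.0
print(authority_brick_write_latency(latencies, 0))  # 0.5
```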

Basically, the idea is to have an option that avoids split-brains by
selecting an authority brick, and that avoids synchronous writes.
The same goal could be achieved by forcing gluster to resolve *all*
split-brains by choosing the authority brick as the winner (?).
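A minimal sketch of the proposed mode, assuming bricks can be modeled as plain dicts and replication as an in-order queue. `AuthorityVolume` and `resolve_split_brain` are hypothetical names, not Gluster code. Only the authority brick accepts writes, so two bricks never accept conflicting versions; the other bricks lag behind until the queue is drained:

```python
from collections import deque


class AuthorityVolume:
    """Toy model: clients write only to one "write authority" brick, which
    replicates to the other bricks later. Bricks are dicts (path -> bytes)."""

    def __init__(self, brick_names, authority):
        if authority not in brick_names:
            raise ValueError("authority must be one of the bricks")
        self.bricks = {name: {} for name in brick_names}
        self.authority = authority
        self.pending = deque()  # writes not yet pushed to the other bricks

    def write(self, path, data):
        # Only the authority brick is written synchronously.
        self.bricks[self.authority][path] = data
        self.pending.append((path, data))

    def replicate(self):
        # Drain the queue in order, copying each write to every other brick.
        while self.pending:
            path, data = self.pending.popleft()
            for name, brick in self.bricks.items():
                if name != self.authority:
                    brick[path] = data


def resolve_split_brain(copies, authority):
    """Alternative from the mail: if copies diverge,
    always take the authority brick's version as the winner."""
    return copies[authority]


vol = AuthorityVolume(["local", "remote"], authority="local")
vol.write("vm1.img", b"block-v1")
vol.write("vm1.img", b"block-v2")
print(vol.bricks["remote"].get("vm1.img"))  # None (not replicated yet)
vol.replicate()
print(vol.bricks["remote"]["vm1.img"])      # b'block-v2'
print(resolve_split_brain({"local": b"A", "remote": b"B"}, "local"))  # b'A'
```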

Do we currently have an option for doing something like this?

Benefits for gluster:
- replica-n would no longer cause split-brains
- scalability (write performance would no longer be limited by the slowest node)

Best regards,
Christopher Pereira


