[Gluster-users] quorate set [was: How to re-sync]

Ian Rogers ian.rogers at contactclean.com
Wed Mar 10 18:26:20 UTC 2010

On 10/03/2010 17:41, Ed W wrote:
> I do agree that it would be very helpful to have an idea of whether 
> servers are properly in sync or not though.
> Consider the scenario of upgrading a cluster, ie take down S1, upgrade 
> it, then bring it up again, take down S2, upgrade it, then bring it up 
> again.  If you don't fully sync S1 and S2 in the middle then you have 
> a split brain situation which must lead to data loss...
> Perhaps the ls -alR is 100% sufficient to guarantee the entire 
> filesystem is synced and hence is completely sufficient, but split 
> brain IS the major fear with clustered systems and it would be nice to 
> have even stronger guarantees of consistency...

[NB I'm on a steep learning curve with gluster so forgive me if I've 
missed something in the docs]

As far as I can tell from the docs, gluster has a very naive algorithm 
for picking which brick to read from and write to.

For reading it scans the "subvolumes" entry left to right, finding which 
brick has the file with the most recent create time and the largest 
"modification count". If several bricks tie, it uses the leftmost one.

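To make the read rule concrete, here is a minimal Python sketch of how I understand it. This is not gluster's actual code: the Replica class, its field names, and the choice to compare create time first and modification count second are all my own assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Replica:
    """One brick's copy of a file (hypothetical structure, not a gluster type)."""
    brick: str       # name of the subvolume, in "subvolumes" order
    ctime: float     # create time of the file on this brick
    mod_count: int   # stand-in for the "modification count"


def pick_read_replica(replicas: Sequence[Optional[Replica]]) -> Optional[Replica]:
    """Scan left to right and keep the freshest copy; ties go to the leftmost.

    A None entry means that subvolume is unreachable or lacks the file.
    """
    best: Optional[Replica] = None
    for candidate in replicas:
        if candidate is None:
            continue
        # Strict '>' means a later brick only wins if it is strictly fresher,
        # so equal copies resolve to the leftmost subvolume.
        if best is None or (candidate.ctime, candidate.mod_count) > (best.ctime, best.mod_count):
            best = candidate
    return best


copies = [Replica("s1", 100.0, 3), Replica("s2", 100.0, 5), Replica("s3", 100.0, 5)]
chosen = pick_read_replica(copies)
```

Here s2 and s3 are equally fresh, and s2 is chosen because it is listed first.
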
For writing - either user initiated or to self-heal out of date files - 
it just writes to all subvolumes that are available.
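
A rough Python sketch of that write behaviour, as I read the docs. The function name, the in-memory "stores" dict and the "available" set are hypothetical stand-ins of mine, not anything from gluster itself.

```python
from typing import Dict, List, Set


def replicate_write(
    subvolumes: List[str],
    available: Set[str],
    stores: Dict[str, Dict[str, bytes]],
    path: str,
    data: bytes,
) -> List[str]:
    """Write data to every reachable subvolume and report which ones got it.

    Unreachable subvolumes are silently skipped, which is how a brick that is
    down for an upgrade falls behind its peers.
    """
    written: List[str] = []
    for brick in subvolumes:
        if brick in available:
            stores.setdefault(brick, {})[path] = data
            written.append(brick)
    return written


stores: Dict[str, Dict[str, bytes]] = {}
# s1 is down for its upgrade, so only s2 receives the write.
written = replicate_write(["s1", "s2"], {"s2"}, stores, "/data/file", b"new contents")
```

Note that nothing here objects when only one of the subvolumes is reachable; the write simply succeeds on whatever is up.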

So, as long as you start out with fully synced mirrors and you only have 
one volume served per brick, you should be able to upgrade them "right 
to left" and not get any split brain... [fingers crossed :-]

But in most cases [well, my case] I have lots of volumes served from 
each brick and list the bricks in different orders in each volume 
description to get some kind of very coarse load balancing, so this 
trick is no good for me.

The cluster/replicate xlator could do with some kind of notion of 
"quorate" for its subvolumes - e.g. if there's supposed to be 3 
subvolumes and I can only communicate with 1 then any write operations 
probably should fail as we're not quorate and probably split-brain.
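
Something like the following Python sketch is what I have in mind. The names are made up, and the strict-majority threshold is my own assumption about what "quorate" should mean; other thresholds would be possible.

```python
class NotQuorateError(Exception):
    """Raised when too few subvolumes are reachable to accept a write safely."""


def is_quorate(total_subvolumes: int, reachable_subvolumes: int) -> bool:
    """Return True if a strict majority of the configured subvolumes is reachable."""
    return reachable_subvolumes > total_subvolumes // 2


def check_write_allowed(total_subvolumes: int, reachable_subvolumes: int) -> None:
    """Refuse a write up front rather than risk creating a split brain."""
    if not is_quorate(total_subvolumes, reachable_subvolumes):
        raise NotQuorateError(
            f"only {reachable_subvolumes} of {total_subvolumes} subvolumes reachable"
        )


# The case from above: 3 subvolumes configured, only 1 reachable.
try:
    check_write_allowed(3, 1)
    write_refused = False
except NotQuorateError:
    write_refused = True
```

With a strict majority, a two-way mirror with one brick down is also not quorate, so a rolling upgrade of a two-brick mirror would block writes; that trade-off would need thinking about.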

Being able to run something like Google's Chubby - 
http://labs.google.com/papers/chubby.html - as a lock/"definitive info" 
server on the bricks could make "favourite subvolume" selection and 
split brain detection much easier.


