[Gluster-users] quorate set [was: How to re-sync]
Ian Rogers
ian.rogers at contactclean.com
Wed Mar 10 18:26:20 UTC 2010
On 10/03/2010 17:41, Ed W wrote:
>
> I do agree that it would be very helpful to have an idea of whether
> servers are properly in sync or not though.
>
> Consider the scenario of upgrading a cluster, ie take down S1, upgrade
> it, then bring it up again, take down S2, upgrade it, then bring it up
> again. If you don't fully sync S1 and S2 in the middle then you have
> a split brain situation which must lead to data loss...
>
> Perhaps the ls -alR is 100% sufficient to guarantee the entire
> filesystem is synced and hence is completely sufficient, but split
> brain IS the major fear with clustered systems and it would be nice to
> have even stronger guarantees of consistency...
[NB I'm on a steep learning curve with gluster so forgive me if I've
missed something in the docs]
As far as I can tell from the docs, gluster has a very naive algorithm
for picking which brick to read from and write to.
For reading it scans the "subvolumes" entry left to right, finding which
brick has the file with the most recent create time or the largest
"modification count". If several bricks tie, it uses the leftmost one.
For writing - either user initiated or to self-heal out of date files -
it just writes to all subvolumes that are available.
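
To make my reading of the docs concrete, here is a rough Python sketch of
those two rules. It is not gluster's actual code: the Subvolume class, its
fields and both function names are made up for illustration, and skipping
unreachable bricks on read is my assumption.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Subvolume:
    name: str                # hypothetical brick name
    available: bool          # can we currently talk to this brick?
    modification_count: int  # stand-in for the per-file "modification count"


def pick_read_subvolume(subvolumes: Sequence[Subvolume]) -> Optional[Subvolume]:
    """Scan left to right and keep the copy with the largest modification count.

    The strict '>' means a later brick only wins if it is strictly newer,
    so ties go to the leftmost brick.
    """
    best = None
    for sub in subvolumes:
        if not sub.available:
            continue  # assumption: unreachable bricks are skipped
        if best is None or sub.modification_count > best.modification_count:
            best = sub
    return best


def pick_write_subvolumes(subvolumes: Sequence[Subvolume]) -> List[Subvolume]:
    """Writes (user initiated or self-heal) go to every available brick."""
    return [sub for sub in subvolumes if sub.available]
```

Note that nothing here checks how many bricks are reachable, which is
where the split brain worry comes from.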
So, as long as you start out with fully synced mirrors and you only have
one volume served per brick, you should be able to upgrade them "right
to left" and not get any split brain... [fingers crossed :-]
But in most cases [well, my case] I have lots of volumes served from
each brick and list the bricks in different orders in each volume
description to get some kind of very coarse load balancing, so this
trick is no good for me.
The cluster/replicate xlator could do with some kind of notion of
"quorate" for its subvolumes - e.g. if there's supposed to be 3
subvolumes and I can only communicate with 1 then any write operations
probably should fail as we're not quorate and probably split-brain.
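
Roughly what I have in mind, as a Python sketch. This is a suggestion and
not anything gluster does today as far as I know: the function name, the
NotQuorateError exception and the strict-majority rule are all my own
assumptions.

```python
from typing import List, Sequence, Set


class NotQuorateError(Exception):
    """Raised when too few subvolumes are reachable to accept a write."""


def quorate_write_targets(configured: Sequence[str], reachable: Set[str]) -> List[str]:
    """Return the subvolumes to write to, or refuse if we are not quorate.

    Assumption: "quorate" means a strict majority of the configured
    subvolumes can be reached (e.g. 2 out of 3, 3 out of 4).
    """
    targets = [name for name in configured if name in reachable]
    if len(targets) * 2 <= len(configured):
        raise NotQuorateError(
            f"only {len(targets)} of {len(configured)} subvolumes reachable"
        )
    return targets
```

With 3 subvolumes configured, reaching 2 lets the write go ahead, while
reaching only 1 raises NotQuorateError instead of quietly writing to a
possibly split-brain copy.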
Being able to run something like Google's Chubby -
http://labs.google.com/papers/chubby.html - as a lock/"definitive info"
server on the bricks could make "favourite subvolume" selection and
split brain detection much easier.
Ian
--
www.ContactClean.com
Making changing email address as easy as clicking a mouse.
Helping you keep in touch.