[Gluster-devel] ping timeout
Jeff Darcy
jdarcy at redhat.com
Tue Mar 23 21:10:05 UTC 2010
On 03/23/2010 03:23 PM, Ed W wrote:
> I'm not an active Glusterfs user yet, but what worries me about gluster
> is this very casual attitude to split brain... Other cluster solutions
> take outages extremely seriously to the point they fence off the downed
> server until it's guaranteed back into a synchronised state...
I'm not sure I'd say the attitude is casual, so much as that it
emphasizes availability over consistency.
> Once a machine has gone down then it should be fenced off and not be
> allowed to serve files again until it's fully synced - otherwise you are
> just asking for a set of circumstances (however unlikely) to cause the
> out of date data to be served...
This is a very common approach to a very common problem in clustered
systems, but it does require server-to-server communication (which
GlusterFS has historically avoided).
> A superb solution would be for the replication tracker to actually log
> and mark dirty anything it can't fully replicate. When the replication
> partner comes back up these could then be treated as a priority sync
> list to get the servers back up to date?
To put a slight twist on that, it would be nice if clients knew which
servers were still in catch-up mode and did not direct traffic to them
except as part of the catch-up process. That process, in turn, should
be based on precise logging of changes on the survivors so that only an
absolute minimum of files need to be touched. That's kind of a whole
different replication architecture, but IMO it would be better for local
replication and practically necessary for wide-area.
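
To make the idea concrete, here is a rough sketch in Python. It is not
GlusterFS code and does not reflect how GlusterFS tracks changes; the
names Replica, ReplicaSet, dirty and catch_up are all made up for
illustration. It assumes a single coordinator sees every write, ignores
deletes and renames, and keeps the change log in memory. Survivors log
the paths a missing server did not receive, clients skip any server
still in catch-up mode, and catch-up touches only the logged paths.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One server in a replica set (hypothetical model, not GlusterFS code)."""
    name: str
    files: dict = field(default_factory=dict)  # path -> contents
    up: bool = True
    catching_up: bool = False


class ReplicaSet:
    """Toy coordinator; assumes it observes every write (an assumption)."""

    def __init__(self, replicas):
        self.replicas = list(replicas)
        # Per-replica change log: paths written while that replica missed them.
        self.dirty = {r.name: set() for r in self.replicas}

    def readable(self):
        """Replicas that may receive ordinary client traffic."""
        return [r for r in self.replicas if r.up and not r.catching_up]

    def write(self, path, data):
        targets = self.readable()
        if not targets:
            raise RuntimeError("no in-sync replica available")
        for r in self.replicas:
            if r in targets:
                r.files[path] = data
            else:
                # Down or catching up: remember exactly what it missed.
                self.dirty[r.name].add(path)

    def read(self, path):
        targets = self.readable()
        if not targets:
            raise RuntimeError("no in-sync replica available")
        return targets[0].files[path]

    def fail(self, replica):
        replica.up = False

    def rejoin(self, replica):
        # Back on the network, but excluded from client traffic for now.
        replica.up = True
        replica.catching_up = True

    def catch_up(self, replica):
        """Copy only the logged paths, then readmit the replica."""
        sources = [r for r in self.readable() if r is not replica]
        if not sources:
            raise RuntimeError("no in-sync source to copy from")
        touched = sorted(self.dirty[replica.name])
        for path in touched:
            replica.files[path] = sources[0].files[path]
        self.dirty[replica.name].clear()
        replica.catching_up = False
        return touched
```

The point of the sketch is that the cost of catch_up scales with the
number of paths changed during the outage rather than with the size of
the volume, which is what makes the approach attractive over a
wide-area link.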