[Gluster-devel] ping timeout

Wed Mar 24 20:08:01 UTC 2010

Correct me if I'm wrong, but one thing I would add to this debate is the type of split brain we are talking about. GlusterFS differs from GFS or OCFS2 in a key way: it is an overlay FS that uses locking to control who writes to the underlying files and how they do it.

It is not a cluster FS the way GFS is a cluster FS. For example, if GFS has split brain, then fencing is the only thing preventing the complete destruction of all data, as both nodes (assuming only two) write to the same disk at the same time and utterly destroy the filesystem. GlusterFS, by contrast, passes writes to ext3 or whatever local filesystem sits underneath, so at worst you get out-of-date files or lost updates, not a useless partition that used to have your data...

I think less stringent controls are appropriate in this case, and that GFS / OCFS2 are entirely different animals when it comes to how severe a split brain can be. They MUST be strict about fencing, but with GlusterFS you have a choice about how strict you need it to be.
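To make that worst case concrete, here is a toy sketch in Python. It is not GlusterFS code: the Replica class and the heal function are names I made up, and the local filesystem on each server is modelled as a plain dict. The point is only that each side keeps a complete, readable copy during the partition, so healing costs you one update rather than the filesystem.

```python
# Toy model only: "Replica" and "heal" are hypothetical names, not GlusterFS
# APIs. Each replica stores whole files on its own local filesystem (modelled
# as a dict), so a partition cannot corrupt either copy.
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    files: dict = field(default_factory=dict)  # path -> (version, content)

    def write(self, path, content):
        version = self.files.get(path, (0, ""))[0] + 1
        self.files[path] = (version, content)


def heal(a, b, path, prefer):
    """Resolve a divergent file by copying one side over the other.

    The higher version wins. If versions tie (a true split brain), 'prefer'
    names the replica to keep. Returns the content that was overwritten,
    or None if both sides already agreed.
    """
    va, vb = a.files[path][0], b.files[path][0]
    if va > vb or (va == vb and prefer == a.name):
        winner, loser = a, b
    else:
        winner, loser = b, a
    overwritten = loser.files[path][1]
    loser.files[path] = winner.files[path]
    return overwritten if overwritten != winner.files[path][1] else None


r1, r2 = Replica("server1"), Replica("server2")
for r in (r1, r2):
    r.write("/data/report.txt", "draft 1")   # normal replicated write

# Network partition: each client can only reach one server.
r1.write("/data/report.txt", "draft 2 from client A")
r2.write("/data/report.txt", "draft 2 from client B")

# Both copies are still valid files; healing discards one update.
lost_update = heal(r1, r2, "/data/report.txt", prefer="server1")
```

Here client B's write is the lost update. How you pick the winner (or whether you refuse to pick and ask an admin) is exactly the "how strict" choice I mean.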

Chris  

----- "Jeff Darcy" <jdarcy at redhat.com> wrote:

> On 03/23/2010 03:23 PM, Ed W wrote:
> > I'm not an active Glusterfs user yet, but what worries me about gluster
> > is this very casual attitude to split brain...  Other cluster solutions
> > take outages extremely seriously to the point they fence off the downed
> > server until it's guaranteed back into a synchronised state...
> 
> I'm not sure I'd say the attitude is casual, so much as that it
> emphasizes availability over consistency.
> 
> > Once a machine has gone down then it should be fenced off and not be
> > allowed to serve files again until it's fully synced - otherwise you are
> > just asking for a set of circumstances (however, unlikely) to cause the
> > out of date data to be served...
> 
> This is a very common approach to a very common problem in clustered
> systems, but it does require server-to-server communication (which
> GlusterFS has historically avoided).
> 
> > A superb solution would be for the replication tracker to actually log
> > and mark dirty anything it can't fully replicate. When the replication
> > partner comes back up these could then be treated as a priority sync
> > list to get the servers back up to date?
> 
> To put a slight twist on that, it would be nice if clients knew which
> servers were still in catch-up mode, and not direct traffic to them
> except as part of the catch-up process.  That process, in turn, should
> be based on precise logging of changes on the survivors so that only an
> absolute minimum of files need to be touched.  That's kind of a whole
> different replication architecture, but IMO it would be better for local
> replication and practically necessary for wide-area.
> 
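For what it's worth, the dirty-log and catch-up idea quoted above can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not how GlusterFS replication works: Server, ReplicaPair and all of their methods are hypothetical names, there are exactly two servers, the survivor stays up for the whole outage, and deletes are ignored.

```python
# Illustrative sketch only; none of these names are GlusterFS APIs.
from dataclasses import dataclass, field


@dataclass
class Server:
    name: str
    files: dict = field(default_factory=dict)   # path -> content
    up: bool = True
    catching_up: bool = False


@dataclass
class ReplicaPair:
    a: Server
    b: Server
    dirty: dict = field(default_factory=dict)   # server name -> paths it missed

    def write(self, path, content):
        """Write to every in-sync server; log the path for any that missed it."""
        for s in (self.a, self.b):
            if s.up and not s.catching_up:
                s.files[path] = content
            else:
                self.dirty.setdefault(s.name, set()).add(path)

    def readable_servers(self):
        """Clients only direct traffic to servers that are up and fully synced."""
        return [s.name for s in (self.a, self.b) if s.up and not s.catching_up]

    def rejoin(self, returning):
        """Bring a server back in catch-up mode; clients still avoid it."""
        returning.up = True
        returning.catching_up = True

    def catch_up(self, returning, survivor):
        """Copy only the logged paths, then reopen the server to clients."""
        paths = self.dirty.pop(returning.name, set())
        for path in paths:
            if path in survivor.files:          # assumes the survivor saw the write
                returning.files[path] = survivor.files[path]
        returning.catching_up = False
        return paths


s1, s2 = Server("server1"), Server("server2")
pair = ReplicaPair(s1, s2)
for i in range(1000):
    pair.write(f"/data/file{i}", "v1")

s2.up = False                                   # server2 goes down
pair.write("/data/file7", "v2")
pair.write("/data/file42", "v2")

pair.rejoin(s2)
readers_during_catch_up = pair.readable_servers()
touched = pair.catch_up(s2, s1)
```

Only the two files written during the outage are touched on catch-up, rather than all 1000, and server2 takes no client reads until it is back in sync. In a real system the log would have to live on the survivors' disks and the clients would have to learn the catch-up state somehow, which is the server-to-server (or server-to-client) coordination Jeff mentions.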