[Gluster-devel] ping timeout

Wed Mar 24 22:10:38 UTC 2010

Hi Chris,

I think you have just hit on the reason why most of us think that glusterfs
is a really brilliant design, and I do too (though you might get a different
impression from reading my mails on the list).
The design is simple, which keeps the source relatively short and gives you a
real chance to understand what is going on.
And the design solves two of the top problems (i.e. showstoppers) in HA and
cluster setups.
The most important one is the one you mentioned: even if glusterfs has somehow
lost control, your data is not dead, because it is stored on a local fs
somewhere. The second one, equally important, is the mount problem. If you
have ever tried to avoid split brain by using a classical fs on a drbd or
netblock raid device, you have experienced the failover case in which a backup
node tries to re-mount your network drives.
Because of its superior design, glusterfs could have, and should have, some
abilities that no other design is able to implement.
Probably the most important is migration, because people will accept
glusterfs much more readily if migration is trivial. This is why I previously
stressed migrating by simply starting glusterfs on top of already existing
data, rather than copying it through the mountpoint.
Every other cluster fs can migrate by copying, but that is a mess for large
amounts of data. So glusterfs can make a big point here.
Another very important point should be fault tolerance. I am talking about
user faults here, not faulty data.
A user should be able to copy data into the backend server tree at any time
and place, and glusterfs should simply notice it the next time it touches the
file (when stat'ing it) and handle it in an obviously correct, predictable
way. The design has this inherent feature, so it should be usable. And again,
no other design can handle direct interaction with its storage space.
And third, the implementation has a lot of room for further improvement. It
would be a big step forward if the relevant parts came as a true kernel
module. The performance boost could be very significant.

--
Regards,
Stephan

On Wed, 24 Mar 2010 16:08:01 -0400 (EDT)
Christopher Hawkins <chawkins at bplinux.com> wrote:

> Correct me if I'm wrong, but something I would add to this debate is the type of split brain we are talking about. Glusterfs is quite different from GFS or OCFS2 in a key way, in that it is an overlay FS that uses locking to control who writes to the underlying files and how they do it.
> 
> It is not a cluster FS the way GFS is a cluster FS. For example, if GFS suffers a split brain, fencing is the only thing preventing the complete destruction of all data, since both nodes (assuming only two) would write to the same disk at the same time and utterly destroy the filesystem. But glusterfs passes writes to EXT3 or whatever, so at worst you get out-of-date files or lost updates, not a useless partition that used to hold your data...
> 
> I think less stringent controls are appropriate in this case, and that GFS / OCFS2 are entirely different animals when it comes to how severe a split brain can be. They MUST be strict about fencing, but with Glusterfs you have a choice about how strict you need it to be.
> 
> Chris
> 
> ----- "Jeff Darcy" <jdarcy at redhat.com> wrote:
> 
> > On 03/23/2010 03:23 PM, Ed W wrote:
> > > I'm not an active Glusterfs user yet, but what worries me about gluster
> > > is this very casual attitude to split brain...  Other cluster solutions
> > > take outages extremely seriously to the point they fence off the downed
> > > server until it's guaranteed back into a synchronised state...
> > 
> > I'm not sure I'd say the attitude is casual, so much as that it
> > emphasizes availability over consistency.
> > 
> > > Once a machine has gone down then it should be fenced off and not be
> > > allowed to serve files again until it's fully synced; otherwise you are
> > > just asking for a set of circumstances (however unlikely) to cause the
> > > out of date data to be served...
> > 
> > This is a very common approach to a very common problem in clustered
> > systems, but it does require server-to-server communication (which
> > GlusterFS has historically avoided).
> > 
> > > A superb solution would be for the replication tracker to actually log
> > > and mark dirty anything it can't fully replicate. When the replication
> > > partner comes back up these could then be treated as a priority sync
> > > list to get the servers back up to date?
> > 
> > To put a slight twist on that, it would be nice if clients knew which
> > servers were still in catch-up mode, and not direct traffic to them
> > except as part of the catch-up process.  That process, in turn, should
> > be based on precise logging of changes on the survivors so that only an
> > absolute minimum of files need to be touched.  That's kind of a whole
> > different replication architecture, but IMO it would be better for local
> > replication and practically necessary for wide-area.
> > 
> > 
> > _______________________________________________
> > Gluster-devel mailing list
> > Gluster-devel at nongnu.org
> > http://lists.nongnu.org/mailman/listinfo/gluster-devel
> 