[Gluster-devel] ping timeout

Tue Mar 23 19:23:23 UTC 2010

On 18/03/2010 16:59, Christopher Hawkins wrote:
> I see what you mean. Hopefully that behavior is fixed in 3.0. Though in my case, I would still like fast disconnect because the data mirror is active / passive. There should be no problems for glusterfs to figure out which side has the new data because only one server will be receiving writes at any given time.
>    

I'm not an active GlusterFS user yet, but what worries me about Gluster 
is this very casual attitude to split brain. Other cluster solutions 
take outages extremely seriously, to the point that they fence off the 
downed server until it is guaranteed to be back in a synchronised state.

The issue is that once the servers diverge, you are just asking for some 
circumstance that will cause the older file to be served (causing data 
loss). A simple scenario: one server goes down, files are updated on 
the second server, then the first server comes back up and the second 
server goes down before any resync. The result is out-of-date files 
being served.
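
To make that sequence concrete, here is a toy Python sketch. It is not 
GlusterFS code and says nothing about how GlusterFS actually picks a 
replica; the Replica class and the write/read helpers are made up purely 
to illustrate a mirror that serves from whichever side is up without 
checking freshness.

```python
# Toy model of two mirrored servers. Hypothetical names throughout;
# this does not reflect GlusterFS internals.
class Replica:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.files = {}


def write(replicas, path, data):
    # A write only lands on the replicas that are up at that moment.
    for r in replicas:
        if r.up:
            r.files[path] = data


def read(replicas, path):
    # Naive read: serve from the first replica that is up,
    # with no check on whether its copy is current.
    for r in replicas:
        if r.up:
            return r.files.get(path)
    raise RuntimeError("no replica available")


a, b = Replica("a"), Replica("b")
replicas = [a, b]

write(replicas, "report.txt", "v1")  # both servers hold v1
a.up = False                         # first server goes down
write(replicas, "report.txt", "v2")  # only b receives v2
a.up = True                          # a comes back, still holding v1
b.up = False                         # b goes down before any resync

stale = read(replicas, "report.txt")
print(stale)  # prints "v1": the update to v2 is effectively lost
```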

Once a machine has gone down, it should be fenced off and not be 
allowed to serve files again until it's fully synced. Otherwise you are 
just asking for a set of circumstances (however unlikely) to cause 
out-of-date data to be served.
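
Roughly the behaviour I have in mind, as a minimal Python sketch. The 
Server class, FencedError and resync_from are all hypothetical names for 
illustration; real fencing in other cluster stacks is done at a much 
lower level (power, network or storage), so treat this only as a model 
of the rule "no serving until resynced".

```python
# Toy fencing model. All names are hypothetical, not a real cluster API.
class FencedError(Exception):
    pass


class Server:
    def __init__(self, name):
        self.name = name
        self.files = {}
        self.fenced = False

    def crash(self):
        # Any outage fences the server until it has been resynced.
        self.fenced = True

    def resync_from(self, peer):
        # Only a healthy (unfenced) peer may act as the sync source.
        if peer.fenced:
            raise FencedError(f"{peer.name} is fenced, cannot sync from it")
        self.files = dict(peer.files)
        self.fenced = False

    def write(self, path, data):
        if self.fenced:
            raise FencedError(f"{self.name} is fenced, refusing write")
        self.files[path] = data

    def read(self, path):
        if self.fenced:
            raise FencedError(f"{self.name} is fenced, refusing read")
        return self.files.get(path)


a, b = Server("a"), Server("b")
for s in (a, b):
    s.write("report.txt", "v1")

a.crash()                      # a goes down and is fenced
b.write("report.txt", "v2")    # b carries on alone

try:
    a.read("report.txt")       # a is back but has not been synced
    refused = False
except FencedError:
    refused = True             # stale v1 is never served

a.resync_from(b)               # full sync lifts the fence
result = a.read("report.txt")  # a now serves the current copy
```

The cost of this rule is availability: if b also fails before a has 
resynced, nothing is served at all, which is the trade I would prefer 
over silently serving old data.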

A superb solution would be for the replication tracker to log and mark 
as dirty anything it can't fully replicate. When the replication 
partner comes back up, these entries could then be treated as a 
priority sync list to get the servers back up to date?
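
As a rough illustration of the idea, here is a small Python sketch. It 
assumes a single writer and models the partner as a plain dict; the 
Mirror class, its dirty list and partner_returned are names I have 
invented, not anything GlusterFS provides.

```python
# Toy dirty-log tracker. Hypothetical names; the "partner" dict stands in
# for a remote server.
class Mirror:
    def __init__(self):
        self.local = {}
        self.partner = {}
        self.partner_up = True
        self.dirty = []  # paths that could not be replicated, oldest first

    def write(self, path, data):
        self.local[path] = data
        if self.partner_up:
            self.partner[path] = data
        elif path not in self.dirty:
            # Could not replicate: log the path so it is synced first later.
            self.dirty.append(path)

    def partner_returned(self):
        # Replay the dirty list as a priority sync, then clear it.
        self.partner_up = True
        synced = []
        while self.dirty:
            path = self.dirty.pop(0)
            self.partner[path] = self.local[path]
            synced.append(path)
        return synced


m = Mirror()
m.write("a.txt", "1")      # replicated normally
m.partner_up = False       # partner goes away
m.write("b.txt", "2")      # marked dirty
m.write("a.txt", "3")      # marked dirty
synced = m.partner_returned()
in_sync = m.partner == m.local
```

Only the paths touched during the outage need copying, so the partner 
could be brought current without scanning the whole volume.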

Ed W