[Gluster-devel] solutions for split brain situation

Mark Mielke mark at mark.mielke.cc
Mon Sep 14 14:25:40 UTC 2009

On 09/14/2009 08:06 AM, Stephan von Krawczynski wrote:
> we have seen several split brain situations and think that the most common
> option for the situation is simply missing. You can define a favourite child,
> but you cannot define to use the latest file copy as definitive. Why not?
> Isn't it a logical approach to say that the latest copy of a file based on
> mtime must be the most up-to-date and therefore being used in split brain
> recovery?

Latest is *a* resolution, but it's probably not 100% the right answer 
for everybody. I don't think I would use it. If the file system is 
forked - and one client is doing one thing, and another is doing another 
thing - there is no clear answer. Split brain in general is bad. My 
personal conclusion on the matter is:
     1) I want to make sure that only one server is modifying one file 
at one time, and only cut over if the master goes down, *or*
     2) I want to lock a majority of the servers before allowing a 
transaction to start, such that split brain should not occur. For a 
3-node cluster, this means requiring 2 locks.
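
To make option 2 concrete, here is a minimal sketch of the majority rule. It
is not GlusterFS code; the function names majority() and
may_start_transaction() are made up for illustration, and the only assumption
is that "majority" means strictly more than half of the servers.

```python
def majority(n_servers: int) -> int:
    """Smallest number of locks that is strictly more than half of n_servers."""
    if n_servers < 1:
        raise ValueError("need at least one server")
    return n_servers // 2 + 1


def may_start_transaction(locks_held: int, n_servers: int) -> bool:
    """Allow a transaction only when a strict majority of servers is locked."""
    return locks_held >= majority(n_servers)
```

With 3 servers, majority(3) is 2, so two partitions can never both hold
enough locks at the same time. With only 2 servers, both locks are needed,
which is why an even split cannot make progress under this rule.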

I don't think I would rely on self-healing of split-brain for a 
production service. Just my opinion.

If I did want to make a "best choice", though - I think I would choose 
"the volume associated with the longest-running glusterfsd that is also 
actively ping accessible". It's not perfect either, but at least it 
maximizes the chance that this is the copy most users would have seen 
and based their decisions on.
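
A rough sketch of that heuristic, assuming something else has already
collected each server's glusterfsd uptime and ping status. The Server record
and best_choice() are hypothetical names for illustration, not anything
GlusterFS provides.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Server:
    name: str
    uptime_seconds: float  # how long this glusterfsd has been running (assumed input)
    reachable: bool        # whether it answered a ping just now (assumed input)


def best_choice(servers: Sequence[Server]) -> Optional[Server]:
    """Pick the reachable server whose glusterfsd has been running longest."""
    candidates = [s for s in servers if s.reachable]
    if not candidates:
        return None  # nothing is reachable, so there is no sensible pick
    return max(candidates, key=lambda s: s.uptime_seconds)
```

An unreachable server is skipped no matter how long it has been up, which
matches the "including being actively ping accessible" part of the rule.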

> Currently it seems that there is no real choice besides a defined favourite
> child, the file action is only distributed between the children, which means
> you just get a subset of old file copies.
> I'd say the solution has to be placed somewhere at
> xlators/cluster/afr/src/afr-self-heal-data.c lines 855 ff.
> I have no idea though how to find out what the latest copy is ...
> Comments?

Look at the stat() results for each of the files, and track the latest 
mtime. But for two processes actively writing, this is still rolling a 
die. In fact, just because a copy is the latest now doesn't mean it will 
be the latest 2 seconds from now...
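
For illustration only, here is the "stat each copy, keep the latest mtime"
idea in Python rather than in the AFR translator's C. latest_copy() is a
hypothetical helper, and returning None on a tie is my own assumption about
how to handle copies that share the newest mtime.

```python
import os


def latest_copy(paths):
    """Return the path with the newest mtime, or None if the newest mtime is shared."""
    if not paths:
        raise ValueError("need at least one copy")
    # Pair each path with its mtime in nanoseconds and sort newest first.
    stamped = sorted(((os.stat(p).st_mtime_ns, p) for p in paths), reverse=True)
    if len(stamped) > 1 and stamped[0][0] == stamped[1][0]:
        return None  # tie: mtime alone cannot pick a winner
    return stamped[0][1]
```

The answer is only a snapshot. If either copy is still being written, a
second call moments later may return a different path, and clock skew
between servers can make the comparison meaningless.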


Mark Mielke <mark at mielke.cc>

