[Gluster-devel] solutions for split brain situation

Stephan von Krawczynski skraw at ithnet.com
Tue Sep 15 12:57:06 UTC 2009

On Mon, 14 Sep 2009 23:46:16 -0400
Mark Mielke <mark at mark.mielke.cc> wrote:

> On 09/14/2009 07:28 PM, Stephan von Krawczynski wrote:
> > I have problems understanding exactly the heart of your question. Since the
> > main reason for rsyncing the data was to take a backup of the primary server,
> > so that self-heal would have less to do, it is obvious (to me) that it was
> > a subvolume of the replicate. In fact it was a backup of the _only_ subvolume
> > (remember we configured a replicate with two servers, where one of them was
> > actually not there until we offline-fed it with the active server's data and
> > then tried to switch it online in glusterfs).
> >    
> A potentially valid question here - if the backend storage was a 
> database as other solutions use, would you expect this to work?

No, of course not.
> To some degree, rsync from backup is opening up the black box and 
> shoving stuff in that you think, in theory, should work.

No, not really. In fact, every other comment about glusterfs(d) reads like
"this is a standard application as far as the fs is concerned, therefore it
cannot be responsible for problem A or bug B". Now, if it is to be judged as
one of many applications on the one hand, then it should also be able to cope
with situations that every standard application can cope with - namely, other
applications using the same fs.
_The_ advantage of the whole glusterfs concept is exactly that it is _not_ a
fs with its own special on-disk layout. It runs (or should run) on top of an
existing fs that can be used just like any fs - including backup (with rsync
or whatever), restore and file operations of any kind.
If subvolumes are indeed closed storage then they would be in no way
different from nbd, enbd, or whatever-nbd. For various reasons we don't want
those solutions.
So, if you did not back up with rsync -X to preserve xattrs, you will lose
some consistency metadata. But you should not lose the ability to restore
the data as a whole.
At least those are my expectations for straightforward use of glusterfs.

> I don't think this is really the definition of self-heal. I think of 
> self-heal as repairing damage. Really, you are sending it all new data 
> (extracting from a lossy backup copy) that happens to indirectly inherit 
> from previous data, happens to use the same path names, and asking it to 
> reconcile the differences. What is being saved here?

Time and network bandwidth.

> Even in a self-heal 
> situation - it's still going to have to re-write the files, unless it is 
> able to detect that some of the leading blocks are in common and only 
> send the diffs? The files really are different.

If a file has the same path and the same name, it has to be the same file -
based on the fact that there is no versioning. It may have different content,
but that is exactly what we are talking about. The simple question: what is
the valid content? I don't want glusterfs to guess or randomize; I want to be
able to say: use the copy with the latest mtime. That's about all. I expect
neither versioning nor other neat features. Just like I can say now: use the
copy from child X. Unfortunately that option gets you into trouble if child X
dies, because you have to reconfigure all clients to save the situation. If
you could say: hey, just always use the latest file version you can find, at
least the classical server failures would be handled safely. It would even be
safe to restore old backups to bricks and include them in replication,
because glusterfs would update them (maybe the word "update" fits better than
"self-heal" here).
Of course it cannot save you in a real split-brain situation, because there
is really nothing that saves you there. But all other "standard" outages of a
brick look bright afterwards. No?
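The "use the latest mtime" policy argued for above can be sketched in a few
lines of shell (the brick paths and the helper name are mine, purely for
illustration; this assumes GNU stat):

```shell
# Of all replicas of one path, print the copy with the newest mtime.
pick_latest() {
    # GNU stat: %Y = mtime in epoch seconds, %n = file name.
    stat -c '%Y %n' "$@" 2>/dev/null | sort -rn | head -n 1 | cut -d' ' -f2-
}

# Demo with two stand-in "brick" copies of the same file:
brick1=$(mktemp -d); brick2=$(mktemp -d)
touch -d '2009-09-01 00:00' "$brick1/data"
touch -d '2009-09-14 00:00' "$brick2/data"
pick_latest "$brick1/data" "$brick2/data"   # prints the brick2 copy
```

The point is only that such a rule is deterministic and needs no per-child
client configuration, so it survives the death of any one brick.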


More information about the Gluster-devel mailing list