[Gluster-devel] solutions for split brain situation
Stephan von Krawczynski
skraw at ithnet.com
Fri Sep 18 12:29:51 UTC 2009
On Fri, 18 Sep 2009 10:17:59 +0530
Anand Avati <avati at gluster.com> wrote:
> > Correct me if I am wrong, but GlusterFS uses extended attributes on the
> > directory to note if direct children of the directory have been updated. For
> > instance, if you remove a file and one node is down, self-heal will find
> > that the last directory change on the down node is older than that of the
> > other nodes, bringing any create/unlink operations into line with the other
> > nodes.
>
> That is correct. This is exactly how things happen today.
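>
> For illustration, a minimal sketch of such a comparison (the xattr
> name and helpers below are made up, not the actual replicate code):
>
>     # Python sketch: compare a per-directory change counter kept in
>     # an extended attribute on two replicas of the same directory.
>     # Assumes the counter is stored as an ASCII decimal string.
>     import os
>
>     XATTR = "user.example.changelog"   # hypothetical attribute name
>
>     def change_count(path):
>         """Read the change counter; treat a missing xattr as 0."""
>         try:
>             return int(os.getxattr(path, XATTR))
>         except OSError:
>             return 0
>
>     def stale_replica(dir_a, dir_b):
>         """Return the replica whose directory missed entry ops."""
>         a, b = change_count(dir_a), change_count(dir_b)
>         if a == b:
>             return None        # replicas agree, nothing to heal
>         return dir_a if a < b else dir_b
>
> The replica whose counter lags is the one that was down, and its
> create/unlink state is brought into line with the other nodes.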
>
> > This could mean that GlusterFS is too lax with regard to consistency
> > guarantees. If files can appear in the background, and magically be shown -
> > this indicates that GlusterFS is not enforcing use through the mount point,
> > which introduces the potential for inconsistent or faulty results. You are
> > asking for it to guess what you want, without seeing that what you are
> > asking for is incompatible with provisions for any guarantee of a consistent
> > view. That "it works" is actually more concerning to me that justifying over
> > your position. To me it says it's one more potential problem that I might
> > hit in the future. A file that should be removed magically re-appears - how
> > is this a good thing?
> >
> > I guess the last question is a good one for the developers. If the required
> > extended attributes do not exist on the backend, should the
> > files/directories (excluding the root directory) show up in a stat() call? That
> > may be a blessing or a curse for new users, especially since this thread has
> > been going on about automatic creation of extended attributes for pre-existing
> > files in the backend.
>
> We see the situation from a different point of view. When a file
> appears in the backend without extended attributes, the benefit of the
> doubt is given to the user that the file was actually previously
> created from the mountpoint and the extended attributes were lost in
> an fsck - because we ourselves have seen some filesystems just prune
> off extended attributes when running fsck. In fact, we have seen this
> very situation: an entire rack's power tripped and both servers
> ended up with files without extended attributes.
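>
> A sketch of that check (the "trusted.afr." prefix is the usual AFR
> xattr namespace, but take the details here as assumptions; reading
> trusted.* xattrs requires root on Linux):
>
>     # Does this backend file carry any replicate bookkeeping at all?
>     # If not, self-heal gives it the benefit of the doubt and heals
>     # it back, rather than treating it as garbage.
>     import os
>
>     def has_replicate_xattrs(path, prefix="trusted.afr."):
>         return any(n.startswith(prefix) for n in os.listxattr(path))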
>
> One way to look at this is that the replicate module is being very lax
> about things without considering various scenarios. On the other hand,
> we have carefully analysed various scenarios of network outages,
> server reboots and disk fsck results - with the transaction aborted
> at various stages - and arrived at the decisions behind what
> self-heal does today. Every behavior of self-heal (including healing
> a file which got written to the backend directly) is intentional, but
> the rationale might be different from that of a soft-migration
> feature or a feature to add files into the backend directly. Adding
> files directly to the backend is basically misusing self-heal. If you
> feel any of the self-heal behavior is not the way it should be, we
> welcome you to bring it up for discussion.
>
> The self-healing approach follows a best-effort strategy to fix
> things back wherever it can. Whenever in doubt about a decision which could
> result in data loss, it takes a conservative approach of preserving
> data. For example, when one of your disks is fsck'ed and some file is
> orphan-inode'd and disappears from the disk mount, then this is
> equivalent to rm'ing the file directly from the backend with no traces
> in the parent directory xattrs. The next time the file is stat()'ed,
> the parent directory journals say everything is consistent, but the
> file only partially exists. Should the file be deleted or
> recreated? Whenever any such doubt exists (not just this specific
> case, but any kind of doubt), glusterfs follows the conservative
> approach of healing the content back on all the servers. Self-heal
> deletions happen only when the extended attributes of the directory
> unambiguously show that the file was supposed to be deleted.
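>
> The rule reduces to something like this (made-up names and return
> values, not the actual self-heal code):
>
>     def resolve(file_exists_on, dir_says_deleted):
>         # file_exists_on:   set of replicas where the file is present
>         # dir_says_deleted: True only if the parent directory xattrs
>         #                   unambiguously record the deletion
>         if dir_says_deleted:
>             return "delete-everywhere"  # the only safe deletion case
>         if file_exists_on:
>             return "recreate-missing"   # when in doubt, preserve data
>         return "nothing-to-do"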
>
> The side-effect of this 'conservative' approach is that glusterfs
> appears to support soft-migration - though this was never an intended
> feature. We neither QA nor document this "feature". If you understand
> the implications completely after reading the source, and think that
> this works for you, feel free to migrate like this. Just be aware that
> this is an undocumented feature and any issues/races which might show
> up in the process will be unsupported.
>
> So if you come up with a scenario where a file which was supposed to
> be gone gets recreated, then it is very likely that there is
> another point of view or scenario in which the same backend state is
> reached but the file was supposed to exist - and glusterfs self-heal
> takes the conservative approach of not destroying data when a
> potential doubt exists. In simple cases where rm happens with a server
> down, self-heal does indeed delete the file when the down server comes
> back.
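>
> In terms of the resolve() sketch above: an rm with one server down
> leaves the surviving servers' directory xattrs recording the pending
> unlink, so the deletion is unambiguous once that server returns:
>
>     resolve(file_exists_on={"server2"}, dir_says_deleted=True)
>     # -> "delete-everywhere"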
>
> Avati
I can't help being reminded of G'Kar writing the declaration for the
Interstellar Alliance on Babylon 5 :-)
I take your words as a warm welcome, and accept your view of the requested
feature as an accepted case of backend failure.
It is fine with me if you talk about an "accepted case of failure" and I talk
about a "local feed feature". As long as we mean the same thing, we are one.
--
Regards,
Stephan