[Gluster-devel] solutions for split brain situation

Wed Sep 16 08:26:56 UTC 2009

On Tue, 15 Sep 2009 20:06:30 +0530
Anand Avati <avati at gluster.com> wrote:

> > please allow me to ask a general question regarding the data feed for
> > glusterfs-exported files: is it valid to bring new files into a subvolume via
> > the local fs rather than via the glusterfs mounting client?
> > The question targets the very start of a glusterfs setup. Does a user have to
> > start with completely empty subvolumes (i.e. all of them), or can he (just as
> > with nfs) simply export an already existing bunch of files and dirs?
> > Obviously we are only talking about replicate setups for the moment.
> > Does glusterfs correctly handle a situation where some new file simply shows
> > up on the first subvolume (maybe because the user copied it onto the local
> > server's fs)? Does it need extended attribs set somehow from the very
> > beginning, or can it simply accept a file that currently has none set and just
> > use it?
> 
> You always need to copy in data from the mountpoint. glusterfs does
> not support working over existing data. Because of the additional
> restrictions glusterfs imposes on the backend maintenance, it is just
> not feasible to support concurrent access to the backend as well as
> from the mountpoint.

Can you elaborate on what these additional restrictions are?

> > Keep in mind that the user expects glusterfs to work in a straightforward
> > way, just like nfs. If you have to begin with all exported subvolumes
> > empty, you will have a lot more trouble migrating to glusterfs.
> 
> The user has the wrong expectation. glusterfs was never meant to
> be a seamless replacement for NFS. The purpose and goals of NFS are
> very different from glusterfs. It so happens that glusterfs uses an
> existing disk filesystem as a backend, and so does NFS. But NFS is
> _supposed_ to work fine with concurrent server side access. glusterfs
> on the other hand has never claimed such a support, nor do we intend
> to support this use case anytime soon. As an analogy, accessing the
> backend export of glusterfs directly is to be considered as risky as
> accessing the block layer of a disk filesystem.

That isn't really a valid analogy, because you imply that one wants to write
to random blocks, used or unused, in a disk-layer situation. But the fact is
that the only difference between a file fed over a glusterfs mountpoint and
one fed directly is the xattribs. The files' content is exactly the same.
This probably means that more than 99% of the whole fs content is the same.
So all you have to do inside glusterfs is notice whether a file has been fed
via the mountpoint or by a local feed.
In the standard case (mountpoint) you just proceed. In the case of a
recognised local feed you only have to give the file your standard xattribs
(just as if it had been freshly fed over a mountpoint) and start self-heal to
distribute it to the other subvolumes (we only look at a local feed to the
primary subvolume, because other feeds are only a special case of this).
You even have the option of refusing to open the file via the mountpoint
until its state has been restored to what you consider glusterfs-consistent,
just as you do now in the split-brain case.
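To make the proposal concrete, a minimal sketch of the detect-and-adopt step in Python, assuming the primary subvolume's backend is modelled as an in-memory mapping of path to extended attributes rather than real disk access. The prefix `trusted.afr.`, the placeholder value and the helper names are hypothetical illustrations, not glusterfs's actual on-disk format or API:

```python
from collections import deque

# Hypothetical prefix standing in for the replicate translator's extended
# attributes; the real attribute names are not taken from this thread.
GLUSTER_XATTR_PREFIX = "trusted.afr."


def is_local_feed(xattrs: dict[str, bytes]) -> bool:
    """Assume a file carrying none of the replicate xattrs was copied
    straight into the backend instead of through a mountpoint."""
    return not any(name.startswith(GLUSTER_XATTR_PREFIX) for name in xattrs)


def adopt_local_feeds(backend: dict[str, dict[str, bytes]],
                      subvolumes: list[str],
                      heal_queue: deque) -> list[str]:
    """Scan the primary subvolume's backend (modelled as path -> xattrs),
    give every locally fed file the standard xattrs, and queue it for
    self-heal so it gets copied to the other subvolumes."""
    adopted = []
    for path, xattrs in sorted(backend.items()):
        if not is_local_feed(xattrs):
            continue  # fed over a mountpoint: proceed as usual
        for sub in subvolumes:
            # Placeholder value; the real xattr layout is not modelled here.
            xattrs[GLUSTER_XATTR_PREFIX + sub] = b"adopted"
        heal_queue.append(path)
        adopted.append(path)
    return adopted
```

Run against `{"/data/a": {}, "/data/b": {"trusted.afr.client1": b"x"}}` with subvolumes `["client0", "client1"]`, only `/data/a` is adopted and queued; a second scan finds nothing left to adopt.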
I would really love to hear a detailed explanation of why a local feed should
be impossible to manage.
The question is very important because there will be lots of potential users
who are simply unable to copy their data because of the sheer size and the
time this would take, just like us. And there is no good reason for them to
even think of migrating when they know that all they have to do is wait for an
upcoming pNFS implementation, which will surely allow soft migration and
parallel use of other NFS versions. I am not saying that pNFS is a better
solution to the basic problem, but it is a possible solution that allows soft
migration, which is a really important factor.
More than 30 years of application and driver programming have taught me one
thing: you will not be successful if you don't focus on the users with the
highest expectations. All others are only subsets. Failing just one important
expectation (as judged by the user, not by the programmer) will make your whole
project fail. The simple reason for this is: the world does not wait for your
project. Every day brilliant new people start brilliant new projects. It is a
market with survival of the fittest. Do you remember OS/2? Or GEM? Or WordStar?
Or WordPerfect? Or Commodore? Or even SCO? I can't and never will forget the
words of a manager I talked to about their fundamental problems with user
expectations: "What do you mean? We are a multi-billion dollar company."
A year and a half later they went bankrupt. No amount of money and no company
size will save you if you fail to listen to the expectations people have.
Programmers are like poets: most of them can't judge their own writing.

> [...]
> Avati

-- 
Regards,
Stephan