[Gluster-users] The continuing story ...

Thu Sep 10 10:25:20 UTC 2009

On Wed, 09 Sep 2009 19:43:15 -0400
Mark Mielke <mark at mark.mielke.cc> wrote:

> >
> > On Wed, 9 Sep 2009 23:17:07 +0530
> > Anand Avati<avati at gluster.com>  wrote:
> >
> >    
> >> Please reply back to this thread only after you have a response from
> >> the appropriate kernel developer indicating that the cause of this
> >> lockup is because of a misbehaving userspace application. After that,
> >> let us give you the benefit of doubt that the misbehaving userspace
> >> process is glusterfsd and then continue any further debugging. It is
> >> not that we do not want to help you, but we really are pointing you to
> >> the right place where your problem can actually get fixed. You have
> >> all the necessary input they need.
> >>      
> > This is the kind of statement that often drives listeners to think about a
> > project fork...
> >
> >    
> 
> Only if backed up. Has the trace been shown to the linux developers? 
> What do they think?
> 
> If the linux developers come back with "this is totally a userspace 
> program - go away", then yes, it can lead to people thinking about a 
> project fork. But, if the linux developers come back with "crap - yes, 
> this is a kernel program", then I think you might owe Anand an apology 
> for pushing him... :-)
> 
> In this case, there is too many unknowns - but I agree with Anand's 
> logic 100%. Gluster should not be able to cause a CPU lock up. It should 
> be impossible. If it is not impossible - it means a kernel bug, and the 
> best place to have this addressed is the kernel devel list, or, if you 
> have purchased a subscription from a company such as RedHat, than this 
> belongs as a ticket open with RedHat.

You know, I have been really bothered by the way the maintainers are acting
ever since I started reading this list. There is a lot of ideology going on
("can't be", "is impossible for userspace", etc.) and very little real debugging.
This application is not the only one in the world. People run applications
that make heavy use of files and the network, like firefox, apache, shell
scripts, you name it, on their boxes. None of them leads to the effects seen
when you play with glusterfs. If you really think it is a logical way of
debugging to simply say "userspace can't do that" while the rest of the
application world does not run into dead ends like the ones seen on this list,
how can I change your mind?
I hardly think I can. I can only tell you what I would do: I would try to
document _first_ that my piece of code really does behave well. But as you may
have noticed, there is no real way to provide this information. And that is
indeed part of the problem.
Wouldn't it be a nice step if you could debug the goings-on of a
glusterfs server from the client by simply reading an exported file (something
like a server-dependent meta-debug file) that outputs something like what
strace does? Something that enables you to say: "Ok, here you can see what the
application did, and there you can see what the kernel made of it". As we
have noticed, a server logfile is not sufficient.
Is ideology really a proof of anything in today's world? Do you really think
it is possible to understand the complete world by seeing half of it and having
the other half painted by ideology? What is wrong with _proving_ that you are
not guilty? With acting defensively?

It is important to understand that this application is a kind of core
technology for data storage. This means people want to be sure that their
setup does not explode just because they made a kernel update or some other
change where their experience tells them it should have no influence on the
glusterfs service. You want to be sure, just like you are when using nfs. It
just works (even though it runs in kernel space!).
Now, answer for yourself whether you think glusterfs is as stable as nfs on the
same box.

> Cheers,
> mark

-- 
Regards,
Stephan