[Gluster-users] The continuing story ...
Mark Mielke
mark at mark.mielke.cc
Thu Sep 10 19:09:02 UTC 2009
On 09/10/2009 09:38 AM, David Saez Padros wrote:
>
>> In particular, if you read about the intent of FUSE - the technology
>> being used to create a file system, I think you will find that what
>> Anand is saying is the *exact* purpose for this project.
>
> the lockups are on server side not in client side and fuse is
> not used on the server side
I think there is Stephan's problem and your problem, and I'm losing
track of which one is being discussed. Sorry. :-)
Server side, pure user space, with hardware locking up or the kernel
not being able to use a hardware resource - that is a kernel problem. Yes, user
space can trigger it - for example, by opening so many sockets and other
such kernel resources, as to fill low memory - but as we found out
recently, this is where the kernel is supposed to come in and kick the
user program out with an out of memory killer, or not grant the
resources in the first place.
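One rough way to see whether low memory is actually being filled is to look at the LowTotal and LowFree fields in /proc/meminfo. This is only a sketch: it assumes a Linux kernel built with highmem support (typical for 32-bit kernels with lots of RAM), since other kernels may not report those fields at all. The function names here are my own, not from any tool.

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a {field: value} dict.

    Values are the leading integer on each line (kB for most fields).
    """
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info


def low_memory_free_fraction(info):
    """Return LowFree / LowTotal, or None if the kernel does not report them.

    Assumption: LowTotal/LowFree are only present on highmem kernels.
    """
    total = info.get("LowTotal")
    free = info.get("LowFree")
    if not total or free is None:
        return None
    return free / total


if __name__ == "__main__":
    if os.path.exists("/proc/meminfo"):
        with open("/proc/meminfo") as f:
            fraction = low_memory_free_fraction(parse_meminfo(f.read()))
        if fraction is None:
            print("LowTotal/LowFree not reported by this kernel")
        else:
            print(f"low memory free: {fraction:.1%}")
```

Sampling this every few seconds while the server is under load would show whether low memory is draining before the lockup.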
As it is - do we have evidence that GlusterFS is using up a large number
of file descriptors, sockets, processes, virtual memory, or other kernel
resources? It seems to me that the failure in the case with the logs was
the kernel finding that the CPU had not woken up for a long period of time?
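To gather that kind of evidence, something like the following could be pointed at the server process. It is a minimal sketch, assuming Linux with /proc mounted and enough permission to read the target process's fd directory; resource_snapshot is a name I made up for illustration, and you would pass it the PID of whatever daemon you want to inspect.

```python
import os


def resource_snapshot(pid):
    """Collect a few kernel-resource counters for one process from /proc."""
    base = f"/proc/{pid}"
    fds = os.listdir(f"{base}/fd")

    # Each fd entry is a symlink; sockets show up as "socket:[inode]".
    sockets = 0
    for fd in fds:
        try:
            target = os.readlink(f"{base}/fd/{fd}")
        except OSError:
            continue  # fd was closed between listdir and readlink
        if target.startswith("socket:"):
            sockets += 1

    # /proc/<pid>/status has "Key:\tvalue" lines, e.g. Threads and VmSize.
    status = {}
    with open(f"{base}/status") as f:
        for line in f:
            key, _, value = line.partition(":")
            status[key] = value.strip()

    return {
        "open_fds": len(fds),
        "sockets": sockets,
        "threads": int(status.get("Threads", "0")),
        "vm_size_kb": int(status.get("VmSize", "0 kB").split()[0]),
    }


if __name__ == "__main__":
    # Demo on the current process; substitute the PID you care about.
    print(resource_snapshot(os.getpid()))
```

If these numbers stay flat right up to the lockup, that would point away from user-space resource exhaustion and back toward the kernel or hardware.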
I'm not saying ignore GlusterFS in your evaluation - but I am saying if
you truly want a resolution, you really should consider trying the Linux
kernel developers and seeing what they think. If they say this is a GlusterFS
specific problem, I'm sure Anand and gluster.com would take a very
serious second look at it. Until then - they gave it a shot, and don't
have the ability to diagnose your problem or fix your problem. You could
say they are incompetent and uncaring about their users - but a more
accurate statement would probably be that this is entirely out of their
domain, and they are unable to help you, and their professional
recommendation and mine is to contact Red Hat if you have a subscription,
or if you do not, try the Linux developers.
I have no doubt at all that user space programs can hurt the kernel -
but in every situation I can think of, the problem is really a *kernel*
problem. The user space is just discovering the problem - which is
unfortunate - but honestly, shit happens. We recently dealt with load
builds failing due to the out-of-memory issue I referenced above, as a
32-bit Linux kernel doesn't work very well with 32 Gbytes of RAM.
Another problem we dealt with was Subversion mod_dav_fs quickly
consuming all virtual memory in the machine, eventually leading to
machine failure. For the Subversion issue - mod_dav_fs or something it
uses should not be continually consuming more memory - so they have a
bug - but the kernel *also* has a bug, because it should not allow httpd
to bring the machine to a halt due to exhausted virtual memory. In the
Subversion case, it's low on our priority list to solve, since we can
work around it by having Apache recycle the process space more
frequently and avoid the symptoms - but we should be taking this to both
the Subversion developers at Collab.net *and* the Linux kernel
developers. (I know what the Linux kernel developers will say though -
a 32-bit kernel was not designed for 32 Gbytes of RAM, so upgrade to a
64-bit kernel - but we have a RHEL subscription, so perhaps we could take
it that route...)
Cheers,
mark
--
Mark Mielke <mark at mielke.cc>