[Gluster-devel] Gluster/Fuse Kernel Panic

Joe Landman landman at scalableinformatics.com
Thu Sep 17 15:16:54 UTC 2009


Tom O'Connor wrote:
> Hi List,
> 
> We currently have a very irritating problem with CentOS 5.3 x86_64 
> running on a Dell PowerEdge SC1435.  The problem is this: we are 
> experiencing frequent kernel panics while using GlusterFS and FUSE. 
> Across the cluster of servers, we see roughly one panic every 
> 1-2 days.  This wasn't a problem on our earlier servers, which ran 
> Fedora 6.
> Here's a kernel panic screenshot:
> http://imagehost.gr/images/c5ad2d5jzgpgoq91v24y.png
> 
> Here's some general info:
> Linux server6 2.6.18-128.1.10.el5 #1 SMP Thu May 7 10:35:59 EDT 2009 
> x86_64 x86_64 x86_64 GNU/Linux
> Using:
> fuse-2.7.4-8_10.el5
> fuse-kmdl-2.6.18-128.1.10.el5-2.7.4-8_10.el5
> fuse-libs-2.7.4-8_10.el5
> glusterfs-common-2.0.1-1.el5
> glusterfs-client-2.0.1-1.el5
> glusterfs-server-2.0.1-1.el5
> 
> I've straced glusterfs while it dies, and there's nothing obviously 
> abnormal; it simply stops working as soon as the kernel locks up.

This is odd, in that I'd expect CentOS to be more stable than Fedora.

We just helped another group with something like this, and the solution 
was to move from 2.0.1 to 2.0.6 (and a few other things as well).

> 
> A little background: Gluster is used to share some directories from 
> which Apache serves files.
> 
> I've managed to replicate the live environment inside a virtual machine, 
> and also to reproduce the kernel panic by loading the virtual machine's 
> Apache with ApacheBench: with as few as 3 concurrent requests, the 
> kernel locks up.
> However, I have been unable to reproduce this exact behavior on the live 
> cluster, and have tried up to 10,000 concurrent requests, which saturate 
> the network more than anything else.

I don't understand ... it's crashing on the live servers, but not when you 
run ApacheBench against them?

> I've tried the latest versions of Gluster and FUSE from development 
> snapshots and stable releases, and I've tried patched versions of FUSE
Which snapshots and which stable releases?  2.0.6 is/was stable at last 
check.  I wouldn't suggest running production on the dev releases.

Which FUSE versions and releases have you tried, and do they all exhibit 
the same behavior?  If so, it's probably not GlusterFS.

> released by Gluster.  Nothing seems to improve this problem.

Another problem we see is the Red Hat kernel.  We haven't seen it remain 
stable under heavy load in our storage systems.  We have been building and 
supporting late-model kernels for our units, and these do handle load 
without crashing.


> If anyone has any ideas for further debugging, or other routes for 
> support, I'd welcome them.  I'm running out of ideas.
> Thanks in advance
> 
> Tom O'Connor


-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: landman at scalableinformatics.com
web  : http://scalableinformatics.com
        http://scalableinformatics.com/jackrabbit
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615
