[Gluster-devel] Gluster/Fuse Kernel Panic

Tom O'Connor tom.oconnor at assanka.net
Thu Sep 17 15:28:31 UTC 2009


Joe Landman wrote:
> Tom O'Connor wrote:
>> Hi List,
>>
>> We currently have a very irritating problem with Centos 5.3 x86_64 
>> running on a Dell Poweredge SC1435.  The problem is this: We are 
>> experiencing frequent kernel panics while using glusterfs and Fuse. 
>> Across the cluster of servers, we are experiencing roughly 1 panic 
>> every 1-2 days.  This wasn't a problem with earlier servers where we 
>> used Fedora 6.
>> Here's a kernel panic screenshot:
>> http://imagehost.gr/images/c5ad2d5jzgpgoq91v24y.png
>>
>> Here's some general info:
>> Linux server6 2.6.18-128.1.10.el5 #1 SMP Thu May 7 10:35:59 EDT 2009 
>> x86_64 x86_64 x86_64 GNU/Linux
>> Using:
>> fuse-2.7.4-8_10.el5
>> fuse-kmdl-2.6.18-128.1.10.el5-2.7.4-8_10.el5
>> fuse-libs-2.7.4-8_10.el5
>> glusterfs-common-2.0.1-1.el5
>> glusterfs-client-2.0.1-1.el5
>> glusterfs-server-2.0.1-1.el5
>>
>> I've straced glusterfs while it dies, and there's nothing seriously 
>> spurious; it just stops working as soon as the kernel locks up.
>
> This is odd in that I'd expect Centos to be more stable than Fedora.
>
> We just helped another group with something like this, and the 
> solution was to move from 2.0.1 to 2.0.6 (and a few other things as 
> well).
>
>>
>> A little background, Gluster is used to share some directories which 
>> are used by apache to serve files from.
>>
>> I've managed to replicate the live environment inside a virtual 
>> machine, and also to replicate the kernel panic by loading the 
>> virtual machine's apache with ApacheBench: at as few as 3 concurrent 
>> requests, the kernel locks up.
>> However, I have been unable to reproduce this exact behavior on the 
>> live cluster, and have tried up to 10,000 concurrent requests, which 
>> max out the network more than anything.
>
> I don't understand ... it's crashing on the live server, but not when 
> you run ApacheBench?
Yep, it's crashing under "normal" usage (i.e., I can't see any traffic 
spikes at the time of the crash), but if I try to max it out and really 
hammer it with ab, there's no problem at all: it's slow, but it works 
and doesn't fall over.
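
For anyone trying to reproduce the low-concurrency case without ab, here 
is a minimal sketch of a load generator that keeps a fixed number of 
requests in flight and tallies the outcomes.  It is an illustration only, 
not what was run in this thread: the function names (fetch_status, 
run_load) and the URL in the usage note are hypothetical.

```python
import concurrent.futures
import urllib.request
from collections import Counter


def fetch_status(url, timeout=10.0):
    """Do one GET; return the status code as a string, or the exception name."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return str(resp.status)
    except Exception as exc:
        # urlopen raises on 4xx/5xx, timeouts and connection errors;
        # record the exception class so hangs and resets show up in the tally.
        return type(exc).__name__


def run_load(url, total=100, concurrency=3, fetch=fetch_status):
    """Issue `total` requests with at most `concurrency` in flight at once."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=concurrency) as pool:
        # Consume the results inside the `with` block so all requests finish.
        return Counter(pool.map(fetch, [url] * total))
```

Calling run_load("http://localhost/shared/test.html", total=100, 
concurrency=3) is roughly comparable to "ab -n 100 -c 3" against the same 
(hypothetical) URL.  The fetch parameter is there so the tally logic can 
be exercised without a network.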

>
>> I've tried latest versions of gluster and fuse from development 
>> snapshots and stable releases, I've tried patched versions of fuse 
>
> Which snapshots and which stable releases?  2.0.6 is/was stable at 
> last check.  I wouldn't suggest running production on the dev releases.
I tested primarily on the virtual machine, and tried the latest from the 
CentOS repositories (I think that's 2.0.1) and a build from a direct 
git clone of the dev repo, so effectively a nightly build from about 
5 days ago.  (It worked, but still exhibited the same behavior.)
>
> What fuse versions have you tried, what releases, and do they all 
> exhibit the same behavior?  If so, it's probably not glusterfs.
I tried the latest fuse release from the SourceForge page, and the one 
in CentOS 5.3 (2.7.4).  I also tried the latest patched one from 
Z-research, which is apparently better for Gluster.
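
To keep track of which fuse/glusterfs combinations have been tested, a 
small sketch like the following could split the RPM package strings 
listed earlier in the thread into name, version and release.  It assumes 
the usual name-version-release layout, where neither the version nor the 
release contains a hyphen; split_nvr is a hypothetical helper, not part 
of rpm or any library.

```python
def split_nvr(package):
    """Split 'name-version-release' on the last two hyphens."""
    name, version, release = package.rsplit("-", 2)
    return name, version, release


# Package strings as reported in the original message.
PACKAGES = [
    "fuse-2.7.4-8_10.el5",
    "fuse-kmdl-2.6.18-128.1.10.el5-2.7.4-8_10.el5",
    "fuse-libs-2.7.4-8_10.el5",
    "glusterfs-common-2.0.1-1.el5",
    "glusterfs-client-2.0.1-1.el5",
    "glusterfs-server-2.0.1-1.el5",
]

for pkg in PACKAGES:
    name, version, release = split_nvr(pkg)
    print(f"{name:32} {version:8} {release}")
```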

>
>> released by Gluster.  Nothing seems to improve this problem.
>
> Another problem we see is the Redhat kernel.  We haven't seen it 
> stable under heavy loads in our storage systems.  We have been 
> building and supporting late model kernels for our units which do 
> handle load without crashing.
>
>
I'm primarily an Ubuntu person, and I'll probably test this scenario 
against Karmic Koala (9.10), where Gluster is currently 2.0.1.  Jaunty 
only has 1.3.x, which isn't easily compatible with our current 
configuration.

>> If anyone has any ideas for further debugging, or other routes for 
>> support, please let me know.  I'm running out of ideas.
>> Thanks in advance
>>
>> Tom O'Connor
>
>


-- 
Tom O'Connor
--------------------------
Assanka: Every possibility
w: http://www.assanka.net/
t: 0870 085 2038
f: 0871 433 0919
e: tom.oconnor at assanka.net





