[Gluster-devel] Mysterious Escalating Load

Erik Osterman e at osterman.com
Wed May 2 22:42:27 UTC 2007


Every time we start our rendering applications on our gluster volumes,
the load starts climbing. At first we thought it was our application,
but it turns out the application is locked up (blocked waiting on
something). Top shows no active processes (i.e. the load should be close
to 0). After killing the application, the load continues to climb until
we terminate and restart the glusterfs process. Glusterfs itself is not
busy at all; an strace shows it sitting in epoll_wait. Since top shows no
processes using any CPU, the problem seems to be in the kernel.

load average: 14.99, 14.93, 14.20
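(Not from the original thread, but one thing worth checking given these symptoms: on Linux the load average counts processes in uninterruptible sleep, state D, which consume no CPU yet still drive the load up. Tasks blocked inside the kernel on a hung FUSE mount typically sit in D state. A minimal sketch to list them; the procfs paths are standard, not taken from the post:)

```shell
# List processes in uninterruptible sleep (state D) -- they inflate the
# load average while using zero CPU, matching the symptoms above.
ps -eo state=,pid=,comm= | awk '$1 == "D" { print $2, $3 }'

# For each blocked PID, the kernel wait channel often names the code
# path it is stuck in (replace <pid> with a PID from the list above):
# cat /proc/<pid>/wchan
```

If the count of D-state processes roughly matches the load average, the load is coming from tasks blocked in the kernel rather than from anything actually running.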

Before we had this problem, we were getting consistent kernel panics.
Applying
http://www.nabble.com/-fuse-devel--Kernel-oops-in-fuse_send_readpages()-t1374092.html
fixed those. We're stuck using the 2.6.16 kernel on Amazon's EC2.
Fuse is version 2.6.3. Out of desperation to get something working,
we've disabled all performance optimizations.


Anything I can look for to track this down?

Thanks,

Erik Osterman


# Server config
volume brick0
  type storage/posix
  option directory /mnt/glusterfs/brick0
end-volume

volume server
  type protocol/server
  subvolumes brick0
  option transport-type tcp/server
  option bind-address 0.0.0.0
  option listen-port 6996
  option client-volume-filename /etc/glusterfs/client.vol
  option auth.ip.brick0.allow *
end-volume



# Client config

volume ip0
  type protocol/client
  option transport-type tcp/client
  option remote-host 10.253.59.65
  option remote-port 6996
  option remote-subvolume brick0
end-volume

volume ip1
  type protocol/client
  option transport-type tcp/client
  option remote-host 10.253.58.240
  option remote-port 6996
  option remote-subvolume brick0
end-volume

volume ip2
  type protocol/client
  option transport-type tcp/client
  option remote-host 10.253.58.239
  option remote-port 6996
  option remote-subvolume brick0
end-volume

volume afr
  type cluster/afr
  subvolumes ip0 ip1 ip2
  # keep 2 replicas of every file (pattern * matches all files)
  option replicate *:2
end-volume

volume ip
  type cluster/unify
  subvolumes afr
  option scheduler rr
  option rr.limits.min-free-disk 2GB
end-volume





