Major lock-up problem

Gareth Bult gareth at encryptec.net
Wed Jan 9 15:40:49 UTC 2008


I've been developing a new system (which is now "live", hence the lack of debug information) and have been experiencing lots of inexplicable lock-ups and pauses across many different components. I've been working my way through the system, removing or fixing problems as I go. 

I seem to have a problem with gluster I can't nail down. 

When hitting the server with sustained (typically multi-file) writes, after a while the server goes into "D" state (uninterruptible sleep). 
If I have io-threads running on the server, only ONE process goes into "D" state. 

Trouble is, it stays in "D" state and starts to lock up other processes; a favourite is "vi". 
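
For anyone wanting to check for the same symptom, here is a minimal sketch (not a gluster tool) that lists processes currently in "D" state by reading /proc/<pid>/stat on Linux. The function names parse_stat and d_state_processes are made up for illustration. It only looks at the main thread of each process, so a single stuck io-threads worker may only show up under /proc/<pid>/task/<tid>/stat, which this sketch does not walk.

```python
import os


def parse_stat(text):
    """Return (pid, comm, state) from the contents of a /proc/<pid>/stat file."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the LAST ')' instead of on whitespace.
    head, _, tail = text.rpartition(")")
    pid_str, _, comm = head.partition("(")
    # The first field after the command name is the one-letter process state.
    return int(pid_str), comm, tail.split()[0]


def d_state_processes(proc_root="/proc"):
    """Return a sorted list of (pid, comm) for processes whose state is 'D'."""
    found = []
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return found  # no procfs here (for example, not a Linux host)
    for name in entries:
        if not name.isdigit():
            continue  # skip non-process entries such as "meminfo"
        try:
            with open(os.path.join(proc_root, name, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited while scanning, or entry unreadable
        if state == "D":
            found.append((pid, comm))
    return sorted(found)


if __name__ == "__main__":
    for pid, comm in d_state_processes():
        print(pid, comm)
```

Running it a few times in a row while the writes are in progress should show whether the same PID stays stuck or whether different processes pass through "D" state briefly.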

Funny thing is, the machine is a XEN server (glusterfsd in the Dom0) and the XEN instances NOT using gluster are not affected. 
Some of the instances using the glusterfs are affected, depending on whether io-threads is used on the server. 

If I'm lucky, I kill the IO process and 5 mins later the machine springs back to life. 
If I'm not, I reboot. 

Anyone any ideas? 

glfs7 and tla. 


