[Gluster-users] The continuing story ...

Stephan von Krawczynski skraw at ithnet.com
Mon Sep 7 14:46:55 UTC 2009

Hello all,

Last week our first attempt to run GlusterFS in something like a real-world
environment failed.
Nevertheless, we managed to get a working combination of _one_ server and _one_
client (a replicate setup with the second server missing).
This setup worked for about 4 days, so yesterday we tried to enable the second
server. Within minutes the first one crashed. Well, we do not really know
whether it crashed in the strict sense; the situation looked like this:
- server was pingable
- the client disconnected glusterfsd because of missing ping-pong replies
- no login possible
- no filesystem activity (no activity lights on the disk stack)
- no console output (the screen was blank and stayed blank)

This could also have been a user-space hang or a CPU stuck busy-looping. We
don't know. The really interesting part is that the server ran for days on its
own, but as soon as two-server filesystem activity started (evidently in
combination with self-healing), it did not survive 10 minutes.
Of course the second server kept running, but we had to stop the whole setup:
the data had not been completely healed, so it made no sense to continue with
stale copies.
This was GlusterFS 2.0.6 with a minimal server setup (storage/posix,
features/locks, performance/io-threads) on a Linux kernel
Has anyone out there experienced something similar?
Any ideas?


