[Gluster-users] Replication not working on server hang

Sun Aug 30 10:21:52 UTC 2009

Hi Avati,

I'm experiencing complete system-wide hangs, exactly as David
described.

> The discussion in this thread is about those situations where the
> server (machine hosting the storage/posix volume) hangs the backend
> filesystem (verified by kernel console logs) and that in turn results
> in the mountpoint hang.

That seems to be the case in Stephan's situation, yes, since we have
evidence from ReiserFS. What evidence do we have in the ext3 cases?

> While your symptoms are similar on the client side hanging,

In the case of 144, my systems didn't hang. Maybe I was just lucky.
Now that I have disabled read-ahead to work around 144, I am seeing
total system hangs. I also saw these hangs before I started using
read-ahead (with 1.3).

As I have said, it is as if new FDs cannot be allocated, while those
already open continue to work normally. I'm talking about regular ext3
mounts here, not glusterfs ones.

> The discussion thread is about the situation where the server side
> kernel misbehaves and results in glusterfs hanging. The two
> actual problems are quite different.

Perhaps, as I said, it is just a coincidence, but when I ran with
read-ahead, I didn't get any system hangs, only the core dumps.

Now I don't get core dumps anymore; I get system-wide hangs instead.

Thanks, Jeff.