[Gluster-users] Replication not working on server hang
Stephan von Krawczynski
skraw at ithnet.com
Fri Aug 28 11:32:48 UTC 2009
> The glusterfs log only shows lines like these:
> [2009-08-28 09:19:28] E [client-protocol.c:292:call_bail] data2: bailing
> out frame LOOKUP(32) frame sent = 2009-08-28 08:49:18. frame-timeout = 1800
> [2009-08-28 09:23:38] E [client-protocol.c:292:call_bail] data2: bailing
> out frame LOOKUP(32) frame sent = 2009-08-28 08:53:28. frame-timeout = 1800
> Once server2 has been rebooted, all gluster filesystems become available
> again on all clients and the hung df and ls processes terminate,
> but it is difficult to understand why a replicated share that must survive
> the failure of one server does not.
You are suffering from the problem we talked about a few days ago on the list.
If the local fs on one server somehow deadlocks, glusterfs is currently
unable to cope with the situation and just _waits_ for things to happen.
This deadlocks your clients, too, without any need.
Your experience backs my criticism of how these situations are handled.
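
To put a number on the wait, here is a minimal Python sketch that reads the
call_bail lines quoted above and computes how long each call sat before the
client gave up on it. It assumes the log format is exactly as quoted, with the
wrapped lines joined back into one; the regex and the name parse_bail are my
own and not part of glusterfs.

```python
import re
from datetime import datetime

# Pattern for the call_bail lines as quoted in this thread (assumed format).
BAIL_RE = re.compile(
    r"\[(?P<logged>[\d-]+ [\d:]+)\] E \[client-protocol\.c:\d+:call_bail\] "
    r"(?P<volume>\S+): bailing out frame (?P<op>\w+)\(\d+\) "
    r"frame sent = (?P<sent>[\d-]+ [\d:]+)\. frame-timeout = (?P<timeout>\d+)"
)

TS_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_bail(line):
    """Return details of one call_bail line, or None if the line does not match."""
    match = BAIL_RE.search(line)
    if match is None:
        return None
    logged = datetime.strptime(match.group("logged"), TS_FORMAT)
    sent = datetime.strptime(match.group("sent"), TS_FORMAT)
    return {
        "volume": match.group("volume"),
        "op": match.group("op"),
        "timeout": int(match.group("timeout")),
        # Seconds between sending the frame and bailing out of it.
        "waited": int((logged - sent).total_seconds()),
    }


# The two log entries from the quoted report, with line wrapping undone.
SAMPLE = [
    "[2009-08-28 09:19:28] E [client-protocol.c:292:call_bail] data2: bailing "
    "out frame LOOKUP(32) frame sent = 2009-08-28 08:49:18. frame-timeout = 1800",
    "[2009-08-28 09:23:38] E [client-protocol.c:292:call_bail] data2: bailing "
    "out frame LOOKUP(32) frame sent = 2009-08-28 08:53:28. frame-timeout = 1800",
]

if __name__ == "__main__":
    for entry in SAMPLE:
        info = parse_bail(entry)
        print(info["volume"], info["op"], "waited", info["waited"], "s",
              "(frame-timeout", info["timeout"], "s)")
```

For both quoted entries the difference comes out at 1810 seconds, so each
LOOKUP sat for the full 1800-second frame-timeout (plus about 10 seconds)
before the client bailed out.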