[Gluster-devel] pre6 hanging problems

Fri Aug 3 17:18:39 UTC 2007

Hello -

After some experimentation, I have discovered what is causing my
client <-> server connections to hang periodically until the server is
restarted.

The clients are running on stock CentOS which has an iptables configuration
that only allows traffic to flow on tcp/ip sessions that have been initiated
while that instance of iptables is running (--state ESTABLISHED,RELATED).
So, when iptables was restarted, the return traffic from the glusterfs
server was not reaching the clients even though connectivity was restored
between the two and new tcp/ip sessions worked without a problem. What I
don't understand is why the clients think the broken tcp/ip session is still
valid and do not try to reach the server with a new session.

Adding a rule to iptables to allow all traffic (not just traffic related to
established connections) across the servers on that port at least solves my
hanging problem across iptables restarts, but I am still worried about
real-life situations where the server might disappear (i.e. unplugged cable,
data center outage, or other lower layer outages).

In the case of these kinds of network outages, wouldn't a similar situation
be created where the server would not be able to reach the clients? Why
wouldn't the clients create a new tcp/ip connection to the server when they
recognize a timeout for one connection and the server is still responding to
new connections?
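
To make the question concrete, here is a rough Python sketch of the client
behaviour I would have expected. It is not GlusterFS code and says nothing
about how the glusterfs client is actually implemented; the names
enable_keepalive and ReconnectingClient, the timeout values, and the
one-reply-per-request exchange are all assumptions made up for illustration.
The idea is simply: turn on TCP keepalive so a dead session is eventually
noticed, put a timeout on each request, and on any socket error drop the old
session and retry on a fresh connection.

```python
import socket


def enable_keepalive(sock, idle=30, interval=10, probes=3):
    """Ask the kernel to probe an idle TCP session so a dead peer is noticed."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The tuning options below are not available on every platform,
    # so each one is applied only if this Python build exposes it.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
    return sock


class ReconnectingClient:
    """Hypothetical client that replaces a broken session with a new one."""

    def __init__(self, addr, timeout=5.0):
        self.addr = addr
        self.timeout = timeout
        self.sock = None

    def _connect(self):
        sock = socket.create_connection(self.addr, timeout=self.timeout)
        self.sock = enable_keepalive(sock)

    def close(self):
        if self.sock is not None:
            self.sock.close()
            self.sock = None

    def request(self, payload, attempts=2):
        """Send payload and return one reply, reconnecting on failure."""
        last_error = None
        for _ in range(attempts):
            try:
                if self.sock is None:
                    self._connect()
                self.sock.sendall(payload)
                reply = self.sock.recv(4096)
                if reply:
                    return reply
                # An empty read means the peer closed the session.
                raise ConnectionError("peer closed the connection")
            except OSError as exc:  # includes timeouts and resets
                last_error = exc
                self.close()  # discard the old session before retrying
        raise ConnectionError("no reply after %d attempts" % attempts) from last_error
```

With something like this, a session that was silently cut off (as after the
iptables restart) would cost one request timeout, after which the client
would open a new connection instead of hanging until the server is
restarted.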

Anyway, now that this "stability" issue is solved, I look forward to
experimenting with many wild and wacky combinations of translators.

Thanks for your patience!
:august

On 7/31/07, August R. Wohlt <glusterfs at isidore.net> wrote:
>
> Hi, memory (3g/4g used) and cpu are normal (load of ~1.5) when this
> happens and when i run the server in gdb, it is not captured. I have to
> suspend it to get a backtrace. Similarly, when run outside of gdb, it
> doesn't crash. the server just ends up not responding to either of the
> clients.
>
>
>