[Gluster-users] The continuing story ...

Stephan von Krawczynski skraw at ithnet.com
Tue Sep 8 09:55:08 UTC 2009

On Tue, 8 Sep 2009 10:13:17 +1000 (EST)
"Jeff Evans" <jeffe at tricab.com> wrote:

> > - server was ping'able
> > - glusterfsd was disconnected by the client because of missing
> > ping-pong - no login possible
> > - no fs action (no lights on the hd-stack)
> > - no screen (was blank, stayed blank)
> This is very similar to what I have seen many times (even back on
> 1.3), and have also commented on the list.
> It seems that we have quite a few ACK's on this, or similar problems.
> The only thing different in my scenario, is that the console doesn't
> stay blank. When attempting to login I get the last login message, and
> nothing more, no prompt ever. Also, I can see that other processes are
> still listening on sockets etc.. so it seems like the kernel just
> can't grab new FD's.
> I too found the hang happens more easily if a downed node from a
> replicate pair re-joins after some time.
> Following suggestions that this is all kernel related, I have just
> moved up to RHEL 5.4 in the hope that the new kernel will
> help.
> This fix stood out as potentially related for me:
> https://bugzilla.redhat.com/show_bug.cgi?id=44543

This is an ext3 fix; it is unlikely that we are running into a similar effect
on reiserfs3, since the two are very different in internals and coding.
> We also have a broadcom network card, which had reports of hangs under
> load, the kernel has a patch for that too.

We used tg3 in this setup, but the load was not very high (below 10 MBit on a
1000 MBit link).

> If I still run into the hangs, I'll try xfs.

I doubt that this can be a real solution. My guess is that glusterfsd runs
into some race condition where it locks itself up completely.
It is no fun to debug something like this on a production setup. The best
approach would be to have debugging output sent from the servers' glusterfsd
directly to a client to save the logs. I would not count on syslog in this
case; if it survives, though, one could use a serial console for syslog output.
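
To make the idea of shipping the logs off the server concrete, here is a
minimal sketch, not something glusterfsd provides itself. It assumes the server
writes a plain-text log file and that a small helper process follows that file
and sends each new line to a client as a UDP datagram. The function names, the
log path, the address and the port are all hypothetical. UDP is chosen only so
the sender does not block when the receiver is unreachable; if the whole box
locks up, the helper stops too, but everything sent up to that point is
already on the client.

```python
import socket
import time


def follow(path, poll_interval=0.5):
    """Yield complete lines appended to `path`, roughly like `tail -f`."""
    with open(path, "r", errors="replace") as handle:
        handle.seek(0, 2)  # start at the current end of the file
        pending = ""
        while True:
            chunk = handle.readline()
            if not chunk:
                time.sleep(poll_interval)  # nothing new yet, wait and retry
                continue
            pending += chunk
            if pending.endswith("\n"):  # only emit fully written lines
                yield pending.rstrip("\n")
                pending = ""


def forward_lines(lines, host, port):
    """Send each line as one UDP datagram; return how many were sent."""
    sent = 0
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for line in lines:
            sock.sendto(line.encode("utf-8", errors="replace"), (host, port))
            sent += 1
    return sent
```

On the server this would be started as something like
forward_lines(follow("/path/to/glusterfsd.log"), "192.0.2.10", 5514), with any
UDP listener on the client writing the datagrams to disk. The path, address and
port here are placeholders only.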
> Thanks, Jeff.

