[Gluster-devel] Need sensible default value for detecting unclean client disconnects

Niels de Vos ndevos at redhat.com
Wed May 21 14:55:09 UTC 2014


On Tue, May 20, 2014 at 07:15:44PM -0700, Anand Avati wrote:
> Niels,
> This is a good addition. While gluster clients do a reasonably good job
> at detecting dead/hung servers with ping-timeout, the server-side
> detection has been rather weak. TCP_KEEPALIVE has helped to some extent,
> for cases where an idling client (which holds a lock) goes dead. However,
> if an active client with pending data in the server's socket buffer dies,
> we have had to wait for the long TCP retransmission cycle to finish and
> give up.
> 
> The way I see it, this option is complementary to TCP_KEEPALIVE
> (keepalive works for idle and only idle connections, user_timeout works
> only when there are pending acknowledgements, thus covering the full
> spectrum). To that end, it might make sense to present the admin with a
> single timeout configuration value rather than two. It would be very
> frustrating for the admin to configure one of them to, say, 30 seconds,
> and then find that the server does not clean up a hung client after 30
> seconds only because the connection happened to be idle (or not idle).
> Configuring a second timeout for the other case can be very unintuitive.
> 
> In fact, I would suggest having a single network timeout configuration,
> which gets applied to all three: ping-timeout on the client, user_timeout
> on the server, and keepalive on both. I think that is what a user would
> expect anyway. Each covers a slightly different technical situation, but
> they are all just internal details as far as the user is concerned.
> 
> Thoughts?

Sure, sounds good to me. I was thinking about using the
network.ping-timeout option for the TCP_USER_TIMEOUT value too. Is that
what you suggest: applying that one value to the TCP_KEEPALIVE settings
as well?
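
For illustration, fanning one configured timeout out to all three knobs
could look roughly like the sketch below. This is not actual glusterfs
code: the function name and the split of the keepalive budget are made
up, and TCP_USER_TIMEOUT needs Linux 2.6.37 or newer.

    /* Sketch only: derive the TCP keepalive settings (idle-connection
     * case) and TCP_USER_TIMEOUT (pending-data case) from a single
     * timeout in seconds. Names are illustrative. */
    #include <netinet/in.h>
    #include <netinet/tcp.h>
    #include <sys/socket.h>

    #ifndef TCP_USER_TIMEOUT
    #define TCP_USER_TIMEOUT 18            /* older libc headers lack it */
    #endif

    static int
    apply_network_timeout (int sock, int timeout_sec)
    {
            int          on     = 1;
            int          idle   = timeout_sec / 2;    /* idle time before probing  */
            int          intvl  = timeout_sec / 10;   /* interval between probes   */
            int          cnt    = 5;                  /* probes before giving up   */
            unsigned int user_t = timeout_sec * 1000; /* TCP_USER_TIMEOUT wants ms */

            if (setsockopt (sock, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof (on)))
                    return -1;
            if (setsockopt (sock, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof (idle)))
                    return -1;
            if (setsockopt (sock, IPPROTO_TCP, TCP_KEEPINTVL, &intvl, sizeof (intvl)))
                    return -1;
            if (setsockopt (sock, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof (cnt)))
                    return -1;

            /* dead peer with unacknowledged data: give up after the same timeout */
            return setsockopt (sock, IPPROTO_TCP, TCP_USER_TIMEOUT,
                               &user_t, sizeof (user_t));
    }

With timeout_sec = 42 (the current ping-timeout default) both the idle
and the pending-data case would be detected after roughly 42 seconds
(21 + 5 * 4 = 41 for the keepalive path).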

Thanks,
Niels

> 
> 
> On Tue, May 20, 2014 at 4:30 AM, Niels de Vos <ndevos at redhat.com> wrote:
> 
> > Hi all,
> >
> > the last few days I've been looking at a problem [1] where a client
> > locks a file over a FUSE mount, and a 2nd client tries to grab that
> > lock too. It is expected that the 2nd client gets blocked until the
> > 1st client releases the lock. This all works as long as the 1st client
> > releases the lock cleanly.
> >
> > Whenever the 1st client crashes (a kernel panic, for example) or the
> > network is split and the 1st client is unreachable, the 2nd client may
> > not get the lock until the bricks detect that the connection to the
> > 1st client is dead. If there are pending replies, the bricks may need
> > 15-20 minutes until the re-transmissions of those replies have timed
> > out (a sketch at the end of this mail works out that number).
> >
> > The current default of 15-20 minutes is quite long for a fail-over
> > scenario. Relatively recently [2], the Linux kernel got
> > a TCP_USER_TIMEOUT socket option (similar to TCP_KEEPALIVE). This option
> > can be used to configure a per-socket timeout, instead of a system-wide
> > configuration through the net.ipv4.tcp_retries2 sysctl.
> >
> > The default network.ping-timeout is set to 42 seconds. I'd like to
> > propose a network.tcp-timeout option that can be set per volume. This
> > option should then set TCP_USER_TIMEOUT on the socket, so that the
> > connection is aborted once transmitted data stays unacknowledged for
> > longer than the timeout.
> >
> > Now the remaining question: what should the default timeout in seconds
> > be for this new network.tcp-timeout option? I'm currently thinking of
> > making it high enough (like 5 minutes) to prevent false positives.
> >
> > Thoughts and comments welcome,
> > Niels
> >
> >
> > 1. https://bugzilla.redhat.com/show_bug.cgi?id=1099460
> > 2. http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=dca43c7
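
As a footnote on the 15-20 minute figure quoted above: with the default
net.ipv4.tcp_retries2 = 15, Linux keeps doubling the retransmission
timeout from (at minimum) 200 ms up to the 120-second TCP_RTO_MAX cap,
which works out to about 925 seconds. A small sketch of that arithmetic,
assuming the minimum initial RTO (the real RTO depends on the measured
RTT):

    /* Sketch: approximate how long net.ipv4.tcp_retries2 = 15 keeps a
     * connection with unacknowledged data alive. The RTO doubles per
     * retry, capped at 120 s; the connection is dropped when the timer
     * after the final retransmission expires. */
    #include <stdio.h>

    int
    main (void)
    {
            double rto   = 0.2;     /* minimum initial RTO in seconds */
            double total = 0.0;
            int    i;

            for (i = 0; i < 16; i++) {      /* 15 retries + final wait */
                    total += rto;
                    rto = (rto * 2 > 120.0) ? 120.0 : rto * 2;
            }

            /* prints: ~925 seconds (15.4 minutes) */
            printf ("~%.0f seconds (%.1f minutes)\n", total, total / 60.0);
            return 0;
    }

The proposed network.tcp-timeout option would cut that down to the
configured value with a single setsockopt() call per socket. If it is
accepted, setting it would presumably look like
"gluster volume set <volname> network.tcp-timeout 300" (hypothetical;
the option does not exist yet).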

