[Gluster-devel] Need sensible default value for detecting unclean client disconnects

Tue May 20 16:26:14 UTC 2014

On Tue, May 20, 2014 at 01:30:24PM +0200, Niels de Vos wrote:
> Hi all,
> 
> the last few days I've been looking at a problem [1] where a client 
> locks a file over a FUSE-mount, and a 2nd client tries to grab that lock 
> too.  It is expected that the 2nd client gets blocked until the 1st 
> client releases the lock. This all works as long as the 1st client 
> cleanly releases the lock.
> 
> Whenever the 1st client crashes (like a kernel panic) or the network is 
> split and the 1st client is unreachable, the 2nd client may not get the 
> lock until the bricks detect that the connection to the 1st client is 
> dead. If there are pending replies, the bricks may need 15-20 minutes 
> until the re-transmissions of the replies have timed out.
> 
> The current default of 15-20 minutes is quite long for a fail-over 
> scenario. Relatively recently [2], the Linux kernel got 
> a TCP_USER_TIMEOUT socket option (similar to SO_KEEPALIVE). This option 
> can be used to configure a per-socket timeout, instead of a system-wide 
> configuration through the net.ipv4.tcp_retries2 sysctl.
> 
> The default network.ping-timeout is set to 42 seconds. I'd like to 
> propose a network.tcp-timeout option that can be set per volume. This 
> option should then set TCP_USER_TIMEOUT for the socket, which causes 
> re-transmission failures to be fatal after the timeout has passed.
> 
> Now the remaining question: what should the default timeout in seconds 
> be for this new network.tcp-timeout option? I'm currently thinking of 
> making it high enough (like 5 minutes) to prevent false positives.
> 
> Thoughts and comments welcome,
> Niels
> 
> 
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1099460
> [2] http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=dca43c7

Posted a patch for review: http://review.gluster.org/7814
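
For anyone unfamiliar with the socket option, here is a minimal sketch of
how a per-socket timeout could be applied on Linux. It is an illustration
only and is not taken from the patch above: the helper name
set_tcp_user_timeout() and its seconds-based argument are assumptions made
for this example. TCP_USER_TIMEOUT itself takes an unsigned int in
milliseconds.

```c
#include <errno.h>
#include <limits.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Older libc headers may not define the constant; 18 is the Linux value. */
#ifndef TCP_USER_TIMEOUT
#define TCP_USER_TIMEOUT 18
#endif

/*
 * Hypothetical helper (not from the GlusterFS sources): apply a timeout,
 * given in seconds, to one TCP socket. Once transmitted data has stayed
 * unacknowledged for this long, the kernel closes the connection instead
 * of retrying according to net.ipv4.tcp_retries2.
 *
 * Returns 0 on success, -1 with errno set on failure.
 */
int set_tcp_user_timeout(int fd, unsigned int timeout_secs)
{
    unsigned int timeout_ms;

    /* The kernel rejects values that do not fit in a signed int. */
    if (timeout_secs > INT_MAX / 1000) {
        errno = EINVAL;
        return -1;
    }

    /* The option value is expressed in milliseconds. */
    timeout_ms = timeout_secs * 1000;

    return setsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT,
                      &timeout_ms, sizeof(timeout_ms));
}
```

Calling set_tcp_user_timeout(fd, 300) on a connected socket would
correspond to the 5 minute value suggested in the quoted mail.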