[Gluster-devel] fail-over taking too long when a node reboots

Wed Jul 27 07:10:58 UTC 2016

hi,
     Does anyone have a complete understanding of the TCP keepalive timeout
vs. the TCP User Timeout (UTO) option? For both afr and EC, when a server
reboots it takes 42 seconds for the in-flight fops to fail with ENOTCONN
(in saved_frames_unwind()). I am wondering whether there is any way to
reduce this time by tuning these two options. As per our earlier research
on this (I think it was kp who did that), keepalive was not getting
triggered while there are fops in progress, and he saw quite a few game-dev
forums talk about this problem too. There is a newer option called TCP User
Timeout which seems to address this. Does anyone here have experience with
it, and can you suggest more meaningful defaults for these timeouts? I
think at the moment the default is 42 seconds.
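
To make the two knobs concrete, here is a minimal sketch (Linux only) of
setting both on a plain TCP socket. It is not how the gluster socket
transport does it; the helper name configure_timeouts and all the numeric
values are hypothetical, picked only for illustration. The fallback
constant 18 is the Linux value of TCP_USER_TIMEOUT and is used only if the
Python build does not export it. Making the user timeout equal to
idle + interval * probes is one common rule of thumb, not a tested
recommendation for gluster.

```python
import socket

# Linux value of TCP_USER_TIMEOUT; fallback for Python builds that do not
# export the constant. Assumption: this code runs on Linux.
TCP_USER_TIMEOUT = getattr(socket, "TCP_USER_TIMEOUT", 18)


def configure_timeouts(sock, idle_s=20, interval_s=2, probes=9,
                       user_timeout_ms=None):
    """Hypothetical helper: enable keepalive and TCP User Timeout on sock.

    Keepalive covers an idle connection: after idle_s seconds of silence,
    send a probe every interval_s seconds and give up after `probes`
    unanswered probes.

    TCP_USER_TIMEOUT covers a busy connection: it bounds how long sent
    data may stay unacknowledged before the kernel drops the connection.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)

    if user_timeout_ms is None:
        # Rule of thumb (assumption): match the keepalive detection window
        # so idle and busy connections fail after about the same time.
        user_timeout_ms = (idle_s + interval_s * probes) * 1000
    sock.setsockopt(socket.IPPROTO_TCP, TCP_USER_TIMEOUT, user_timeout_ms)
    return user_timeout_ms


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    print("user timeout (ms):", configure_timeouts(s))
    s.close()
```

With something like this, a write stuck on unacknowledged data should fail
once the user timeout expires, instead of waiting for the retransmission
limit or an application-level timer.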

-- 
Pranith