[Gluster-devel] Timeout settings and self-healing ? (WAS: HA failover test unsuccessful (inaccessible mountpoint))
Daniel Maher
dma+gluster at witbe.net
Thu Apr 3 10:17:14 UTC 2008
On Thu, 3 Apr 2008 14:55:48 +0530 "Anand Avati" <avati at zresearch.com>
wrote:
> Daniel,
> maybe it is just taking long to detect connection failure. Can you
> try with 'option transport-timeout 20' (sets response timeout to 20
> seconds) in all your protocol/client and see if you still face the
> 'hang' ?
My simple test case is as follows:
1. Unplug one of the nodes (dfsD).
2. Run ls -l in /opt/ (the directory that contains the mountpoint,
gfs-mount/).
I set the transport-timeout option in every protocol/client instance in
both the client and server configs. I tested timeout settings of 10 and
20 seconds (just to see). In both cases, the 'hang' releases after a
while (approx 30 seconds), but the results are odd. For example:
# ls -l
(hang ~ 30 seconds)
ls: cannot access gfs-mount: Transport endpoint is not connected
total 0
d????????? ? ? ? ? ? gfs-mount
# ls -l
(immediate)
ls: cannot access gfs-mount: Transport endpoint is not connected
total 0
d????????? ? ? ? ? ? gfs-mount
(user wait ~ 5 seconds)
# ls -l
total 8
drwxr-xr-x 2 root root 4096 2008-04-03 09:43 gfs-mount
It would appear that the "recovery" time, regardless of whether the
timeout is set to 10 or 20 seconds, is around 35 to 40 seconds, though
at the very least it did recover. Is there any reasonable way to bring
this period of time down?
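To put a firmer number on that recovery period than my wall-clock guess, something like the sketch below could be run on the client right after unplugging the node. It is only an illustration: measure_recovery is a hypothetical helper (not part of GlusterFS), and it assumes a POSIX shell with stat, date and sleep available. It polls a path until stat succeeds and reports the elapsed time, including any time the first stat call itself spends hanging.

```shell
#!/bin/sh
# Hypothetical helper: poll PATH until stat succeeds, then report how long it took.
# Usage: measure_recovery PATH [MAX_SECONDS]
measure_recovery() {
    path=$1
    max=${2:-120}               # give up after this many seconds (assumed default)
    start=$(date +%s)
    while :; do
        # stat may itself block for a while if the mount is hanging
        if stat "$path" >/dev/null 2>&1; then
            status=0
        else
            status=1
        fi
        # measure after stat returns so that a hang is counted
        elapsed=$(( $(date +%s) - start ))
        if [ "$status" -eq 0 ]; then
            echo "recovered after ${elapsed}s"
            return 0
        fi
        if [ "$elapsed" -ge "$max" ]; then
            echo "still failing after ${elapsed}s"
            return 1
        fi
        sleep 1
    done
}
```

With the setup described above, the call would be measure_recovery /opt/gfs-mount, started immediately after the node is unplugged. The resolution is one second, which should be enough to compare runs with the timeout set to 10 and 20.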
Thank you all so much for your feedback on this topic!
--
Daniel Maher <dma AT witbe.net>