[Gluster-devel] Re: Timeout settings and self-healing ? (WAS: HA failover test unsuccessful (inaccessible mountpoint))

Guido Smit guido at comlog.nl
Wed Apr 23 10:47:50 UTC 2008


Krishna,

I did the test: I killed glusterfsd on one server.
All tests (ls, df, cp) worked as they should; I didn't notice any
difference at all. Unplugging the cable, however, blocked all operations,
and after a few minutes the "Transport endpoint is not connected" message
appeared.
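One plausible explanation for this difference, assuming ordinary TCP behavior (the thread itself does not confirm how glusterfs detects a dead server): when glusterfsd is killed, the server's kernel closes its sockets, so the client immediately reads end-of-file. An unplugged cable sends nothing at all, so the client can only notice once some timeout expires. The minimal Python sketch below reproduces the first case on localhost. It is illustrative only, uses no glusterfs code, and the function name peer_exit_is_seen_immediately is hypothetical.

```python
import socket
import threading


def peer_exit_is_seen_immediately(timeout: float = 2.0) -> bool:
    """Return True if the client notices the peer closing its socket.

    Simulates the 'kill glusterfsd' case: the server side closes its
    connection (as the kernel does when a process exits), and the
    client's recv() returns b'' (EOF) well before `timeout` expires.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    def server() -> None:
        conn, _ = listener.accept()
        conn.close()  # stands in for the server process dying

    t = threading.Thread(target=server)
    t.start()

    client = socket.create_connection(("127.0.0.1", port))
    client.settimeout(timeout)
    try:
        data = client.recv(1)  # b'' means the peer closed the connection
        return data == b""
    except ConnectionResetError:
        return True  # an RST is also an immediate failure signal
    except socket.timeout:
        return False  # nothing arrived: what a pulled cable would look like
    finally:
        client.close()
        t.join()
        listener.close()


if __name__ == "__main__":
    print(peer_exit_is_seen_immediately())
```

The pulled-cable case cannot be reproduced this way: no FIN or RST is ever delivered, so recv() would simply wait until its timeout, which would be consistent with the blocked operations described above.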


Krishna Srinivas wrote:
> Guido,
> Do you see the same behavior if you kill one of the server processes
> instead of unplugging the cable?
> Can you "cd" out of glusterfs mount point and "cd" back in after you
> get the first "transport endpoint not connected" and see if you
> still see the error?
> Do you see "transport endpoint" error for all operations you do on
> the mount point?
> Thanks
> Krishna
>
>
> On Tue, Apr 22, 2008 at 1:19 PM, Guido Smit <guido at comlog.nl> wrote:
>   
>>  My server configs:
>>
>>  http://glusterfs.pastebin.com/m3f82f264
>>
>>  One of the client config:
>>  http://glusterfs.pastebin.com/d5df7fab
>>
>>  My problem is that whenever one of the storage servers is unplugged, I
>>  get the "Transport endpoint is not connected" message.
>>
>>
>>
>>
>>  Krishna Srinivas wrote:
>>  Guido,
>>
>> Can you give the setup details and conf files?
>> You can use http://glusterfs.pastebin.com for pasting conf files.
>>
>> Thanks
>> Krishna
>>
>> On Fri, Apr 4, 2008 at 2:40 PM, Anand Avati <avati at zresearch.com> wrote:
>>
>>
>>  Daniel/Guido,
>>  can you paste the relevant logs, from the time the cable was unplugged
>>  until the end of the experiment?
>>
>>  avati
>>
>>  2008/4/3, Daniel Maher <dma+gluster at witbe.net>:
>>
>>
>>
>>  > On Thu, 3 Apr 2008 14:55:48 +0530 "Anand Avati" <avati at zresearch.com>
>>  > wrote:
>>  >
>>  > > Daniel,
>>  > > maybe it is just taking a long time to detect the connection failure.
>>  > > Can you try 'option transport-timeout 20' (which sets the response
>>  > > timeout to 20 seconds) in all your protocol/client volumes and see if
>>  > > you still face the 'hang'?
>>  >
>>  > My simple test case is as follows:
>>  > 1. Unplug one of the nodes (dfsD).
>>  > 2. Run ls -l in /opt/ (which contains gfs-mount/, the mountpoint).
>>  >
>>  > I set the timeout option on every client volume, in both the client
>>  > and server configs. I tested timeout settings of 10 and 20 seconds
>>  > (just to see). In both cases the 'hang' releases after a while
>>  > (approx. 30 seconds), but the results are odd. For example:
>>  >
>>  > # ls -l
>>  > (hang ~ 30 seconds)
>>  > ls: cannot access gfs-mount: Transport endpoint is not connected
>>  > total 0
>>  > d????????? ? ? ? ? ? gfs-mount
>>  >
>>  > # ls -l
>>  > (immediate)
>>  > ls: cannot access gfs-mount: Transport endpoint is not connected
>>  > total 0
>>  > d????????? ? ? ? ? ? gfs-mount
>>  >
>>  > (user wait ~ 5 seconds)
>>  >
>>  > # ls -l
>>  > total 8
>>  > drwxr-xr-x 2 root root 4096 2008-04-03 09:43 gfs-mount
>>  >
>>  > It would appear that the "recovery" time, regardless of whether the
>>  > timeout is set to 10 or 20 seconds, is around 35 to 40 seconds; at the
>>  > very least, though, it recovered. Is there any reasonable way to bring
>>  > this period of time down?
>>  >
>>  > Thank you all so much for your feedback on this topic!
>>  >
>>  >
>>
>>
>>
>>
>>
>>
>>
>>
>> --
>> Met vriendelijke groet,
>>
>> Guido Smit
>> ComLog B.V.
>>
>> Televisieweg 133
>> 1322 BE Almere
>> T. 036 5470500
>> F. 036 5470481
>>
>>
>>
>>     
>
>
>   
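The checks Krishna asks for (does every operation fail, and does the error persist), and the recovery time Daniel measures by hand, could be scripted so that each test run is timed the same way. The Python sketch below is a minimal, hypothetical helper: probe and time_until_recovered are names invented here, not part of glusterfs. It assumes a Linux client, uses /opt/gfs-mount from Daniel's test as the mountpoint, and treats ENOTCONN ("Transport endpoint is not connected") as the failure under test.

```python
import errno
import os
import time
from typing import Callable, Dict, Optional

# Mountpoint from Daniel's test case; adjust for your own setup.
MOUNTPOINT = "/opt/gfs-mount"


def probe(path: str) -> Dict[str, Optional[str]]:
    """Run ls/df-like calls on `path`; map each to None (ok) or an errno name."""
    ops: Dict[str, Callable[[str], object]] = {
        "stat": os.stat,        # roughly what `ls -l /opt` does for the mountpoint
        "listdir": os.listdir,  # roughly `ls` inside the mountpoint
        "statvfs": os.statvfs,  # roughly what `df` asks the filesystem for
    }
    results: Dict[str, Optional[str]] = {}
    for name, op in ops.items():
        try:
            op(path)
            results[name] = None
        except OSError as exc:
            results[name] = errno.errorcode.get(exc.errno, str(exc.errno))
    return results


def time_until_recovered(path: str, interval: float = 1.0,
                         limit: float = 120.0) -> Optional[float]:
    """Poll os.stat(path) until it stops failing with ENOTCONN.

    Returns the seconds waited, or None if `limit` seconds pass first.
    On a hung mount a single os.stat() call may itself block, so the
    elapsed time includes that hang.
    """
    start = time.monotonic()
    while True:
        try:
            os.stat(path)
            return time.monotonic() - start
        except OSError as exc:
            if exc.errno != errno.ENOTCONN:
                raise  # a different error is not the failure under test
        if time.monotonic() - start > limit:
            return None
        time.sleep(interval)


if __name__ == "__main__":
    print(probe(MOUNTPOINT))
```

Calling time_until_recovered(MOUNTPOINT) right after unplugging a node would give a repeatable figure to compare against the 35 to 40 seconds reported above, for each transport-timeout value tried.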



