[Gluster-devel] GlusterFS AFR not failing over

gordan at bobich.net gordan at bobich.net
Mon Jun 9 13:41:09 UTC 2008


No - this is a different problem. If the transport timeout were the 
problem, the access should return in under 60 seconds, should it not? In 
the case I'm seeing, something goes wrong and the only way to recover is 
to restart glusterfsd on the server(s) _AND_ glusterfs on the clients.

It's kind of hard to reproduce, as I only see it happening about once 
every week or so.
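
Since the hang is rare, a small probe run periodically against the mount 
could at least record when it starts. The following is only a minimal 
sketch under stated assumptions: the function name probe_path, the 5 
second default and the use of os.stat as the test operation are 
illustrative choices, not anything GlusterFS provides.

```python
import os
import threading


def probe_path(path, timeout_s=5.0):
    """Classify os.stat(path) as 'ok', 'error: <reason>' or 'hung'.

    Hypothetical helper: the stat runs in a daemon thread so that a
    blocked filesystem call cannot block the caller as well.
    """
    result = {}

    def worker():
        try:
            os.stat(path)
            result["status"] = "ok"
        except OSError as exc:
            # e.g. ENOTCONN ("Transport endpoint is not connected")
            result["status"] = "error: " + os.strerror(exc.errno)

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout_s)
    # No entry means the stat did not return within timeout_s.
    return result.get("status", "hung")


if __name__ == "__main__":
    # "/home" is the mount point mentioned in this thread; adjust as needed.
    print(probe_path("/home"))
```

A "hung" result only means the call did not return within the timeout; 
the worker thread stays blocked in the background, so this is suited to 
a short-lived cron job rather than a long-running daemon.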

Gordan

On Sat, 7 Jun 2008, Krishna Srinivas wrote:

> Gordan,
>
> Is this the case of transport-timeout being high?
>
> Krishna
>
> On Sat, Jun 7, 2008 at 1:04 AM, Gordan Bobic <gordan at bobich.net> wrote:
>> Hi,
>>
>> I have /home mounted from GlusterFS with AFR, and if one of the servers
>> (secondary) goes away, I cannot log in. sshd tries to read ~/.ssh and bash
>> tries to read ~/.bashrc and this seems to fail - or at least take a very
>> long time to time out and try the remaining server (which verifiably works).
>>
>> I get this sort of thing in the logs:
>>
>> E [tcp-client.c:190:tcp_connect] home2: non-blocking connect() returned: 110 (Connection timed out)
>> E [client-protocol.c:4423:client_lookup_cbk] home2: no proper reply from server, returning ENOTCONN
>> C [client-protocol.c:212:call_bail] home2: bailing transport
>>
>> where home2 is the name of the GlusterFS export on the secondary.
>>
>> Is this a known issue or have I managed to trip another error case?
>>
>> Gordan
>




