[Gluster-devel] Spurious disconnections / connectivity loss

Gordan Bobic gordan at bobich.net
Sun Jan 31 13:37:49 UTC 2010


Stephan von Krawczynski wrote:
> On Sun, 31 Jan 2010 00:29:55 +0000
> Gordan Bobic <gordan at bobich.net> wrote:
> 
>> Stephan von Krawczynski wrote:
>>> On Fri, 29 Jan 2010 18:41:10 +0000
>>> Gordan Bobic <gordan at bobich.net> wrote:
>>>
>>>> I'm seeing things like this in the logs, coupled with things locking up 
>>>> for a while until the timeout is complete:
>>>>
>>>> [2010-01-29 18:29:01] E 
>>>> [client-protocol.c:415:client_ping_timer_expired] home2: Server 
>>>> 10.2.0.10:6997 has not responded in the last 42 seconds, disconnecting.
>>>> [2010-01-29 18:29:01] E 
>>>> [client-protocol.c:415:client_ping_timer_expired] home2: Server 
>>>> 10.2.0.10:6997 has not responded in the last 42 seconds, disconnecting.
>>>>
>>>> The thing is, I know for a fact that there is no network outage of any 
>>>> sort. All the machines are on a local gigabit ethernet, and there is no 
>>>> connectivity loss observed anywhere else. ssh sessions going to the 
>>>> machines that are supposedly "not responding" remain alive and well, 
>>>> with no lag.
>>> What you're seeing here is exactly what made us increase the ping-timeout to
>>> 120.
>>> To us it is obvious that the keep-alive strategy does not cope with even
>>> minimal packet loss. On _every_ network you will see some packet loss (read
>>> the docs of your switch carefully). We had the impression that the
>>> implemented strategy is not aware of the fact that a lost ping packet is no
>>> proof of a disconnected server, only a hint to take a closer look.
>> It sounds like there need to be more heartbeats per minute. One packet 
>> per 10 seconds might be a good figure to start with, but I cannot see 
>> even one packet per second being harmful unless the number of nodes 
>> gets very large, and disconnection should be triggered only after some 
>> threshold number (certainly > 1) of those is lost in a row. Are there 
>> options to tune such parameters in the volume spec file?
> 
> Really, if you go down that road you should definitely have tunable
> parameters, because 1 per second is probably not a good idea over a (slow)
> WAN. I have found none so far ...

Indeed, 1/second over a slow WAN would be OTT, but the performance over 
a WAN (or anything with > 5ms latencies) would be unusable anyway. That is 
exactly why the heartbeating parameters should be tunable for different 
environments. If the defaults don't work on a local Gb ethernet LAN with 
only 5 nodes on it, they are never likely to work for any common 
environment (the closest thing to a knob I've found is sketched below).
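
For what it's worth, the only related knob I'm aware of is ping-timeout on 
the protocol/client volume; I haven't found separate settings for the ping 
interval or a retry threshold. A minimal sketch of the client volume from my 
setup with the timeout raised (the remote-subvolume name is a placeholder, 
and I'm assuming the option behaves the same way in your release):

volume home2
  type protocol/client
  option transport-type tcp
  option remote-host 10.2.0.10
  option remote-port 6997
  option remote-subvolume home2-brick   # placeholder name
  # raise the keepalive timeout from the 42 second default; this only
  # delays the disconnect, it does not add the retry logic discussed above
  option ping-timeout 120
end-volume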

> Slightly off-topic, I would like to ask whether you, too, have seen
> glusterfs using a lot more bandwidth than a comparable NFS connection on
> the server network side. It really looks a bit like a waste of resources
> to me...

I haven't noticed bandwidth going "missing", if that's what you mean. I 
do my replication server-side, so the server replicates the writes n-1 
times for n servers, and my Cacti graphs are broadly in line with the 
bandwidth usage I would expect from that. If I disconnect all the mirrors 
except the server I'm connecting to, the bandwidth usage between the 
client and the server is similar to NFS.
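
To spell out what "expected" means here: the server I mount from applies 
each write locally and forwards it to every other mirror, so for n servers 
its outbound traffic is roughly (n-1) times the client write rate. A rough 
sketch of that server-side replicate setup (names and the mirror address 
are placeholders, not my actual volfile):

volume home-posix
  type storage/posix
  option directory /export/home          # local backing store
end-volume

volume home-mirror1
  type protocol/client                   # link to one remote mirror
  option transport-type tcp
  option remote-host 10.2.0.11           # placeholder address
  option remote-subvolume home-posix
end-volume

volume home-replicated
  type cluster/replicate
  # every write is applied locally and re-sent to each remote mirror,
  # so the uplink carries the payload (n-1) times over
  subvolumes home-posix home-mirror1
end-volume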

What bandwidth "leakage" are you observing?

Gordan




