[Gluster-users] Exact purpose of network.ping-timeout
Joe Julian
joe at julianfamily.org
Fri Dec 29 05:34:43 UTC 2017
Restarts will go through a shutdown process. As long as the network
isn't actively unconfigured before the final kill, the TCP connection
will be shut down and there will be no wait.
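To put the restart case in numbers, here is a minimal sketch (not Gluster code) of the total client I/O stall across a rolling restart. It assumes, as described above, that a clean shutdown closes the TCP connections so clients stop waiting immediately, while a server whose network disappears first costs roughly one full ping-timeout per node. `rolling_restart_stall` is a hypothetical helper.

```python
def rolling_restart_stall(nodes: int, ping_timeout_s: float, closes_tcp_cleanly: bool) -> float:
    """Approximate total seconds of stalled client I/O across a rolling restart.

    Simplified model (an assumption, not Gluster's implementation):
    - clean shutdown: the server's TCP connections are closed, so clients
      notice at once and do not wait for ping-timeout;
    - network removed before the final kill (or a crash): clients wait
      roughly one ping-timeout per restarted node.
    Nodes are assumed to be restarted one at a time.
    """
    per_node_stall = 0.0 if closes_tcp_cleanly else float(ping_timeout_s)
    return nodes * per_node_stall


if __name__ == "__main__":
    # Three-node cluster with the default 42-second ping-timeout.
    print(rolling_restart_stall(3, 42, closes_tcp_cleanly=True))   # 0.0
    print(rolling_restart_stall(3, 42, closes_tcp_cleanly=False))  # 126.0
```

The point of the model is that the 42-second wait only matters when the connection is not closed cleanly; an orderly restart pays nothing for it.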
On 12/28/17 20:19, Sam McLeod wrote:
> Sure, that works if you never restart or autoscale anything and your
> use case isn't bothered by up to 42 seconds of downtime. For us, 42
> seconds is a really long time for something like a patient management
> system to refuse file attachment uploads, etc.
>
> We apply a strict patching policy for security and kernel updates. We
> also often load balance between underlying physical hosts, and if the
> virtual hosts have lots of storage it can be quicker to let them
> shut down and start on another host.
>
> So for us, the old Unix days of caring about uptime are gone. A huge
> part of how we measure success and reduce risk has become how quickly
> we can not just deploy our software and web apps into production, but
> also how quickly our platform can be rebuilt, patched and migrated as
> needed.
>
> So in reality, I'd probably do a rolling restart of our three-node
> Gluster clusters every few weeks or so, depending on what patches have
> been released, etc.
>
> --
> Sam McLeod
> https://smcleod.net
> https://twitter.com/s_mcleod
>
>> On 29 Dec 2017, at 11:08 am, Joe Julian <joe at julianfamily.org> wrote:
>>
>> The default ping-timeout is long (42 seconds) because re-establishing
>> fds and locks can be a very expensive operation. With an average MTBF
>> of 45,000 hours per server, even a replica 2 volume would see a
>> 42-second MTTR only about once every 2.6 years, or six nines of uptime.
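The availability figure above can be reproduced with a few lines of arithmetic. This is a minimal sketch assuming independent server failures, a 365.25-day year, and exactly one 42-second stall per failure; the helper names are hypothetical.

```python
import math

HOURS_PER_YEAR = 365.25 * 24  # 8766 hours


def mean_years_between_failures(mtbf_hours: float, servers: int) -> float:
    """Expected time between failures anywhere in a pool of servers.

    Assumes independent failures, so N servers fail N times as often as one.
    """
    return mtbf_hours / servers / HOURS_PER_YEAR


def availability(mttr_s: float, years_between_failures: float) -> float:
    """Approximate fraction of time I/O is not stalled, one outage per interval."""
    interval_s = years_between_failures * HOURS_PER_YEAR * 3600
    return 1 - mttr_s / interval_s


if __name__ == "__main__":
    years = mean_years_between_failures(45_000, servers=2)  # replica 2
    avail = availability(42, years)                          # one 42 s stall per failure
    nines = -math.log10(1 - avail)
    print(f"{years:.1f} years between failures")  # 2.6 years between failures
    print(f"{avail:.6%} available")               # 99.999948% available
    print(f"{math.floor(nines)} nines")           # 6 nines
```

Under the same assumptions, adding servers shortens the interval proportionally, so a larger pool would hit the 42-second stall more often.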
>>
>> On December 27, 2017 3:17:01 AM PST, Omar Kohl
>> <omar.kohl at iternity.com> wrote:
>>
>> Hi,
>>
>> If you set it to 10 seconds, and a node goes down, you'll see
>> a 10-second freeze in all I/O for the volume.
>>
>>
>> Exactly! ONLY 10 seconds instead of the default 42 seconds :-)
>>
>> As I said before, the problem with 42 seconds is that a Windows Samba client will disconnect (and therefore interrupt any read/write operation) after waiting about 25 seconds, so 42 seconds is too high. In this case it would therefore make more sense to reduce the ping-timeout, right?
>>
>> Has anyone done any performance measurements on what the implications of a low ping-timeout are? What are the costs of "triggering heals all the time"?
>>
>> On a related note, I found the extras/hook-scripts/start/post/S29CTDBsetup.sh script, which mounts a CTDB (Samba) share and explicitly sets the ping-timeout to 10 seconds. There is a comment saying: "Make sure ping-timeout is not default for CTDB volume". Unfortunately, there is no explanation in the script, in the commit or in the Gerrit review history (https://review.gluster.org/#/c/7569/, https://review.gluster.org/#/c/8007/) of WHY you make sure the ping-timeout is not the default. Can anyone tell me the reason?
>>
>> Kind regards,
>> Omar
>>
>> -----Original Message-----
>> From: gluster-users-bounces at gluster.org [mailto:gluster-users-bounces at gluster.org] On Behalf Of lemonnierk at ulrar.net
>> Sent: Tuesday, 26 December 2017 22:05
>> To: gluster-users at gluster.org
>> Subject: Re: [Gluster-users] Exact purpose of network.ping-timeout
>>
>> Hi,
>>
>> It's just how long a node can stop responding before it is marked as down.
>> Basically, that's how long a node can be down before a heal becomes necessary to bring it back.
>>
>> If you set it to 10 seconds, and a node goes down, you'll see a 10-second freeze in all I/O for the volume. That's why you don't want it too high (a 2-minute freeze on I/O, for example, would be pretty bad, depending on what you host), but you don't want it too low either (to avoid triggering heals all the time).
>>
>> You can configure it because the right value depends on what you host. You might be okay with a freeze of a few minutes to avoid a heal, or you might not care about heals at all and prefer a very low value to avoid freezes.
>> The default value should work pretty well for most things, though.
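The trade-off described here can be sketched as a toy model. It is an illustration under the assumptions listed in its docstring, not Gluster's actual client logic; `classify_outage` and `FailureOutcome` are hypothetical names, and the 25-second SMB client timeout is the figure Omar reports in this thread.

```python
from dataclasses import dataclass


@dataclass
class FailureOutcome:
    io_stall_s: float        # how long I/O on the volume freezes
    brick_marked_down: bool  # True -> fds/locks must be re-established, heal needed later
    smb_client_drops: bool   # True -> the SMB client gives up mid-operation


def classify_outage(outage_s: float, ping_timeout_s: float = 42.0,
                    smb_client_timeout_s: float = 25.0) -> FailureOutcome:
    """Simplified model of an ungraceful, single-node outage in a replicated volume.

    Assumptions (taken from this thread, not from Gluster source):
    - I/O freezes until the node answers again or ping-timeout expires,
      whichever comes first;
    - if ping-timeout expires, the node is marked down, I/O continues on the
      remaining replicas, and a heal is needed once it returns;
    - a Windows SMB client disconnects after roughly 25 seconds without a reply.
    """
    marked_down = outage_s > ping_timeout_s
    stall = float(min(outage_s, ping_timeout_s))
    return FailureOutcome(stall, marked_down, stall >= smb_client_timeout_s)


if __name__ == "__main__":
    # Default 42 s: the SMB client drops before the node is marked down.
    print(classify_outage(60, ping_timeout_s=42))
    # 10 s: no SMB disconnect, but the node is marked down and must be healed.
    print(classify_outage(60, ping_timeout_s=10))
    # A 5 s blip under a 10 s timeout: short freeze, no heal, no disconnect.
    print(classify_outage(5, ping_timeout_s=10))
```

In this model, lowering ping-timeout trades a shorter freeze (keeping the SMB client connected) for treating more outages as node failures, with the reconnect and heal costs discussed in this thread.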
>>
>> On Tue, Dec 26, 2017 at 01:11:48PM +0000, Omar Kohl wrote:
>>
>> Hi,
>>
>> I have a question regarding the "ping-timeout" option. I have been
>> researching its purpose for a few days and it is not completely clear
>> to me, especially since the Gluster community apparently strongly
>> discourages changing, or at least decreasing, this value!
>>
>> Assuming that I set ping-timeout to 10 seconds (instead of the default
>> 42), this would mean that if I have a network outage of 11 seconds,
>> Gluster internally would have to re-allocate some resources that it
>> freed after the 10 seconds, correct? But apart from that there are no
>> negative implications, are there? For instance, if I'm copying files
>> during the network outage, those files will continue copying after
>> those 11 seconds. This means that the only purpose of ping-timeout is
>> to save the extra resources that are used by "short" network outages.
>> Is that correct?
>>
>> If I am confident that my network will not have many 11-second
>> outages, and I am willing to incur the extra resource-allocation costs
>> when they do occur, is there any reason not to set ping-timeout to 10
>> seconds?
>>
>> The problem I have with a long ping-timeout is that the Windows Samba
>> client disconnects after 25 seconds. So if one of the nodes of a
>> Gluster cluster shuts down ungracefully, the Samba client disconnects
>> and the file that was being copied is incomplete on the server. These
>> "costs" seem to be much higher than the potential costs of those
>> Gluster resource re-allocations, but it is hard to estimate because
>> there is no clear documentation of what exactly those Gluster costs
>> are.
>>
>> In general, I would be very interested in a comprehensive explanation
>> of ping-timeout and the up- and downsides of setting high or low
>> values for it.
>>
>> Kind regards,
>> Omar
>> ------------------------------------------------------------------------
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>
>>
>> --
>> Sent from my Android device with K-9 Mail. Please excuse my brevity.
>