[Gluster-users] Exact purpose of network.ping-timeout

Joe Julian joe at julianfamily.org
Fri Dec 29 05:34:43 UTC 2017


Restarts will go through a shutdown process. As long as the network 
isn't actively unconfigured before the final kill, the TCP connection 
will be shut down and there will be no wait.
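
As a rough illustration of that distinction, here is a minimal Python
sketch. It assumes a simplified model in which a graceful shutdown closes
the TCP connection right away (no wait on the client), while an ungraceful
failure stalls I/O for the full network.ping-timeout. The 25-second figure
is the Windows Samba client limit reported further down in this thread.
The function and constant names are made up for illustration and are not
part of Gluster.

```python
# Simplified model, not Gluster code: how long does a client see I/O stall
# when one server in the volume goes away?

# Approximate Windows Samba client limit quoted in this thread (seconds).
SAMBA_CLIENT_LIMIT_S = 25


def io_stall_seconds(ping_timeout_s: int, graceful: bool) -> int:
    """Return the modelled I/O stall for a single server going away."""
    # Graceful restart: the TCP connection is shut down, so there is no wait.
    if graceful:
        return 0
    # Ungraceful failure: clients wait until the ping-timeout expires.
    return ping_timeout_s


def samba_client_survives(ping_timeout_s: int, graceful: bool) -> bool:
    """True if the modelled stall stays under the Samba client limit."""
    return io_stall_seconds(ping_timeout_s, graceful) < SAMBA_CLIENT_LIMIT_S


for timeout in (42, 10):
    for graceful in (True, False):
        print(timeout, graceful,
              io_stall_seconds(timeout, graceful),
              samba_client_survives(timeout, graceful))
```

In this model only the ungraceful case with the default 42 seconds exceeds
the client limit; a planned rolling restart does not.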


On 12/28/17 20:19, Sam McLeod wrote:
> Sure, if you never restart / autoscale anything and your use case 
> isn't bothered by up to 42 seconds of downtime. For us, 42 seconds 
> is a really long time for something like a patient management system 
> to refuse file attachment uploads etc...
>
> We apply a strict patching policy for security and kernel updates. We 
> also often load balance between underlying physical hosts, and if the 
> virtual hosts have lots of storage it can be quicker to let them 
> shut down and start on another host.
>
> So for us, the old Unix days of caring about uptime are gone. A huge 
> part of how we measure success and risk reduction is how quickly we 
> can not just deploy our software / web apps into production, but also 
> how quickly our platform can be reformed, patched and migrated 
> effectively.
>
> So in reality, I'd probably do a rolling restart of our three-node 
> Gluster clusters every few weeks or so, depending on what patches 
> have been released etc...
>
> --
> Sam McLeod
> https://smcleod.net
> https://twitter.com/s_mcleod
>
>> On 29 Dec 2017, at 11:08 am, Joe Julian <joe at julianfamily.org> wrote:
>>
>> The reason for the long (42 second) ping-timeout is that 
>> re-establishing fds and locks can be a very expensive operation. 
>> With an average MTBF of 45000 hours for a server, even just a replica 
>> 2 would result in a 42 second MTTR every 2.6 years, or 6 nines of uptime.
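
The arithmetic behind those figures can be checked with a short Python
sketch. It assumes the two servers fail independently, so a replica 2 sees
a failure twice as often as a single server, and it counts only the
42-second stall as downtime. The variable names are illustrative.

```python
# Back-of-the-envelope check of the figures quoted above.
HOURS_PER_YEAR = 24 * 365

mtbf_hours = 45_000   # per-server MTBF from the message
servers = 2           # replica 2
stall_seconds = 42    # default network.ping-timeout

# Assumption: independent failures, so the pair fails twice as often.
hours_between_failures = mtbf_hours / servers
years_between_failures = hours_between_failures / HOURS_PER_YEAR

# Fraction of time spent stalled, counting only the 42-second window.
unavailability = stall_seconds / (hours_between_failures * 3600)
availability = 1 - unavailability

print(f"one failure every {years_between_failures:.1f} years")
print(f"availability {availability:.7%}")
```

Under these assumptions the interval comes out at roughly 2.6 years and
the availability lands just above 99.9999%, which matches the "6 nines"
claim.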
>>
>> On December 27, 2017 3:17:01 AM PST, Omar Kohl 
>> <omar.kohl at iternity.com> wrote:
>>
>>     Hi,
>>
>>         If you set it to 10 seconds, and a node goes down, you'll see
>>         a 10-second freeze in all I/O for the volume.
>>
>>
>>     Exactly! ONLY 10 seconds instead of the default 42 seconds :-)
>>
>>     As I said before, the problem with the 42 seconds is that a Windows Samba client will disconnect (and therefore interrupt any read/write operation) after waiting for about 25 seconds. So 42 seconds is too high. In this case it would therefore make more sense to reduce the ping-timeout, right?
>>
>>     Has anyone done any performance measurements on what the implications of a low ping-timeout are? What are the costs of "triggering heals all the time"?
>>
>>     On a related note, I found the extras/hook-scripts/start/post/S29CTDBsetup.sh script that mounts a CTDB (Samba) share and explicitly sets the ping-timeout to 10 seconds. There is a comment saying: "Make sure ping-timeout is not default for CTDB volume". Unfortunately there is no explanation in the script, in the commit or in the Gerrit review history (https://review.gluster.org/#/c/7569/, https://review.gluster.org/#/c/8007/) for WHY you make sure ping-timeout is not default. Can anyone tell me the reason?
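
For reference, changing the option on a volume goes through the standard
`gluster volume set <VOLNAME> <OPTION> <VALUE>` CLI form. The Python
sketch below only builds that command line. It is not a copy of the hook
script mentioned above; the helper name and the volume name "ctdb" are
hypothetical and used only for illustration.

```python
import shlex


def ping_timeout_command(volume: str, seconds: int) -> list:
    """Build argv for: gluster volume set <volume> network.ping-timeout <seconds>."""
    if seconds <= 0:
        raise ValueError("ping-timeout must be a positive number of seconds")
    return ["gluster", "volume", "set", volume,
            "network.ping-timeout", str(seconds)]


# "ctdb" is a hypothetical volume name used only for illustration.
cmd = ping_timeout_command("ctdb", 10)
print(shlex.join(cmd))
# prints: gluster volume set ctdb network.ping-timeout 10
```

On a Gluster node, the list could then be handed to
`subprocess.run(cmd, check=True)` to apply the setting.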
>>
>>     Kind regards,
>>     Omar
>>
>>     -----Original Message-----
>>     From: gluster-users-bounces at gluster.org On Behalf Of lemonnierk at ulrar.net
>>     Sent: Tuesday, 26 December 2017 22:05
>>     To: gluster-users at gluster.org
>>     Subject: Re: [Gluster-users] Exact purpose of network.ping-timeout
>>
>>     Hi,
>>
>>     It's just the delay for which a node can stop responding before being marked as down.
>>     Basically that's how long a node can go down before a heal becomes necessary to bring it back.
>>
>>     If you set it to 10 seconds, and a node goes down, you'll see a 10-second freeze in all I/O for the volume. That's why you don't want it too high (having a 2-minute I/O freeze, for example, would be pretty bad, depending on what you host), but you don't want it too low either (to avoid triggering heals all the time).
>>
>>     You can configure it because it depends on what you host. You might be okay with a freeze of a few minutes to avoid a heal, or you might not care about heals at all and prefer a very low value to avoid freezes.
>>     The default value should work pretty well for most things, though.
>>
>>     On Tue, Dec 26, 2017 at 01:11:48PM +0000, Omar Kohl wrote:
>>
>>         Hi,
>>
>>         I have a question regarding the "ping-timeout" option. I
>>         have been researching its purpose for a few days and it is
>>         not completely clear to me, especially since the Gluster
>>         community apparently strongly encourages not changing, or at
>>         least not decreasing, this value!
>>
>>         Assuming that I set ping-timeout to 10 seconds (instead of
>>         the default 42), this would mean that if I have a network
>>         outage of 11 seconds then Gluster internally would have to
>>         re-allocate some resources that it freed after the 10
>>         seconds, correct? But apart from that there are no negative
>>         implications, are there? For instance, if I'm copying files
>>         during the network outage then those files will continue
>>         copying after those 11 seconds.
>>
>>         This means that the only purpose of ping-timeout is to save
>>         those extra resources that are used by "short" network
>>         outages. Is that correct? If I am confident that my network
>>         will not have many 11 second outages, and if they do occur I
>>         am willing to incur those extra costs due to resource
>>         allocation, is there any reason not to set ping-timeout to 10
>>         seconds?
>>
>>         The problem I have with a long ping-timeout is that the
>>         Windows Samba Client disconnects after 25 seconds. So if one
>>         of the nodes of a Gluster cluster shuts down ungracefully
>>         then the Samba Client disconnects and the file that was being
>>         copied is incomplete on the server. These "costs" seem to be
>>         much higher than the potential costs of those Gluster
>>         resource re-allocations. But it is hard to estimate because
>>         there is no clear documentation of what exactly those Gluster
>>         costs are.
>>
>>         In general I would be very interested in a comprehensive
>>         explanation of ping-timeout and the up- and downsides of
>>         setting high or low values for it.
>>
>>         Kind regards,
>>         Omar
>>         ------------------------------------------------------------------------
>>         Gluster-users mailing list Gluster-users at gluster.org
>>         <mailto:Gluster-users at gluster.org>
>>         http://lists.gluster.org/mailman/listinfo/gluster-users 
>>
>
