[Gluster-users] Volume ping-timeout parameter and client side mount timeouts

Mon Nov 14 15:43:57 UTC 2016

Hello Gluster Community

We have 2x brick nodes running with replication for a volume gv0, for which we
set "gluster volume set gv0 ping-timeout 20".

In our tests there seemed to be an unknown additional delay on top of this
ping-timeout: the client timed out after about 35 seconds rather than at around
20 seconds (see test below).

Our distributed database cluster is using Gluster as a secondary file system for
backups etc. Its Pacemaker cluster manager needs to know how long to wait for
the glusterfs-mounted file system to become available again before giving up
and failing over to another node.

1. How long should we wait for the glusterfs mount point to become accessible
again, following an outage on the brick server this client was connected to,
before giving up?
2. Is there a timeout / interval setting on the client side that we could
reduce, so that it more quickly tries to switch the mount point to a different,
available brick server?

Regards,
Martin Schlegel

__________

Here is how we tested this:

As a test we blocked the entire network on one of these brick nodes:
root@glusterfs-brick-node1 $ date; iptables -A INPUT -i bond0 -j DROP; iptables -A OUTPUT -o bond0 -j DROP
Mon Nov 14 08:26:55 UTC 2016

From the syslog on the glusterfs-client-node:
Nov 14 08:27:30 glusterfs-client-node1 pgshared1[26783]: [2016-11-14
08:27:30.275694] C [rpc-clnt-ping.c:160:rpc_clnt_ping_timer_expired]
0-gv0-client-0: server glusterfs-brick-node1:49152 has not responded in the last
20 seconds, disconnecting.

<--- This last message "has not responded in the last 20 seconds" is confusing
to me, because by then the brick node had clearly been blocked for 35 seconds
already! Is there some client-side check interval that can be reduced?
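
For reference, the gap can be worked out from the two timestamps in the test
above. This is only a minimal sketch of the arithmetic: the variable names and
the to_seconds helper are made up for illustration, and it assumes both
timestamps fall on the same day and that the two hosts' clocks are in sync.

```shell
# Timestamps copied from the test above (same day, UTC).
blocked_at="08:26:55"     # iptables DROP rules added on the brick node
disconnect_at="08:27:30"  # client logged rpc_clnt_ping_timer_expired
ping_timeout=20           # value set via "gluster volume set gv0 ping-timeout 20"

# Hypothetical helper: convert HH:MM:SS to seconds since midnight.
to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Time between blocking the network and the client-side disconnect.
observed=$(( $(to_seconds "$disconnect_at") - $(to_seconds "$blocked_at") ))
# Portion of that time not accounted for by the configured ping-timeout.
extra=$(( observed - ping_timeout ))

echo "observed delay: ${observed}s, configured ping-timeout: ${ping_timeout}s, unexplained: ${extra}s"
```

So roughly 15 of the 35 seconds are not explained by the configured
ping-timeout, which is the part we are asking about.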