[Gluster-users] Unable to make HA work; mounts hang on remote node reboot

Tue Apr 7 04:22:59 UTC 2015

On 04/06/2015 09:00 PM, Ravishankar N wrote:
>
>
> On 04/07/2015 04:15 AM, CJ Baar wrote:
>> I am hoping someone can give me some direction on this. I have been 
>> searching and trying various tweaks all day. I am trying to setup a 
>> two-node cluster with a replicated volume. Each node has a brick 
>> under /export, and a local mount using glusterfs under /mnt. The 
>> volume was created as:
>>     gluster volume create test1 rep 2 g01.x.local:/exports/sdb1/brick 
>> g02.x.local:/exports/sdb1/brick
>>     gluster volume start test1
>>     mount -t glusterfs g01.x.local:/test1 /mnt/test1
>> When I write a file to one node, it shows up instantly on the other… 
>> just as I expect it to.
>>
>> My problem is that if I reboot one node, the mount on the other 
>> completely hangs until the rebooted node comes back up. This seems to 
>> defeat the purpose of being highly-available. Is there some setting I 
>> am missing? How do I keep the volume on a single node alive during a 
>> failure?
>> Any info is appreciated. Thank you.
>
> You can explore the  network.ping-timeout setting; try reducing it 
> from the default value of 42 seconds.
> -Ravi
That's probably wrong. If you're doing a proper reboot, the services 
should be stopped before shutting down, which does all the proper 
handshaking for closing the TCP connection. This allows the client 
to avoid the ping-timeout. Ping-timeout only comes into play if there's 
a sudden, unexpected loss of communication with the server, such as 
power loss, network partition, etc. Most communication losses should be 
transient, and recovery is less impactful if you can wait for the 
transient issue to resolve.

No, if you're hanging when one server is shut down, then your client 
isn't connecting to all the servers as it should. Check your client logs 
to figure out why.
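
As a starting point for that log check, here is a rough sketch. It assumes 
the default log directory /var/log/glusterfs and the usual naming of the 
FUSE client log after the mount point (leading slash dropped, remaining 
slashes turned into dashes). The helper names client_log_path and 
show_connection_events are made up for illustration, and the exact wording 
of the log messages varies between versions, so the grep is deliberately 
loose.

```shell
#!/bin/sh
# Derive the FUSE client log path from a mount point.
# Assumption: logs live in /var/log/glusterfs and are named after the
# mount point, e.g. /mnt/test1 -> /var/log/glusterfs/mnt-test1.log
client_log_path() {
    mp="${1%/}"      # drop a trailing slash, if any
    mp="${mp#/}"     # drop the leading slash
    printf '/var/log/glusterfs/%s.log\n' "$(printf '%s' "$mp" | tr '/' '-')"
}

# Print the most recent connection-related lines from that client log.
# Matches "connected", "disconnected", "connection", etc.
show_connection_events() {
    log="$(client_log_path "$1")"
    if [ -r "$log" ]; then
        grep -i 'connect' "$log" | tail -n 40
    else
        echo "cannot read $log" >&2
        return 1
    fi
}
```

On the node where the mount hangs, run show_connection_events /mnt/test1 
and look for whether the client ever reported a connection to both bricks. 
Running gluster peer status and gluster volume status test1 on the servers 
is a useful cross-check that both bricks are actually online.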