[Gluster-users] Unable to make HA work; mounts hang on remote node reboot
Joe Julian
joe at julianfamily.org
Tue Apr 7 04:22:59 UTC 2015
On 04/06/2015 09:00 PM, Ravishankar N wrote:
>
>
> On 04/07/2015 04:15 AM, CJ Baar wrote:
>> I am hoping someone can give me some direction on this. I have been
>> searching and trying various tweaks all day. I am trying to set up a
>> two-node cluster with a replicated volume. Each node has a brick
>> under /exports, and a local mount using glusterfs under /mnt. The
>> volume was created as:
>>
>> gluster volume create test1 rep 2 g01.x.local:/exports/sdb1/brick \
>>     g02.x.local:/exports/sdb1/brick
>> gluster volume start test1
>> mount -t glusterfs g01.x.local:/test1 /mnt/test1
>>
>> When I write a file to one node, it shows up instantly on the other…
>> just as I expect it to.
>>
>> My problem is that if I reboot one node, the mount on the other
>> completely hangs until the rebooted node comes back up. This seems to
>> defeat the purpose of being highly-available. Is there some setting I
>> am missing? How do I keep the volume on a single node alive during a
>> failure?
>> Any info is appreciated. Thank you.
>
> You can explore the network.ping-timeout setting; try reducing it
> from the default value of 42 seconds.
> -Ravi
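For reference, Ravi's suggestion would be applied with the gluster volume set command. A minimal sketch, assuming the test1 volume from the original post and an arbitrary 10-second value chosen only for illustration:

```shell
# Sketch only: "test1" is the volume from the original post; 10 seconds is
# an arbitrary example value, not a recommendation.
gluster volume set test1 network.ping-timeout 10

# Changed options are listed under "Options Reconfigured".
gluster volume info test1

# Return the option to its default (42 seconds).
gluster volume reset test1 network.ping-timeout
```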
That's probably wrong. If you're doing a proper reboot, the services
should be stopped before shutting down, which does all the proper
handshaking for closing a TCP connection. This lets the client avoid
the ping-timeout. Ping-timeout only comes into play if there's a
sudden, unexpected loss of communication with the server, such as a
power loss or a network partition. Most communication losses are
transient, and recovery is less costly if the client can simply wait
for the transient issue to resolve.
No, if your mount hangs when one server is shut down, then your client
isn't connecting to all the servers as it should. Check your client logs
to figure out why.
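One quick way to act on that advice is to list which bricks the client reports having connected to. A minimal sketch under several assumptions not stated in the thread: the FUSE client log for a mount at /mnt/test1 is taken to be /var/log/glusterfs/mnt-test1.log, successful connections are assumed to be logged as "Connected to <volume>-client-<N>, ...", and connected_bricks is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: print each "<volume>-client-<N>" name that appears in
# a "Connected to ..." line of a GlusterFS FUSE client log, once per name.
# Assumed default log path for a mount at /mnt/test1; pass another path as $1.
connected_bricks() {
    log="${1:-/var/log/glusterfs/mnt-test1.log}"
    if [ ! -r "$log" ]; then
        echo "cannot read $log" >&2
        return 1
    fi
    # Extract the client translator name after "Connected to", then dedupe.
    sed -n 's/.*Connected to \([A-Za-z0-9_.-]*-client-[0-9][0-9]*\).*/\1/p' "$log" \
        | sort -u
}
```

For example, `connected_bricks /var/log/glusterfs/mnt-test1.log`. With a replica-2 volume, a healthy client would be expected to list both test1-client-0 and test1-client-1; if only one name ever shows up, the client is not connecting to both servers. Running `gluster volume status test1` on a server is another quick check that both brick processes are online.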