[Gluster-users] Unable to make HA work; mounts hang on remote node reboot

CJ Baar gsml at ffisys.com
Tue Apr 7 16:41:52 UTC 2015


> On Apr 6, 2015, at 10:22 PM, Joe Julian <joe at julianfamily.org> wrote:
> 
> On 04/06/2015 09:00 PM, Ravishankar N wrote:
>> 
>> 
>> On 04/07/2015 04:15 AM, CJ Baar wrote:
>>> I am hoping someone can give me some direction on this. I have been searching and trying various tweaks all day. I am trying to set up a two-node cluster with a replicated volume. Each node has a brick under /exports, and a local mount using glusterfs under /mnt. The volume was created and mounted as:
>>>    gluster volume create test1 rep 2 g01.x.local:/exports/sdb1/brick g02.x.local:/exports/sdb1/brick
>>>    gluster volume start test1
>>>    mount -t glusterfs g01.x.local:/test1 /mnt/test1
>>> When I write a file to one node, it shows up instantly on the other… just as I expect it to.
>>> 
>>> My problem is that if I reboot one node, the mount on the other completely hangs until the rebooted node comes back up. This seems to defeat the purpose of being highly available. Is there some setting I am missing? How do I keep the volume on a single node alive during a failure?
>>> Any info is appreciated. Thank you.
>> 
>> You can explore the network.ping-timeout setting; try reducing it from the default value of 42 seconds.
>> -Ravi
> That's probably wrong. If you're doing a proper reboot, the services should be stopped before shutting down, which will do all the proper handshaking for closing a TCP connection. This allows the client to avoid the ping-timeout. Ping-timeout only comes into play if there is a sudden, unexpected communication loss with the server, such as power loss or a network partition. Most communication losses should be transient, and recovery is less impactful if you can wait for the transient issue to resolve.
> 
> No, if you're hanging when one server is shut down, then your client isn't connecting to all the servers as it should. Check your client logs to figure out why.
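
For what it's worth, if I do end up experimenting with the ping-timeout suggestion, I assume the syntax is something along the lines of (10 seconds here is just an arbitrary value for illustration):

    gluster volume set test1 network.ping-timeout 10

though, based on the reply above, a clean reboot should not need that anyway.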

The logs, as I interpret them, show both bricks being connected successfully when I do a mount (mount -t glusterfs g01.x.local:/test1 /mnt/test1). The client even claims to be setting the read preference to the correct local brick.
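
The excerpt below is from the FUSE client log on node1. I am assuming the default log naming, so for a mount at /mnt/test1 that should be something like /var/log/glusterfs/mnt-test1.log; both bricks show up as connected there, e.g.:

    grep 'Connected to test1-client' /var/log/glusterfs/mnt-test1.log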

[2015-04-07 16:13:05.581085] I [rpc-clnt.c:1761:rpc_clnt_reconfig] 0-test1-client-0: changing port to 49152 (from 0)
[2015-04-07 16:13:05.583826] I [client-handshake.c:1413:select_server_supported_programs] 0-test1-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2015-04-07 16:13:05.584017] I [client-handshake.c:1200:client_setvolume_cbk] 0-test1-client-0: Connected to test1-client-0, attached to remote volume '/exports/sdb1/brick'.
[2015-04-07 16:13:05.584030] I [client-handshake.c:1210:client_setvolume_cbk] 0-test1-client-0: Server and Client lk-version numbers are not same, reopening the fds
[2015-04-07 16:13:05.584122] I [MSGID: 108005] [afr-common.c:3552:afr_notify] 0-test1-replicate-0: Subvolume 'test1-client-0' came back up; going online.
[2015-04-07 16:13:05.584146] I [client-handshake.c:188:client_set_lk_version_cbk] 0-test1-client-0: Server lk version = 1
[2015-04-07 16:13:05.585647] I [rpc-clnt.c:1761:rpc_clnt_reconfig] 0-test1-client-1: changing port to 49152 (from 0)
[2015-04-07 16:13:05.590017] I [client-handshake.c:1413:select_server_supported_programs] 0-test1-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2015-04-07 16:13:05.591067] I [client-handshake.c:1200:client_setvolume_cbk] 0-test1-client-1: Connected to test1-client-1, attached to remote volume '/exports/sdb1/brick'.
[2015-04-07 16:13:05.591079] I [client-handshake.c:1210:client_setvolume_cbk] 0-test1-client-1: Server and Client lk-version numbers are not same, reopening the fds
[2015-04-07 16:13:05.595077] I [fuse-bridge.c:5080:fuse_graph_setup] 0-fuse: switched to graph 0
[2015-04-07 16:13:05.595144] I [client-handshake.c:188:client_set_lk_version_cbk] 0-test1-client-1: Server lk version = 1
[2015-04-07 16:13:05.595265] I [fuse-bridge.c:4009:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.22 kernel 7.22
[2015-04-07 16:13:05.596883] I [afr-common.c:1484:afr_local_discovery_cbk] 0-test1-replicate-0: selecting local read_child test1-client-0


This is all the logging I get on node1 when I drop node2. It takes almost two minutes for the mount on node1 to become responsive again.
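
As far as I can tell, these messages come from the glusterd management log on node1, i.e. something like

    /var/log/glusterfs/etc-glusterfs-glusterd.vol.log

rather than from the mount log quoted above.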

[2015-04-07 16:20:48.278742] W [socket.c:611:__socket_rwv] 0-management: readv on 172.32.65.241:24007 failed (No data available)
[2015-04-07 16:20:48.278837] I [MSGID: 106004] [glusterd-handler.c:4365:__glusterd_peer_rpc_notify] 0-management: Peer 1069f037-13eb-458e-a9c4-0e7e79e595d0, in Peer in Cluster state, has disconnected from glusterd.
[2015-04-07 16:20:48.279062] W [glusterd-locks.c:647:glusterd_mgmt_v3_unlock] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1e0)[0x7f736ad56550] (--> /usr/lib64/glusterfs/3.6.2/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x428)[0x7f735fdf1df8] (--> /usr/lib64/glusterfs/3.6.2/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x262)[0x7f735fd662c2] (--> /usr/lib64/glusterfs/3.6.2/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x60)[0x7f735fd51a80] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x1a3)[0x7f736ab2bf63] ))))) 0-management: Lock for vol test1 not held
[2015-04-07 16:22:24.766177] W [glusterd-op-sm.c:4021:glusterd_op_modify_op_ctx] 0-management: op_ctx modification failed
[2015-04-07 16:22:24.766587] I [glusterd-handler.c:3803:__glusterd_handle_status_volume] 0-management: Received status volume req for volume test1


If I try a “graceful” shutdown by manually stopping the glusterd service, the mount stays up and works… until the node itself is shut down. This is the log from node1 after issuing “service glusterd stop” on node2.

[2015-04-07 16:32:57.224545] W [socket.c:611:__socket_rwv] 0-management: readv on 172.32.65.241:24007 failed (No data available)
[2015-04-07 16:32:57.224612] I [MSGID: 106004] [glusterd-handler.c:4365:__glusterd_peer_rpc_notify] 0-management: Peer 1069f037-13eb-458e-a9c4-0e7e79e595d0, in Peer in Cluster state, has disconnected from glusterd.
[2015-04-07 16:32:57.224829] W [glusterd-locks.c:647:glusterd_mgmt_v3_unlock] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1e0)[0x7f736ad56550] (--> /usr/lib64/glusterfs/3.6.2/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x428)[0x7f735fdf1df8] (--> /usr/lib64/glusterfs/3.6.2/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x262)[0x7f735fd662c2] (--> /usr/lib64/glusterfs/3.6.2/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x60)[0x7f735fd51a80] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x1a3)[0x7f736ab2bf63] ))))) 0-management: Lock for vol test1 not held
[2015-04-07 16:33:03.506088] W [glusterd-op-sm.c:4021:glusterd_op_modify_op_ctx] 0-management: op_ctx modification failed
[2015-04-07 16:33:03.506619] I [glusterd-handler.c:3803:__glusterd_handle_status_volume] 0-management: Received status volume req for volume test1
[2015-04-07 16:33:08.498391] E [socket.c:2267:socket_connect_finish] 0-management: connection to 172.32.65.241:24007 failed (Connection refused)
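
If I understand the process model correctly, “service glusterd stop” only stops the management daemon; the brick process itself (glusterfsd) keeps running on node2, which would explain why the mount keeps working. A quick sanity check on node2 would be something like:

    pgrep -l glusterfsd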

At this point, the mount on node1 is still responsive, even though glusterd is down on node2, as confirmed by the status output:
Status of volume: test1
Gluster process						Port	Online	Pid
------------------------------------------------------------------------------
Brick g01.x.local:/exports/sdb1/brick			49152	Y	22739
NFS Server on localhost					2049	Y	22746
Self-heal Daemon on localhost				N/A	Y	22751
 
Task Status of Volume test1
------------------------------------------------------------------------------
There are no active volume tasks
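
For reference, that is the output of

    gluster volume status test1

run on node1 while glusterd on node2 was stopped.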

Then, I issue “init 0” on node2, and the mount on node1 becomes unresponsive. This is the log from node1:
[2015-04-07 16:36:04.250693] W [glusterd-op-sm.c:4021:glusterd_op_modify_op_ctx] 0-management: op_ctx modification failed
[2015-04-07 16:36:04.251102] I [glusterd-handler.c:3803:__glusterd_handle_status_volume] 0-management: Received status volume req for volume test1
The message "I [MSGID: 106004] [glusterd-handler.c:4365:__glusterd_peer_rpc_notify] 0-management: Peer 1069f037-13eb-458e-a9c4-0e7e79e595d0, in Peer in Cluster state, has disconnected from glusterd." repeated 39 times between [2015-04-07 16:34:40.609878] and [2015-04-07 16:36:37.752489]
[2015-04-07 16:36:40.755989] I [MSGID: 106004] [glusterd-handler.c:4365:__glusterd_peer_rpc_notify] 0-management: Peer 1069f037-13eb-458e-a9c4-0e7e79e595d0, in Peer in Cluster state, has disconnected from glusterd.
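
By “unresponsive” I mean that even a trivial operation against the mount on node1, e.g.

    time ls /mnt/test1

hangs rather than returning promptly.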


This does not seem like the desired behaviour. I was trying to create this cluster because I was under the impression it would be more resilient than a single-point-of-failure NFS server. However, if the mount halts when one node in the cluster dies, then I’m no better off.

I also can’t seem to figure out how to bring a volume online if only one node in the cluster is running; again, not really functioning as HA. The gluster service runs and the volume “starts”, but it is not “online” or mountable until both nodes are running. In a situation where a node fails and we need storage online before we can troubleshoot the cause of the node failure, how do I get a volume to go online?
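
The closest thing I can guess at is forcing the start, something along the lines of

    gluster volume start test1 force

but I have not been able to confirm whether that is actually the intended way to bring the volume online while the other node is down, or whether it is even safe to do so.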

Thanks.



