[Gluster-users] Unable to make HA work; mounts hang on remote node reboot

Tue Apr 28 12:17:39 UTC 2015

Yup, see, I knew someone would come and straighten me out, hehe.
Thanks Joe.

C
On Apr 28, 2015 6:09 AM, "Joe Julian" <joe at julianfamily.org> wrote:

> No, self-heal daemon is glusterfs (client) with the glustershd vol file.
>
> glusterfsd is the brick server.
>
> Normally the network would stay up through the final process kill as part
> of shutdown. That kill gracefully shuts down the brick process(es) allowing
> the clients to continue without waiting for the tcp connection.
>
> Apparently your init shutdown process disconnects the network. This is
> uncommon and may be considered a bug in whatever K script is doing it.
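One way to look into the ordering described above is to list the kill scripts in the order a SysV-style rc runs them. This is only a sketch: it assumes K scripts live in a runlevel directory such as /etc/rc.d/rc6.d and are run in glob (lexical) order, and the helper names `kill_order` and `stops_before` are made up for illustration.

```shell
#!/bin/sh
# List kill scripts in the order a SysV-style rc would run them (glob order).
# The default directory is an assumption; pass another one as the first argument.
kill_order() {
    rc_dir="${1:-/etc/rc.d/rc6.d}"
    for script in "$rc_dir"/K*; do
        [ -e "$script" ] || continue   # skip the unexpanded pattern if no K scripts exist
        basename "$script"
    done
}

# Print "yes" if service $2 is stopped before service $3 in directory $1, else "no".
stops_before() {
    rc_dir="$1"; first="$2"; second="$3"
    kill_order "$rc_dir" | awk -v a="$first" -v b="$second" '
        { name = $0; sub(/^K[0-9]+/, "", name) }   # strip the Knn prefix
        name == a && !pa { pa = NR }
        name == b && !pb { pb = NR }
        END { if (pa && pb && pa < pb) print "yes"; else print "no" }'
}
```

For example, `stops_before /etc/rc.d/rc6.d glusterfsd network` would print "yes" when the brick processes are stopped before the network goes down, which is the order described above as normal.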
>
> On April 28, 2015 12:28:40 AM PDT, Corey Kovacs <corey.kovacs at gmail.com>
> wrote:
>>
>> Someone correct me if I am wrong, but glusterfsd is for self healing as I
>> recall. It's launched when it's needed.
>>
>> On Mon, Apr 27, 2015 at 1:59 PM, CJ Baar <gsml at ffisys.com> wrote:
>>
>>> FYI, I’ve tried with both glusterfs and NFS mounts, and the reaction is
>>> the same. The value of ping.timeout seems to have no effect at all.
>>>
>>> I did discover one thing that makes a difference on reboot. There is a
>>> second service descriptor for “glusterfsd”, which is not enabled by
>>> default, but is started by something else (glusterd, I assume?). However,
>>> whatever it is that starts the process, does not shut it down cleanly
>>> during a reboot… and it appears to be the loss of that process without
>>> de-registration in the peer group that causes the other nodes to hang. If I
>>> enable the service (chkconfig glusterfsd on), it does nothing by default
>>> because the config is commented out (/etc/sysconfig/glusterfsd). But,
>>> having those K scripts in place in rc.d, I can manually touch
>>> /var/lock/subsys/glusterfsd, and then I can successfully reboot one node
>>> without the others hanging. This at least helps when I need to take a node
>>> down for maintenance; it obviously still does nothing for a true node
>>> failure.
>>>
>>> I guess my next step is to figure out how to modify the init scripts for
>>> glusterd to touch the other lock file on startup as well. Does not seem a
>>> very elegant solution, but having the lock file in place and the init
>>> scripts enabled seems to solve at least half of the issue.
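The workaround described above could be sketched roughly as follows. It assumes a chkconfig-based system whose rc only runs a K script when the matching subsys lock file exists, which is consistent with what is reported above but not confirmed here. The `run` and `enable_glusterfsd_stop` helpers are hypothetical, and by default the commands are only printed.

```shell
#!/bin/sh
# Sketch of the lock-file workaround. Commands are only printed by default;
# set DRY_RUN=0 to execute them (as root, on a chkconfig-based system).
DRY_RUN="${DRY_RUN:-1}"
LOCK_FILE="/var/lock/subsys/glusterfsd"   # path taken from the message above

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

enable_glusterfsd_stop() {
    # Put the glusterfsd K scripts in place in the rc.d runlevel directories.
    run chkconfig glusterfsd on
    # Create the subsys lock file (assumption: rc skips K scripts without it).
    run touch "$LOCK_FILE"
}
```

Calling `enable_glusterfsd_stop` prints the two commands. The lock file would presumably have to be recreated after each boot, which is why the message talks about modifying the glusterd init script.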
>>>
>>> —CJ
>>>
>>>
>>>
>>> On Apr 25, 2015, at 11:34 AM, Corey Kovacs <corey.kovacs at gmail.com>
>>> wrote:
>>>
>>> That's not cool... you certainly have a quorum. Are you using the fuse
>>> client or regular old nfs?
>>>
>>> C
>>> On Apr 24, 2015 4:50 PM, "CJ Baar" <gsml at ffisys.com> wrote:
>>>
>>>> Corey—
>>>> I was able to get a third node set up. I recreated the volume as
>>>> “replica 3”. The hang still happens (on two nodes, now) when I reboot a
>>>> single node, even though two are still surviving, which should constitute a
>>>> quorum.
>>>> —CJ
>>>>
>>>>
>>>> On Apr 17, 2015, at 6:18 AM, Corey Kovacs <corey.kovacs at gmail.com>
>>>> wrote:
>>>>
>>>> Typically you need to meet a quorum requirement to run just about any
>>>> cluster. By definition, two nodes don't make a good cluster. A third
>>>> node would let you start with just two since that would allow you to meet
>>>> quorum. Can you add a third node to at least test?
>>>>
>>>> Corey
>>>> On Apr 16, 2015 6:52 PM, "CJ Baar" <gsml at ffisys.com> wrote:
>>>>
>>>>> I appreciate the info. I have tried adjusting the ping-timeout setting,
>>>>> and it seems to have no effect. The whole system hangs for 45+ seconds,
>>>>> which is about what it takes the second node to reboot, no matter what the
>>>>> value of ping-timeout is. The output of the mnt-log is below. It shows
>>>>> the adjusted value I am currently testing (30s), but the system still hangs
>>>>> for longer than that.
>>>>>
>>>>> Also, I have realized that the problem is deeper than I originally
>>>>> thought.  It’s not just the mount that is hanging when a node reboots… it
>>>>> appears to be the entire system.  I cannot use my SSH connection, no matter
>>>>> where I am in the system, and services such as httpd become unresponsive.
>>>>> I can ping the “surviving” system, but other than that it appears pretty
>>>>> unusable.  This is a major drawback to using gluster.  I can’t afford to
>>>>> lose two entire systems if one dies.
>>>>>
>>>>> [2015-04-16 22:59:21.281365] C
>>>>> [rpc-clnt-ping.c:109:rpc_clnt_ping_timer_expired] 0-common-client-0: server
>>>>> 172.31.64.200:49152 has not responded in the last 30 seconds,
>>>>> disconnecting.
>>>>> [2015-04-16 22:59:21.281560] E [rpc-clnt.c:362:saved_frames_unwind]
>>>>> (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1e0)[0x7fce96450550]
>>>>> (--> /usr/lib64/libgfrpc.so.0(saved_frames_unwind+0x1e7)[0x7fce96225787]
>>>>> (--> /usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fce9622589e]
>>>>> (-->
>>>>> /usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x91)[0x7fce96225951]
>>>>> (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x15f)[0x7fce96225f1f] )))))
>>>>> 0-common-client-0: forced unwinding frame type(GlusterFS 3.3)
>>>>> op(LOOKUP(27)) called at 2015-04-16 22:58:45.830962 (xid=0x6d)
>>>>> [2015-04-16 22:59:21.281588] W
>>>>> [client-rpc-fops.c:2766:client3_3_lookup_cbk] 0-common-client-0: remote
>>>>> operation failed: Transport endpoint is not connected. Path: /
>>>>> (00000000-0000-0000-0000-000000000001)
>>>>> [2015-04-16 22:59:21.281788] E [rpc-clnt.c:362:saved_frames_unwind]
>>>>> (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1e0)[0x7fce96450550]
>>>>> (--> /usr/lib64/libgfrpc.so.0(saved_frames_unwind+0x1e7)[0x7fce96225787]
>>>>> (--> /usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fce9622589e]
>>>>> (-->
>>>>> /usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x91)[0x7fce96225951]
>>>>> (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x15f)[0x7fce96225f1f] )))))
>>>>> 0-common-client-0: forced unwinding frame type(GF-DUMP) op(NULL(2)) called
>>>>> at 2015-04-16 22:58:51.277528 (xid=0x6e)
>>>>> [2015-04-16 22:59:21.281806] W [rpc-clnt-ping.c:154:rpc_clnt_ping_cbk]
>>>>> 0-common-client-0: socket disconnected
>>>>> [2015-04-16 22:59:21.281816] I [client.c:2215:client_rpc_notify]
>>>>> 0-common-client-0: disconnected from common-client-0. Client process will
>>>>> keep trying to connect to glusterd until brick's port is available
>>>>> [2015-04-16 22:59:21.283637] I [socket.c:3292:socket_submit_request]
>>>>> 0-common-client-0: not connected (priv->connected = 0)
>>>>> [2015-04-16 22:59:21.283663] W [rpc-clnt.c:1562:rpc_clnt_submit]
>>>>> 0-common-client-0: failed to submit rpc-request (XID: 0x6f Program:
>>>>> GlusterFS 3.3, ProgVers: 330, Proc: 27) to rpc-transport (common-client-0)
>>>>> [2015-04-16 22:59:21.283674] W
>>>>> [client-rpc-fops.c:2766:client3_3_lookup_cbk] 0-common-client-0: remote
>>>>> operation failed: Transport endpoint is not connected. Path: /src
>>>>> (63fc077b-869d-4928-8819-a79cc5c5ffa6)
>>>>> [2015-04-16 22:59:21.284219] W
>>>>> [client-rpc-fops.c:2766:client3_3_lookup_cbk] 0-common-client-0: remote
>>>>> operation failed: Transport endpoint is not connected. Path: (null)
>>>>> (00000000-0000-0000-0000-000000000000)
>>>>> [2015-04-16 22:59:52.322952] E
>>>>> [client-handshake.c:1496:client_query_portmap_cbk] 0-common-client-0:
>>>>> failed to get the port number for [root at cfm-c glusterfs]#
>>>>>
>>>>>
>>>>> —CJ
>>>>>
>>>>>
>>>>>
>>>>> On Apr 7, 2015, at 10:26 PM, Ravishankar N <ravishankar at redhat.com>
>>>>> wrote:
>>>>>
>>>>>
>>>>>
>>>>> On 04/07/2015 10:11 PM, CJ Baar wrote:
>>>>>
>>>>> Then, I issue “init 0” on node2, and the mount on node1 becomes
>>>>> unresponsive. This is the log from node1
>>>>> [2015-04-07 16:36:04.250693] W
>>>>> [glusterd-op-sm.c:4021:glusterd_op_modify_op_ctx] 0-management: op_ctx
>>>>> modification failed
>>>>> [2015-04-07 16:36:04.251102] I
>>>>> [glusterd-handler.c:3803:__glusterd_handle_status_volume] 0-management:
>>>>> Received status volume req for volume test1
>>>>> The message "I [MSGID: 106004]
>>>>> [glusterd-handler.c:4365:__glusterd_peer_rpc_notify] 0-management: Peer
>>>>> 1069f037-13eb-458e-a9c4-0e7e79e595d0, in Peer in Cluster state, has
>>>>> disconnected from glusterd." repeated 39 times between [2015-04-07
>>>>> 16:34:40.609878] and [2015-04-07 16:36:37.752489]
>>>>> [2015-04-07 16:36:40.755989] I [MSGID: 106004]
>>>>> [glusterd-handler.c:4365:__glusterd_peer_rpc_notify] 0-management: Peer
>>>>> 1069f037-13eb-458e-a9c4-0e7e79e595d0, in Peer in Cluster state, has
>>>>> disconnected from glusterd.
>>>>>
>>>>> This is the glusterd log. Could you also share the mount log of the
>>>>> healthy node in the non-responsive -->responsive time interval?
>>>>> If this is indeed the ping timer issue, you should see something like:
>>>>> "server xxx has not responded in the last 42 seconds, disconnecting."
>>>>> Have you, for testing's sake, tried reducing the network.ping-timeout
>>>>> value to something lower and checked that the hang happens only for that
>>>>> time?
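To check for the ping-timer message quoted above, something like the following could pull the reported timeout out of a mount log. It is a minimal sketch: `ping_timeouts` is a hypothetical helper, and it assumes each log entry is on a single line in the real file (the wrapping in this thread comes from the mail client).

```shell
#!/bin/sh
# Print the timeout in seconds from every ping-timer expiry message found in
# the given log files, or in standard input when no file is given.
ping_timeouts() {
    sed -n 's/.*has not responded in the last \([0-9][0-9]*\) seconds, disconnecting.*/\1/p' "$@"
}
```

Usage would look like `ping_timeouts /var/log/glusterfs/<mount-log>.log`, where the log path is a placeholder. The timeout itself can be lowered with `gluster volume set <volname> network.ping-timeout 30`. If the value printed matches the configured timeout but the hang lasts longer, the ping timer alone probably does not explain the hang.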
>>>>>
>>>>>
>>>>> This does not seem like desired behaviour. I was trying to create this
>>>>> cluster because I was under the impression it would be more resilient than
>>>>> a single-point-of-failure NFS server. However, if the mount halts when one
>>>>> node in the cluster dies, then I’m no better off.
>>>>>
>>>>> I also can’t seem to figure out how to bring a volume online if only
>>>>> one node in the cluster is running; again, not really functioning as HA.
>>>>> The gluster service runs and the volume “starts”, but it is not “online” or
>>>>> mountable until both nodes are running. In a situation where a node fails
>>>>> and we need storage online before we can troubleshoot the cause of the node
>>>>> failure, how do I get a volume to go online?
>>>>>
>>>>> This is expected behavior. In a two node cluster, if only one is
>>>>> powered on, glusterd will not start other gluster processes (brick, nfs,
>>>>> shd ) until the glusterd of the other node is also up (i.e. quorum is met).
>>>>> If you want to override this behavior, do a `gluster vol start <volname>
>>>>> force` on the node that is up.
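As a rough sketch of the override described above, the helper below prints (or, with DRY_RUN=0, runs) the force start followed by a status check. The `run` and `force_start_volume` names are hypothetical, and the volume name is whatever the surviving node actually hosts.

```shell
#!/bin/sh
# Force-start a volume on the surviving node, then show its status.
# Commands are only printed by default; set DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

force_start_volume() {
    volname="$1"
    if [ -z "$volname" ]; then
        echo "usage: force_start_volume <volname>" >&2
        return 1
    fi
    # Start brick processes even though the peer glusterd is unreachable.
    run gluster volume start "$volname" force
    # Check whether the local brick now shows as online.
    run gluster volume status "$volname"
}
```

For the volume named in the glusterd log above, that would be `force_start_volume test1`.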
>>>>>
>>>>> -Ravi
>>>>>
>>>>>
>>>>> Thanks.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Gluster-users mailing list
>>>>> Gluster-users at gluster.org
>>>>> http://www.gluster.org/mailman/listinfo/gluster-users
>>>>>
>>>>
>>>>
>>>
>>