[Gluster-users] Problems joining new gluster 3.10 nodes to existing 3.8

Atin Mukherjee amukherj at redhat.com
Fri Dec 1 05:25:54 UTC 2017


On Fri, Dec 1, 2017 at 1:55 AM, Ziemowit Pierzycki <ziemowit at pierzycki.com>
wrote:

> Hi,
>
> I have a problem joining four Gluster 3.10 nodes to an existing set of
> Gluster 3.8 nodes.  My understanding is that this should work and not be
> too much of a problem.
>
> Peer probe is successful but the node is rejected:
>
> gluster> peer detach elkpinfglt07
> peer detach: success
> gluster> peer probe elkpinfglt07
> peer probe: success.
> gluster> peer status
> Number of Peers: 6
>
> Hostname: elkpinfglt02
> Uuid: 926e9b8a-94ff-4924-b133-a30f2dd48054
> State: Peer in Cluster (Connected)
>
> Hostname: elkpinfglt03
> Uuid: 34d1a409-acc8-41f6-9b11-938317ad3421
> State: Peer in Cluster (Connected)
>
> Hostname: elkpinfglt04
> Uuid: 93255842-e190-4e67-ae8b-917583917855
> State: Peer in Cluster (Connected)
>
> Hostname: elkpinfglt05
> Uuid: 263f8d43-d83e-4465-9de3-e6a285072b02
> State: Peer in Cluster (Connected)
>
> Hostname: elkpinfglt06
> Uuid: aeaa998a-e8e7-405e-bf21-f25de8d82c25
> State: Peer in Cluster (Connected)
>
> Hostname: elkpinfglt07
> Uuid: 4baff5cf-6e81-4b2e-b31f-be725b2da4b3
> State: Peer Rejected (Connected)
>
> The node I'm probing from complains that it is unable to find
> information on elkpinfglt07, but the peer is then found anyway and the
> checksums on the data0 volume differ:
>
> [2017-11-30 20:12:24.278996] I [MSGID: 106487]
> [glusterd-handler.c:1241:__glusterd_handle_cli_probe] 0-glusterd:
> Received CLI probe req elkpinfglt07 24007
> [2017-11-30 20:12:24.279999] I [MSGID: 106129]
> [glusterd-handler.c:3670:glusterd_probe_begin] 0-glusterd: Unable to
> find peerinfo for host: elkpinfglt07 (24007)
> [2017-11-30 20:12:24.281020] I
> [rpc-clnt.c:1046:rpc_clnt_connection_init] 0-management: setting
> frame-timeout to 600
> [2017-11-30 20:12:24.288605] I [MSGID: 106498]
> [glusterd-handler.c:3598:glusterd_friend_add] 0-management: connect
> returned 0
> [2017-11-30 20:12:24.301962] I [MSGID: 106511]
> [glusterd-rpc-ops.c:252:__glusterd_probe_cbk] 0-management: Received
> probe resp from uuid: 4baff5cf-6e81-4b2e-b31f-be725b2da4b3, host:
> elkpinfglt07
> [2017-11-30 20:12:24.301989] I [MSGID: 106511]
> [glusterd-rpc-ops.c:412:__glusterd_probe_cbk] 0-glusterd: Received
> resp to probe req
> [2017-11-30 20:12:25.425294] I [MSGID: 106493]
> [glusterd-rpc-ops.c:476:__glusterd_friend_add_cbk] 0-glusterd:
> Received ACC from uuid: 4baff5cf-6e81-4b2e-b31f-be725b2da4b3, host:
> elkpinfglt07, port: 0
> [2017-11-30 20:12:25.429679] I [MSGID: 106163]
> [glusterd-handshake.c:1271:__glusterd_mgmt_hndsk_versions_ack]
> 0-management: using the op-version 30800
> [2017-11-30 20:12:25.432426] I [MSGID: 106490]
> [glusterd-handler.c:2954:__glusterd_handle_probe_query] 0-glusterd:
> Received probe from uuid: 4baff5cf-6e81-4b2e-b31f-be725b2da4b3
> [2017-11-30 20:12:25.432490] I [MSGID: 106493]
> [glusterd-handler.c:3017:__glusterd_handle_probe_query] 0-glusterd:
> Responded to elkpinfglt07, op_ret: 0, op_errno: 0, ret: 0
> [2017-11-30 20:12:25.436435] I [MSGID: 106490]
> [glusterd-handler.c:2608:__glusterd_handle_incoming_friend_req]
> 0-glusterd: Received probe from uuid:
> 4baff5cf-6e81-4b2e-b31f-be725b2da4b3
> [2017-11-30 20:12:25.436683] E [MSGID: 106010]
> [glusterd-utils.c:2938:glusterd_compare_friend_volume] 0-management:
> Version of Cksums data0 differ. local cksum = 3011020419, remote cksum
> = 729330920 on peer elkpinfglt07
> [2017-11-30 20:12:25.436716] I [MSGID: 106493]
> [glusterd-handler.c:3852:glusterd_xfer_friend_add_resp] 0-glusterd:
> Responded to elkpinfglt07 (0), ret: 0, op_ret: -1
> [2017-11-30 20:12:31.494646] I [MSGID: 106487]
> [glusterd-handler.c:1474:__glusterd_handle_cli_list_friends]
> 0-glusterd: Received cli list req
> [2017-11-30 20:14:06.174548] I [MSGID: 106487]
> [glusterd-handler.c:1474:__glusterd_handle_cli_list_friends]
> 0-glusterd: Received cli list req
> [2017-11-30 20:14:21.518765] I [MSGID: 106487]
> [glusterd-handler.c:1474:__glusterd_handle_cli_list_friends]
> 0-glusterd: Received cli list req
>
> On the new node the log shows this:
>
> [2017-11-30 20:12:25.196229] I [MSGID: 106163]
> [glusterd-handshake.c:1316:__glusterd_mgmt_hndsk_versions_ack]
> 0-management: using the op-version 30800
> [2017-11-30 20:12:25.198228] I [MSGID: 106490]
> [glusterd-handler.c:2957:__glusterd_handle_probe_query] 0-glusterd:
> Received probe from uuid: f614c686-52c9-4d2c-92e2-7ea6cdcfba61
> [2017-11-30 20:12:25.198447] I [MSGID: 106129]
> [glusterd-handler.c:2992:__glusterd_handle_probe_query] 0-glusterd:
> Unable to find peerinfo for host: elkpinfglt01 (24007)
> [2017-11-30 20:12:25.200587] W [MSGID: 106062]
> [glusterd-handler.c:3466:glusterd_transport_inet_options_build]
> 0-glusterd: Failed to get tcp-user-timeout
> [2017-11-30 20:12:25.200649] I
> [rpc-clnt.c:1059:rpc_clnt_connection_init] 0-management: setting
> frame-timeout to 600
> [2017-11-30 20:12:25.208147] I [MSGID: 106498]
> [glusterd-handler.c:3616:glusterd_friend_add] 0-management: connect
> returned 0
> [2017-11-30 20:12:25.208318] I [MSGID: 106493]
> [glusterd-handler.c:3020:__glusterd_handle_probe_query] 0-glusterd:
> Responded to elkpinfglt01, op_ret: 0, op_errno: 0, ret: 0
> [2017-11-30 20:12:25.209824] I [MSGID: 106490]
> [glusterd-handler.c:2606:__glusterd_handle_incoming_friend_req]
> 0-glusterd: Received probe from uuid:
> f614c686-52c9-4d2c-92e2-7ea6cdcfba61
> [2017-11-30 20:12:25.325953] I
> [rpc-clnt.c:1059:rpc_clnt_connection_init] 0-nfs: setting
> frame-timeout to 600
> [2017-11-30 20:12:25.326055] I [MSGID: 106132]
> [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: nfs already
> stopped
> [2017-11-30 20:12:25.326069] I [MSGID: 106568]
> [glusterd-svc-mgmt.c:229:glusterd_svc_stop] 0-management: nfs service
> is stopped
> [2017-11-30 20:12:25.327527] I [MSGID: 106132]
> [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: glustershd
> already stopped
> [2017-11-30 20:12:25.327540] I [MSGID: 106568]
> [glusterd-svc-mgmt.c:229:glusterd_svc_stop] 0-management: glustershd
> service is stopped
> [2017-11-30 20:12:25.327559] I [MSGID: 106567]
> [glusterd-svc-mgmt.c:197:glusterd_svc_start] 0-management: Starting
> glustershd service
> [2017-11-30 20:12:26.329457] I [MSGID: 106132]
> [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: quotad
> already stopped
> [2017-11-30 20:12:26.329558] I [MSGID: 106568]
> [glusterd-svc-mgmt.c:229:glusterd_svc_stop] 0-management: quotad
> service is stopped
> [2017-11-30 20:12:26.329850] I [MSGID: 106132]
> [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: bitd
> already stopped
> [2017-11-30 20:12:26.329879] I [MSGID: 106568]
> [glusterd-svc-mgmt.c:229:glusterd_svc_stop] 0-management: bitd service
> is stopped
> [2017-11-30 20:12:26.330202] I [MSGID: 106132]
> [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: scrub
> already stopped
> [2017-11-30 20:12:26.330240] I [MSGID: 106568]
> [glusterd-svc-mgmt.c:229:glusterd_svc_stop] 0-management: scrub
> service is stopped
> [2017-11-30 20:12:26.331621] I [MSGID: 106493]
> [glusterd-handler.c:3866:glusterd_xfer_friend_add_resp] 0-glusterd:
> Responded to elkpinfglt01 (0), ret: 0, op_ret: 0
> [2017-11-30 20:12:26.340265] I [MSGID: 106511]
> [glusterd-rpc-ops.c:261:__glusterd_probe_cbk] 0-management: Received
> probe resp from uuid: f614c686-52c9-4d2c-92e2-7ea6cdcfba61, host:
> elkpinfglt01
> [2017-11-30 20:12:26.340331] I [MSGID: 106511]
> [glusterd-rpc-ops.c:421:__glusterd_probe_cbk] 0-glusterd: Received
> resp to probe req
> [2017-11-30 20:12:26.344327] I [MSGID: 106493]
> [glusterd-rpc-ops.c:485:__glusterd_friend_add_cbk] 0-glusterd:
> Received RJT from uuid: f614c686-52c9-4d2c-92e2-7ea6cdcfba61, host:
> elkpinfglt01, port: 0
>
> Would the checksums cause the peer to be rejected?
>

Yes, that's the cause. The checksum mismatch means that the info file of the
volume data0 on elkpinfglt07 differs from the one on the node where you ran
peer probe. Can you please compare the
/var/lib/glusterd/vols/data0/info file on these two nodes and share the
difference?
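
In case it helps, here is a minimal sketch of one way to do that comparison.
It assumes you have first copied each node's info file to a single machine
(the names info.glt01 and info.glt07 below are hypothetical) and that the
file consists of plain key=value lines. The function names are made up for
illustration; a plain diff of the two files would serve just as well.

```python
import sys


def parse_info(text):
    """Parse key=value lines into a dict, skipping blank or malformed lines."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        entries[key] = value
    return entries


def diff_info(local_text, remote_text):
    """Return {key: (local_value, remote_value)} for every key that differs.

    A value of None means the key is missing on that side.
    """
    local = parse_info(local_text)
    remote = parse_info(remote_text)
    differences = {}
    for key in sorted(set(local) | set(remote)):
        local_value = local.get(key)
        remote_value = remote.get(key)
        if local_value != remote_value:
            differences[key] = (local_value, remote_value)
    return differences


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python3 compare_info.py info.glt01 info.glt07
    with open(sys.argv[1]) as local_file, open(sys.argv[2]) as remote_file:
        result = diff_info(local_file.read(), remote_file.read())
    for key, (local_value, remote_value) in result.items():
        print(f"{key}: local={local_value!r} remote={remote_value!r}")
```

Any key it prints is either missing on one side or set to a different value,
which is the kind of delta that would make the two checksums disagree.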
