[Gluster-users] Problems joining new gluster 3.10 nodes to existing 3.8

Ziemowit Pierzycki ziemowit at pierzycki.com
Wed Dec 6 19:01:37 UTC 2017


The differences between the configuration files on the 3.8 and 3.10
nodes are significant!  It appears the configuration format was
rewritten for 3.10.  In addition, I noticed a lot of .rpmsave files
on the 3.8 nodes, most likely left over from past upgrades on those
nodes.  I have pretty much given up on making 3.8 work with 3.10.
Instead, I'll install 3.8 on the new nodes and eventually upgrade the
whole cluster to 3.10 using the upgrade procedure... hopefully that
won't suffer from the same issues.
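
For anyone who wants to compare the files the same way, below is a
rough sketch of a key-by-key comparison of two copies of
/var/lib/glusterd/vols/data0/info.  It assumes the info file consists
of plain key=value lines and that the other node's copy has already
been fetched to a local path; the function names and the file names in
the usage note are made up for illustration and are not part of
Gluster.

```python
import sys
from pathlib import Path


def parse_info(text):
    """Parse key=value lines into a dict (assumed file format).

    Blank lines and lines without '=' are skipped.
    """
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        key, _, value = line.partition("=")
        entries[key] = value
    return entries


def diff_info(local, remote):
    """Return {key: (local_value, remote_value)} for every key that differs.

    A value of None means the key is missing on that side.
    """
    keys = sorted(set(local) | set(remote))
    return {
        key: (local.get(key), remote.get(key))
        for key in keys
        if local.get(key) != remote.get(key)
    }


if __name__ == "__main__" and len(sys.argv) == 3:
    # Both arguments are paths to local copies of the info file.
    local_info = parse_info(Path(sys.argv[1]).read_text())
    remote_info = parse_info(Path(sys.argv[2]).read_text())
    for key, (ours, theirs) in diff_info(local_info, remote_info).items():
        print(f"{key}: local={ours!r} remote={theirs!r}")
```

Usage would be something like
"python compare_info.py info.elkpinfglt01 info.elkpinfglt07" (all
three names hypothetical).  Any key it prints is a candidate for the
checksum mismatch that glusterd reports.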

On Thu, Nov 30, 2017 at 11:25 PM, Atin Mukherjee <amukherj at redhat.com> wrote:
>
>
> On Fri, Dec 1, 2017 at 1:55 AM, Ziemowit Pierzycki <ziemowit at pierzycki.com>
> wrote:
>>
>> Hi,
>>
>> I have a problem joining four Gluster 3.10 nodes to an existing
>> Gluster 3.8 cluster.  My understanding is that this should work and
>> not be too much of a problem.
>>
>> Peer probe is successful but the node is rejected:
>>
>> gluster> peer detach elkpinfglt07
>> peer detach: success
>> gluster> peer probe elkpinfglt07
>> peer probe: success.
>> gluster> peer status
>> Number of Peers: 6
>>
>> Hostname: elkpinfglt02
>> Uuid: 926e9b8a-94ff-4924-b133-a30f2dd48054
>> State: Peer in Cluster (Connected)
>>
>> Hostname: elkpinfglt03
>> Uuid: 34d1a409-acc8-41f6-9b11-938317ad3421
>> State: Peer in Cluster (Connected)
>>
>> Hostname: elkpinfglt04
>> Uuid: 93255842-e190-4e67-ae8b-917583917855
>> State: Peer in Cluster (Connected)
>>
>> Hostname: elkpinfglt05
>> Uuid: 263f8d43-d83e-4465-9de3-e6a285072b02
>> State: Peer in Cluster (Connected)
>>
>> Hostname: elkpinfglt06
>> Uuid: aeaa998a-e8e7-405e-bf21-f25de8d82c25
>> State: Peer in Cluster (Connected)
>>
>> Hostname: elkpinfglt07
>> Uuid: 4baff5cf-6e81-4b2e-b31f-be725b2da4b3
>> State: Peer Rejected (Connected)
>>
>> The node I'm probing from complains that it is unable to find peer
>> information for elkpinfglt07, but then finds it anyway, and the
>> checksums for the data0 volume differ:
>>
>> [2017-11-30 20:12:24.278996] I [MSGID: 106487]
>> [glusterd-handler.c:1241:__glusterd_handle_cli_probe] 0-glusterd:
>> Received CLI probe req elkpinfglt07 24007
>> [2017-11-30 20:12:24.279999] I [MSGID: 106129]
>> [glusterd-handler.c:3670:glusterd_probe_begin] 0-glusterd: Unable to
>> find peerinfo for host: elkpinfglt07 (24007)
>> [2017-11-30 20:12:24.281020] I
>> [rpc-clnt.c:1046:rpc_clnt_connection_init] 0-management: setting
>> frame-timeout to 600
>> [2017-11-30 20:12:24.288605] I [MSGID: 106498]
>> [glusterd-handler.c:3598:glusterd_friend_add] 0-management: connect
>> returned 0
>> [2017-11-30 20:12:24.301962] I [MSGID: 106511]
>> [glusterd-rpc-ops.c:252:__glusterd_probe_cbk] 0-management: Received
>> probe resp from uuid: 4baff5cf-6e81-4b2e-b31f-be725b2da4b3, host:
>> elkpinfglt07
>> [2017-11-30 20:12:24.301989] I [MSGID: 106511]
>> [glusterd-rpc-ops.c:412:__glusterd_probe_cbk] 0-glusterd: Received
>> resp to probe req
>> [2017-11-30 20:12:25.425294] I [MSGID: 106493]
>> [glusterd-rpc-ops.c:476:__glusterd_friend_add_cbk] 0-glusterd:
>> Received ACC from uuid: 4baff5cf-6e81-4b2e-b31f-be725b2da4b3, host:
>> elkpinfglt07, port: 0
>> [2017-11-30 20:12:25.429679] I [MSGID: 106163]
>> [glusterd-handshake.c:1271:__glusterd_mgmt_hndsk_versions_ack]
>> 0-management: using the op-version 30800
>> [2017-11-30 20:12:25.432426] I [MSGID: 106490]
>> [glusterd-handler.c:2954:__glusterd_handle_probe_query] 0-glusterd:
>> Received probe from uuid: 4baff5cf-6e81-4b2e-b31f-be725b2da4b3
>> [2017-11-30 20:12:25.432490] I [MSGID: 106493]
>> [glusterd-handler.c:3017:__glusterd_handle_probe_query] 0-glusterd:
>> Responded to elkpinfglt07, op_ret: 0, op_errno: 0, ret: 0
>> [2017-11-30 20:12:25.436435] I [MSGID: 106490]
>> [glusterd-handler.c:2608:__glusterd_handle_incoming_friend_req]
>> 0-glusterd: Received probe from uuid:
>> 4baff5cf-6e81-4b2e-b31f-be725b2da4b3
>> [2017-11-30 20:12:25.436683] E [MSGID: 106010]
>> [glusterd-utils.c:2938:glusterd_compare_friend_volume] 0-management:
>> Version of Cksums data0 differ. local cksum = 3011020419, remote cksum
>> = 729330920 on peer elkpinfglt07
>> [2017-11-30 20:12:25.436716] I [MSGID: 106493]
>> [glusterd-handler.c:3852:glusterd_xfer_friend_add_resp] 0-glusterd:
>> Responded to elkpinfglt07 (0), ret: 0, op_ret: -1
>> [2017-11-30 20:12:31.494646] I [MSGID: 106487]
>> [glusterd-handler.c:1474:__glusterd_handle_cli_list_friends]
>> 0-glusterd: Received cli list req
>> [2017-11-30 20:14:06.174548] I [MSGID: 106487]
>> [glusterd-handler.c:1474:__glusterd_handle_cli_list_friends]
>> 0-glusterd: Received cli list req
>> [2017-11-30 20:14:21.518765] I [MSGID: 106487]
>> [glusterd-handler.c:1474:__glusterd_handle_cli_list_friends]
>> 0-glusterd: Received cli list req
>>
>> On the new node the log shows this:
>>
>> [2017-11-30 20:12:25.196229] I [MSGID: 106163]
>> [glusterd-handshake.c:1316:__glusterd_mgmt_hndsk_versions_ack]
>> 0-management: using the op-version 30800
>> [2017-11-30 20:12:25.198228] I [MSGID: 106490]
>> [glusterd-handler.c:2957:__glusterd_handle_probe_query] 0-glusterd:
>> Received probe from uuid: f614c686-52c9-4d2c-92e2-7ea6cdcfba61
>> [2017-11-30 20:12:25.198447] I [MSGID: 106129]
>> [glusterd-handler.c:2992:__glusterd_handle_probe_query] 0-glusterd:
>> Unable to find peerinfo for host: elkpinfglt01 (24007)
>> [2017-11-30 20:12:25.200587] W [MSGID: 106062]
>> [glusterd-handler.c:3466:glusterd_transport_inet_options_build]
>> 0-glusterd: Failed to get tcp-user-timeout
>> [2017-11-30 20:12:25.200649] I
>> [rpc-clnt.c:1059:rpc_clnt_connection_init] 0-management: setting
>> frame-timeout to 600
>> [2017-11-30 20:12:25.208147] I [MSGID: 106498]
>> [glusterd-handler.c:3616:glusterd_friend_add] 0-management: connect
>> returned 0
>> [2017-11-30 20:12:25.208318] I [MSGID: 106493]
>> [glusterd-handler.c:3020:__glusterd_handle_probe_query] 0-glusterd:
>> Responded to elkpinfglt01, op_ret: 0, op_errno: 0, ret: 0
>> [2017-11-30 20:12:25.209824] I [MSGID: 106490]
>> [glusterd-handler.c:2606:__glusterd_handle_incoming_friend_req]
>> 0-glusterd: Received probe from uuid:
>> f614c686-52c9-4d2c-92e2-7ea6cdcfba61
>> [2017-11-30 20:12:25.325953] I
>> [rpc-clnt.c:1059:rpc_clnt_connection_init] 0-nfs: setting
>> frame-timeout to 600
>> [2017-11-30 20:12:25.326055] I [MSGID: 106132]
>> [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: nfs already
>> stopped
>> [2017-11-30 20:12:25.326069] I [MSGID: 106568]
>> [glusterd-svc-mgmt.c:229:glusterd_svc_stop] 0-management: nfs service
>> is stopped
>> [2017-11-30 20:12:25.327527] I [MSGID: 106132]
>> [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: glustershd
>> already stopped
>> [2017-11-30 20:12:25.327540] I [MSGID: 106568]
>> [glusterd-svc-mgmt.c:229:glusterd_svc_stop] 0-management: glustershd
>> service is stopped
>> [2017-11-30 20:12:25.327559] I [MSGID: 106567]
>> [glusterd-svc-mgmt.c:197:glusterd_svc_start] 0-management: Starting
>> glustershd service
>> [2017-11-30 20:12:26.329457] I [MSGID: 106132]
>> [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: quotad
>> already stopped
>> [2017-11-30 20:12:26.329558] I [MSGID: 106568]
>> [glusterd-svc-mgmt.c:229:glusterd_svc_stop] 0-management: quotad
>> service is stopped
>> [2017-11-30 20:12:26.329850] I [MSGID: 106132]
>> [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: bitd
>> already stopped
>> [2017-11-30 20:12:26.329879] I [MSGID: 106568]
>> [glusterd-svc-mgmt.c:229:glusterd_svc_stop] 0-management: bitd service
>> is stopped
>> [2017-11-30 20:12:26.330202] I [MSGID: 106132]
>> [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: scrub
>> already stopped
>> [2017-11-30 20:12:26.330240] I [MSGID: 106568]
>> [glusterd-svc-mgmt.c:229:glusterd_svc_stop] 0-management: scrub
>> service is stopped
>> [2017-11-30 20:12:26.331621] I [MSGID: 106493]
>> [glusterd-handler.c:3866:glusterd_xfer_friend_add_resp] 0-glusterd:
>> Responded to elkpinfglt01 (0), ret: 0, op_ret: 0
>> [2017-11-30 20:12:26.340265] I [MSGID: 106511]
>> [glusterd-rpc-ops.c:261:__glusterd_probe_cbk] 0-management: Received
>> probe resp from uuid: f614c686-52c9-4d2c-92e2-7ea6cdcfba61, host:
>> elkpinfglt01
>> [2017-11-30 20:12:26.340331] I [MSGID: 106511]
>> [glusterd-rpc-ops.c:421:__glusterd_probe_cbk] 0-glusterd: Received
>> resp to probe req
>> [2017-11-30 20:12:26.344327] I [MSGID: 106493]
>> [glusterd-rpc-ops.c:485:__glusterd_friend_add_cbk] 0-glusterd:
>> Received RJT from uuid: f614c686-52c9-4d2c-92e2-7ea6cdcfba61, host:
>> elkpinfglt01, port: 0
>>
>> Would the checksums cause the peer to be rejected?
>
>
> Yes, that's the cause.  It means that the info file of the volume data0
> differs between the node elkpinfglt07 and the node from which you executed
> the peer probe. Can you please find the differences in the
> /var/lib/glusterd/vols/data0/info file between these two nodes?
>