[Gluster-users] peer rejected but connected
lejeczek
peljasz at yahoo.co.uk
Fri Sep 1 07:50:28 UTC 2017
hi, still tricky:
whether or not I remove "tier-enabled=0" on the rejected
peer, restarting the glusterd service there fails:

glusterd version 3.10.5 (args: /usr/sbin/glusterd -p
/var/run/glusterd.pid --log-level INFO)
[2017-09-01 07:41:08.251314] I [MSGID: 106478]
[glusterd.c:1449:init] 0-management: Maximum allowed open
file descriptors set to 65536
[2017-09-01 07:41:08.251400] I [MSGID: 106479]
[glusterd.c:1496:init] 0-management: Using /var/lib/glusterd
as working directory
[2017-09-01 07:41:08.275000] W [MSGID: 103071]
[rdma.c:4590:__gf_rdma_ctx_create] 0-rpc-transport/rdma:
rdma_cm event channel creation failed [No such device]
[2017-09-01 07:41:08.275071] W [MSGID: 103055]
[rdma.c:4897:init] 0-rdma.management: Failed to initialize
IB Device
[2017-09-01 07:41:08.275096] W
[rpc-transport.c:350:rpc_transport_load] 0-rpc-transport:
'rdma' initialization failed
[2017-09-01 07:41:08.275307] W
[rpcsvc.c:1661:rpcsvc_create_listener] 0-rpc-service: cannot
create listener, initing the transport failed
[2017-09-01 07:41:08.275343] E [MSGID: 106243]
[glusterd.c:1720:init] 0-management: creation of 1 listeners
failed, continuing with succeeded transport
[2017-09-01 07:41:13.941020] I [MSGID: 106513]
[glusterd-store.c:2197:glusterd_restore_op_version]
0-glusterd: retrieved op-version: 30712
[2017-09-01 07:41:14.109192] I [MSGID: 106498]
[glusterd-handler.c:3669:glusterd_friend_add_from_peerinfo]
0-management: connect returned 0
[2017-09-01 07:41:14.109364] W [MSGID: 106062]
[glusterd-handler.c:3466:glusterd_transport_inet_options_build]
0-glusterd: Failed to get tcp-user-timeout
[2017-09-01 07:41:14.109481] I
[rpc-clnt.c:1059:rpc_clnt_connection_init] 0-management:
setting frame-timeout to 600
[2017-09-01 07:41:14.134691] E [MSGID: 106187]
[glusterd-store.c:4559:glusterd_resolve_all_bricks]
0-glusterd: resolve brick failed in restore
[2017-09-01 07:41:14.134769] E [MSGID: 101019]
[xlator.c:503:xlator_init] 0-management: Initialization of
volume 'management' failed, review your volfile again
[2017-09-01 07:41:14.134790] E [MSGID: 101066]
[graph.c:325:glusterfs_graph_init] 0-management:
initializing translator failed
[2017-09-01 07:41:14.134804] E [MSGID: 101176]
[graph.c:681:glusterfs_graph_activate] 0-graph: init failed
[2017-09-01 07:41:14.135723] W
[glusterfsd.c:1332:cleanup_and_exit]
(-->/usr/sbin/glusterd(glusterfs_volumes_init+0xfd)
[0x55f22fab3abd]
-->/usr/sbin/glusterd(glusterfs_process_volfp+0x1b1)
[0x55f22fab3961]
-->/usr/sbin/glusterd(cleanup_and_exit+0x6b)
[0x55f22fab2e4b] ) 0-: received signum (1), shutting down
I have to wipe /var/lib/glusterd clean on the rejected
peer (10.5.6.17) before glusterd will start again, but when I
probe it anew, "tier-enabled=0" lands back in the "info"
file for each vol on 10.5.6.17, and... vicious circle?
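
To spell out the circle, roughly (a sketch only, assuming systemd manages
glusterd):

    # the only way I can get glusterd on 10.5.6.17 to start again
    systemctl stop glusterd
    rm -rf /var/lib/glusterd/*
    systemctl start glusterd

    # then, after "gluster peer probe 10.5.6.17" from a working node,
    # "tier-enabled=0" is written back into every volume's info file here
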
On 01/09/17 07:30, Gaurav Yadav wrote:
> Logs from the newly added node helped me in the RCA of the issue.
>
> The info file on node 10.5.6.17 contains an additional
> property, "tier-enabled", which is not present in the info
> file on the other 3 nodes. When the gluster peer probe call
> is made, a cksum is compared in order to maintain
> consistency across the cluster. Because the files differ,
> the cksums differ, and the peer ends up in
> "State: Peer Rejected (Connected)".
>
> This inconsistency arises from the upgrade you did.
>
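> One way to see the mismatch (volume name is a placeholder; this assumes
> the per-volume cksum file sits next to the info file) is to run on each
> node:
>
>     grep tier-enabled /var/lib/glusterd/vols/<vol-name>/info
>     cat /var/lib/glusterd/vols/<vol-name>/cksum
>
> The node whose info file carries the extra line reports a different
> cksum than the other three (2870780063 vs 4088157353 in the logs quoted
> further down).
>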
> Workaround:
> 1. Go to node 10.5.6.17.
> 2. Open the info file at
>    "/var/lib/glusterd/vols/<vol-name>/info" and remove
>    "tier-enabled=0".
> 3. Restart the glusterd service.
> 4. Peer probe again (a command-level sketch follows below).
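> Roughly, as commands (volume name is a placeholder; systemd assumed;
> the probe is run from one of the working nodes):
>
>     # on 10.5.6.17
>     sed -i '/^tier-enabled=0$/d' /var/lib/glusterd/vols/<vol-name>/info
>     systemctl restart glusterd
>
>     # from an existing cluster node
>     gluster peer probe 10.5.6.17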
>
> Thanks
> Gaurav
>
> On Thu, Aug 31, 2017 at 3:37 PM, lejeczek
> <peljasz at yahoo.co.uk> wrote:
>
> attached the lot as per your request.
>
> Would be really great if you can find the root cause
> of this and suggest a resolution. Fingers crossed.
> thanks, L.
>
> On 31/08/17 05:34, Gaurav Yadav wrote:
>
> Could you please send the entire content of the
> "/var/lib/glusterd/" directory of the 4th node
> which is being peer probed, along with
> command-history and glusterd.logs.
>
> Thanks
> Gaurav
>
> On Wed, Aug 30, 2017 at 7:10 PM, lejeczek
> <peljasz at yahoo.co.uk> wrote:
>
>
>
> On 30/08/17 07:18, Gaurav Yadav wrote:
>
>
> Could you please send me the "info" file from the
> "/var/lib/glusterd/vols/<vol-name>" directory on
> all the nodes, along with glusterd.logs and
> command-history.
>
> Thanks
> Gaurav
>
> On Tue, Aug 29, 2017 at 7:13 PM, lejeczek
> <peljasz at yahoo.co.uk> wrote:
>
> hi fellas,
> same old, same old.
> In the log of the probing peer I see:
> ...
> [2017-08-29 13:36:16.882196] I [MSGID: 106493]
> [glusterd-handler.c:3020:__glusterd_handle_probe_query]
> 0-glusterd: Responded to priv.xx.xx.priv.xx.xx.x,
> op_ret: 0, op_errno: 0, ret: 0
> [2017-08-29 13:36:16.904961] I [MSGID: 106490]
> [glusterd-handler.c:2606:__glusterd_handle_incoming_friend_req]
> 0-glusterd: Received probe from uuid:
> 2a17edb4-ae68-4b67-916e-e38a2087ca28
> [2017-08-29 13:36:16.906477] E [MSGID: 106010]
> [glusterd-utils.c:3034:glusterd_compare_friend_volume]
> 0-management: Version of Cksums CO-DATA differ.
> local cksum = 4088157353, remote cksum = 2870780063
> on peer 10.5.6.17
> [2017-08-29 13:36:16.907187] I [MSGID: 106493]
> [glusterd-handler.c:3866:glusterd_xfer_friend_add_resp]
> 0-glusterd: Responded to 10.5.6.17 (0), ret: 0, op_ret: -1
> ...
>
> Why would adding a new peer make the cluster
> jump to checking checksums on a vol on that
> newly added peer?
>
>
> Really, I mean, no brick even exists on the newly
> added peer; it has just been probed, so why this?:
>
> [2017-08-30 13:17:51.949430] E [MSGID: 106010]
> [glusterd-utils.c:3034:glusterd_compare_friend_volume]
> 0-management: Version of Cksums CO-DATA differ.
> local cksum = 4088157353, remote cksum = 2870780063
> on peer 10.5.6.17
>
> 10.5.6.17 is a candidate I'm probing from a
> working cluster.
> Why does gluster want checksums, and why would
> the checksums be different?
> Would anybody know what is going on there?
>
>
> Is that why the peer gets rejected?
> The peer I'm hoping to add was a member of the
> cluster in the past, but I did the "usual" wipe
> of /var/lib/glusterd on the candidate peer.
>
> A hint or solution would be great to hear.
> L.
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users