[Gluster-users] peer rejected but connected
lejeczek
peljasz at yahoo.co.uk
Fri Sep 1 07:50:28 UTC 2017
hi, still tricky:
whether or not I remove "tier-enabled=0" on the rejected
peer, restarting the glusterd service there fails:

glusterd version 3.10.5 (args: /usr/sbin/glusterd -p
/var/run/glusterd.pid --log-level INFO)
[2017-09-01 07:41:08.251314] I [MSGID: 106478]
[glusterd.c:1449:init] 0-management: Maximum allowed open
file descriptors set to 65536
[2017-09-01 07:41:08.251400] I [MSGID: 106479]
[glusterd.c:1496:init] 0-management: Using /var/lib/glusterd
as working directory
[2017-09-01 07:41:08.275000] W [MSGID: 103071]
[rdma.c:4590:__gf_rdma_ctx_create] 0-rpc-transport/rdma:
rdma_cm event channel creation failed [No such device]
[2017-09-01 07:41:08.275071] W [MSGID: 103055]
[rdma.c:4897:init] 0-rdma.management: Failed to initialize
IB Device
[2017-09-01 07:41:08.275096] W
[rpc-transport.c:350:rpc_transport_load] 0-rpc-transport:
'rdma' initialization failed
[2017-09-01 07:41:08.275307] W
[rpcsvc.c:1661:rpcsvc_create_listener] 0-rpc-service: cannot
create listener, initing the transport failed
[2017-09-01 07:41:08.275343] E [MSGID: 106243]
[glusterd.c:1720:init] 0-management: creation of 1 listeners
failed, continuing with succeeded transport
[2017-09-01 07:41:13.941020] I [MSGID: 106513]
[glusterd-store.c:2197:glusterd_restore_op_version]
0-glusterd: retrieved op-version: 30712
[2017-09-01 07:41:14.109192] I [MSGID: 106498]
[glusterd-handler.c:3669:glusterd_friend_add_from_peerinfo]
0-management: connect returned 0
[2017-09-01 07:41:14.109364] W [MSGID: 106062]
[glusterd-handler.c:3466:glusterd_transport_inet_options_build]
0-glusterd: Failed to get tcp-user-timeout
[2017-09-01 07:41:14.109481] I
[rpc-clnt.c:1059:rpc_clnt_connection_init] 0-management:
setting frame-timeout to 600
[2017-09-01 07:41:14.134691] E [MSGID: 106187]
[glusterd-store.c:4559:glusterd_resolve_all_bricks]
0-glusterd: resolve brick failed in restore
[2017-09-01 07:41:14.134769] E [MSGID: 101019]
[xlator.c:503:xlator_init] 0-management: Initialization of
volume 'management' failed, review your volfile again
[2017-09-01 07:41:14.134790] E [MSGID: 101066]
[graph.c:325:glusterfs_graph_init] 0-management:
initializing translator failed
[2017-09-01 07:41:14.134804] E [MSGID: 101176]
[graph.c:681:glusterfs_graph_activate] 0-graph: init failed
[2017-09-01 07:41:14.135723] W
[glusterfsd.c:1332:cleanup_and_exit]
(-->/usr/sbin/glusterd(glusterfs_volumes_init+0xfd)
[0x55f22fab3abd]
-->/usr/sbin/glusterd(glusterfs_process_volfp+0x1b1)
[0x55f22fab3961]
-->/usr/sbin/glusterd(cleanup_and_exit+0x6b)
[0x55f22fab2e4b] ) 0-: received signum (1), shutting down
I have to wipe /var/lib/glusterd clean on the rejected
peer (10.5.6.17) before glusterd will start again, but when I
probe it anew, "tier-enabled=0" lands back in the "info"
file for each vol on 10.5.6.17, and... vicious circle?
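
To spell out the circle, roughly (a sketch only, assuming systemd manages
glusterd):

    # the only way I can get glusterd on 10.5.6.17 to start again
    systemctl stop glusterd
    rm -rf /var/lib/glusterd/*
    systemctl start glusterd

    # then, after "gluster peer probe 10.5.6.17" from a working node,
    # "tier-enabled=0" is written back into every volume's info file here
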
On 01/09/17 07:30, Gaurav Yadav wrote:
> Logs from the newly added node helped me in the RCA of the issue.
>
> The info file on node 10.5.6.17 contains an additional
> property, "tier-enabled", which is not present in the info
> file on the other 3 nodes. When the gluster peer probe call
> is made, a cksum is compared in order to maintain
> consistency across the cluster. Because the files differ,
> the cksums differ, and the peer ends up in
> "State: Peer Rejected (Connected)".
>
> This inconsistency arises from the upgrade you did.
>
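> One way to see the mismatch (volume name is a placeholder; this assumes
> the per-volume cksum file sits next to the info file) is to run on each
> node:
>
>     grep tier-enabled /var/lib/glusterd/vols/<vol-name>/info
>     cat /var/lib/glusterd/vols/<vol-name>/cksum
>
> The node whose info file carries the extra line reports a different
> cksum than the other three (2870780063 vs 4088157353 in the logs quoted
> further down).
>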
> Workaround:
> 1. Go to node 10.5.6.17.
> 2. Open the info file at
>    "/var/lib/glusterd/vols/<vol-name>/info" and remove
>    "tier-enabled=0".
> 3. Restart the glusterd service.
> 4. Peer probe again (a command-level sketch follows below).
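> Roughly, as commands (volume name is a placeholder; systemd assumed;
> the probe is run from one of the working nodes):
>
>     # on 10.5.6.17
>     sed -i '/^tier-enabled=0$/d' /var/lib/glusterd/vols/<vol-name>/info
>     systemctl restart glusterd
>
>     # from an existing cluster node
>     gluster peer probe 10.5.6.17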
>
> Thanks
> Gaurav
>
> On Thu, Aug 31, 2017 at 3:37 PM, lejeczek
> <peljasz at yahoo.co.uk> wrote:
>
> attached the lot as per your request.
>
> Would be really great if you can find the root cause
> of this and suggest a resolution. Fingers crossed.
> thanks, L.
>
> On 31/08/17 05:34, Gaurav Yadav wrote:
>
> Could you please send the entire content of the
> "/var/lib/glusterd/" directory of the 4th node
> which is being peer probed, along with
> command-history and glusterd.logs.
>
> Thanks
> Gaurav
>
> On Wed, Aug 30, 2017 at 7:10 PM, lejeczek
> <peljasz at yahoo.co.uk> wrote:
>
>
>
> On 30/08/17 07:18, Gaurav Yadav wrote:
>
>
> Could you please send me the "info" file from the
> "/var/lib/glusterd/vols/<vol-name>" directory on
> all the nodes, along with glusterd.logs and
> command-history.
>
> Thanks
> Gaurav
>
> On Tue, Aug 29, 2017 at 7:13 PM, lejeczek
> <peljasz at yahoo.co.uk> wrote:
>
> hi fellas,
> same old, same old.
> In the log of the probing peer I see:
> ...
> [2017-08-29 13:36:16.882196] I [MSGID: 106493]
> [glusterd-handler.c:3020:__glusterd_handle_probe_query]
> 0-glusterd: Responded to priv.xx.xx.priv.xx.xx.x,
> op_ret: 0, op_errno: 0, ret: 0
> [2017-08-29 13:36:16.904961] I [MSGID: 106490]
> [glusterd-handler.c:2606:__glusterd_handle_incoming_friend_req]
> 0-glusterd: Received probe from uuid:
> 2a17edb4-ae68-4b67-916e-e38a2087ca28
> [2017-08-29 13:36:16.906477] E [MSGID: 106010]
> [glusterd-utils.c:3034:glusterd_compare_friend_volume]
> 0-management: Version of Cksums CO-DATA differ.
> local cksum = 4088157353, remote cksum = 2870780063
> on peer 10.5.6.17
> [2017-08-29 13:36:16.907187] I [MSGID: 106493]
> [glusterd-handler.c:3866:glusterd_xfer_friend_add_resp]
> 0-glusterd: Responded to 10.5.6.17 (0), ret: 0, op_ret: -1
> ...
>
> Why would adding a new peer make the cluster
> jump to checking checksums on a vol on that
> newly added peer?
>
>
> Really, I mean, no brick even exists on the newly
> added peer; it has just been probed, so why this?:
>
> [2017-08-30 13:17:51.949430] E [MSGID: 106010]
> [glusterd-utils.c:3034:glusterd_compare_friend_volume]
> 0-management: Version of Cksums CO-DATA differ.
> local cksum = 4088157353, remote cksum = 2870780063
> on peer 10.5.6.17
>
> 10.5.6.17 is a candidate I'm probing from a
> working cluster.
> Why does gluster want checksums, and why would
> the checksums be different?
> Would anybody know what is going on there?
>
>
> Is that why the peer gets rejected?
> The peer I'm hoping to add was a member of the
> cluster in the past, but I did the "usual" wipe
> of /var/lib/glusterd on the candidate peer.
>
> A hint or solution would be great to hear.
> L.
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users