[Gluster-users] Can not add peer
Atin Mukherjee
amukherj at redhat.com
Sun May 8 06:51:14 UTC 2016
On 05/07/2016 01:39 AM, Muminov, Azamat wrote:
> Hi,
>
>
>
> I have a ~50 node cluster. I configured gluster so that there are 2
> volumes: One is configured on top of HDD, and the other one is
> configured on top of RAM.
>
>
>
> [root@nmIDPP20 ~]# gluster volume info
>
> Volume Name: ram
> Type: Distributed-Replicate
> Volume ID: a97fa262-276b-41e9-8f59-40f28451f689
> Status: Started
> Number of Bricks: 5 x 2 = 10
> Transport-type: tcp
> Bricks:
> Brick1: 10.238.0.15:/mnt/ram/data
> Brick2: 10.238.0.16:/mnt/ram/data
> Brick3: 10.238.0.17:/mnt/ram/data
> Brick4: 10.238.0.20:/mnt/ram/data
> Brick5: 10.238.0.19:/mnt/ram/data
> Brick6: 10.238.0.28:/mnt/ram/data
> Brick7: 10.238.0.27:/mnt/ram/data
> Brick8: 10.238.0.21:/mnt/ram/data
> Brick9: 10.238.0.24:/mnt/ram/data
> Brick10: 10.238.0.26:/mnt/ram/data
>
> Volume Name: disk
> Type: Replicate
> Volume ID: 9607ae5f-0dbf-4164-b260-5d9ce26d4fc7
> Status: Started
> Number of Bricks: 1 x 3 = 3
> Transport-type: tcp
> Bricks:
> Brick1: 10.238.0.18:/var/cache/gluster/data/options/pp/data
> Brick2: 10.238.0.16:/var/cache/gluster/data/options/pp/data
> Brick3: 10.238.0.17:/var/cache/gluster/data/options/pp/data
>
>
>
>
>
> *I’ve re-imaged (bare-metal reinstall) one of the servers, 10.238.0.22, and
> am now trying to add it back to the pool. After running the _gluster peer
> probe 10.238.0.22_ command, we can see that it’s in the pool:*
>
>
>
> [root@nmIDPP20 ~]# gluster pool list
>
> UUID                                    Hostname        State
> baa648a5-ff35-44e0-80ea-a55e43154d12    10.238.0.50     Connected
> 20bb470a-85da-4e3a-a66b-08a935c189ae    10.238.0.26     Connected
> 79dffcf8-8c3a-47b5-926a-39be2c1406da    10.238.0.13     Disconnected
> 7212e375-76a4-46c9-8bac-7470e2e5a910    10.238.0.17     Connected
> c6080a14-33d7-4012-8940-2d9232752551    10.238.0.14     Connected
> b553ed3c-21f1-4110-808d-4b08e6ded200    10.238.0.28     Connected
> 5e596931-9151-4f5b-bc57-feb6fe46054f    10.238.0.7      Connected
> 8e1128ed-df07-4747-812e-dcc280fce5c1    10.238.0.16     Connected
> 0b5fae30-e169-42ee-8f39-678d6fc93ac2    10.238.0.19     Connected
> 0f82df55-3994-4561-8a0a-1c1d2e9c3cff    10.238.0.29     Connected
> 446ea1e4-61b9-4881-9073-6aeb9a154710    10.238.0.24     Connected
> bcf84149-415b-4eb7-8dc1-2b284e135307    10.238.0.27     Connected
> 97dddf9f-0b57-4bb8-86fd-196cb51df4b6    10.238.0.20     Connected
> b2bf8b3c-890b-423b-b901-f16f1186c3e6    10.238.0.4      Connected
> 878ba732-0fea-4734-b1bc-a08ad7a2c97a    10.238.0.9      Connected
> 51750fb0-c182-4e76-821f-16cee23fdf27    10.238.0.6      Connected
> b162e108-4301-47df-875f-92151244b694    10.238.0.8      Connected
> 25d29db8-0916-4ef4-80d1-34fbf8aa5d26    10.238.0.21     Connected
> 9acfb879-7df9-4c87-aa1c-eb518b9c668d    10.238.0.12     Connected
> aacd1fa1-940c-4cec-9b04-1fb49348e764    10.238.0.49     Connected
> 5c36b282-9842-4b85-8d0f-e5101817dfe1    10.238.0.18     Connected
> a5298a13-144d-46e1-856f-91ade6649840    10.238.0.10     Connected
> 4e7b83bd-367e-419d-aa5b-34947021dbc3    10.238.0.48     Connected
> 6aa7957f-be6f-4bee-a748-32937d3ababd    10.238.0.47     Connected
> 3890ac7d-7959-4565-86de-fc792cc357b0    10.238.0.45     Disconnected
> 4814a743-5b52-44ab-b169-e907082aa229    10.238.0.32     Connected
> cf735cd8-75e3-413b-88c5-46e5b79f7558    10.238.0.42     Connected
> b1fa7e22-2e1b-4d07-966e-3096e58e5c78    10.238.0.39     Connected
> 1459fce8-110c-478f-815e-89507225226e    10.238.0.34     Connected
> a7b21ee9-970b-4d99-9f8f-b7e1cbf4be77    10.238.0.25     Connected
> dab1a271-4244-41bc-b770-7b13bd6e399d    10.238.0.43     Connected
> 5b483c65-0d04-4188-85a9-77dfbbef78cd    10.238.0.41     Connected
> 1b8cb9d8-ce8f-49aa-b958-705dd09db073    10.238.0.40     Connected
> 4b4f85a0-1310-45df-a613-e33c967cc53d    10.238.0.38     Connected
> dab043b8-11ba-4fa6-9b82-baa18b41167d    10.238.0.33     Disconnected
> 06cbc4c2-9d79-4689-9ac6-3dbc2250d903    10.238.0.30     Connected
> f33451c7-e984-495c-8e34-0b2d99a21e1e    10.238.0.31     Connected
> 1873e2ce-1239-4b6d-930f-af14e9c1f13b    10.238.0.5      Connected
> c85de12f-23e6-4797-adb4-d33b7b4eb5fc    10.238.0.11     Connected
> 4147639d-652e-49a8-aa8b-d77327cca9ca    10.238.0.15     Connected
> 07580a32-c558-449d-b454-044fb679c908    10.238.0.22     Connected
> d5140e78-498d-4c63-868d-189554aef7d4    localhost       Connected
>
>
>
>
>
> But _gluster peer status_ gives the following output:
>
>
>
> [root@nmIDPP20 ~]# gluster peer status
>
> Number of Peers: 41
>
> Hostname: 10.238.0.50
> Uuid: baa648a5-ff35-44e0-80ea-a55e43154d12
> State: Peer in Cluster (Connected)
>
> Hostname: 10.238.0.26
> Uuid: 20bb470a-85da-4e3a-a66b-08a935c189ae
> State: Peer in Cluster (Connected)
>
> Hostname: 10.238.0.13
> Uuid: 79dffcf8-8c3a-47b5-926a-39be2c1406da
> State: Peer in Cluster (Disconnected)
>
> Hostname: 10.238.0.17
> Uuid: 7212e375-76a4-46c9-8bac-7470e2e5a910
> State: Peer in Cluster (Connected)
>
> Hostname: 10.238.0.14
> Uuid: c6080a14-33d7-4012-8940-2d9232752551
> State: Peer in Cluster (Connected)
>
> Hostname: 10.238.0.28
> Uuid: b553ed3c-21f1-4110-808d-4b08e6ded200
> State: Peer in Cluster (Connected)
>
> Hostname: 10.238.0.7
> Uuid: 5e596931-9151-4f5b-bc57-feb6fe46054f
> State: Peer in Cluster (Connected)
>
> Hostname: 10.238.0.16
> Uuid: 8e1128ed-df07-4747-812e-dcc280fce5c1
> State: Peer in Cluster (Connected)
>
> Hostname: 10.238.0.19
> Uuid: 0b5fae30-e169-42ee-8f39-678d6fc93ac2
> State: Peer in Cluster (Connected)
>
> Hostname: 10.238.0.29
> Uuid: 0f82df55-3994-4561-8a0a-1c1d2e9c3cff
> State: Peer in Cluster (Connected)
>
> Hostname: 10.238.0.24
> Uuid: 446ea1e4-61b9-4881-9073-6aeb9a154710
> State: Peer in Cluster (Connected)
>
> Hostname: 10.238.0.27
> Uuid: bcf84149-415b-4eb7-8dc1-2b284e135307
> State: Peer in Cluster (Connected)
>
> Hostname: 10.238.0.20
> Uuid: 97dddf9f-0b57-4bb8-86fd-196cb51df4b6
> State: Peer in Cluster (Connected)
>
> Hostname: 10.238.0.4
> Uuid: b2bf8b3c-890b-423b-b901-f16f1186c3e6
> State: Peer in Cluster (Connected)
>
> Hostname: 10.238.0.9
> Uuid: 878ba732-0fea-4734-b1bc-a08ad7a2c97a
> State: Peer in Cluster (Connected)
>
> Hostname: 10.238.0.6
> Uuid: 51750fb0-c182-4e76-821f-16cee23fdf27
> State: Peer in Cluster (Connected)
>
> Hostname: 10.238.0.8
> Uuid: b162e108-4301-47df-875f-92151244b694
> State: Peer in Cluster (Connected)
>
> Hostname: 10.238.0.21
> Uuid: 25d29db8-0916-4ef4-80d1-34fbf8aa5d26
> State: Peer in Cluster (Connected)
>
> Hostname: 10.238.0.12
> Uuid: 9acfb879-7df9-4c87-aa1c-eb518b9c668d
> State: Peer in Cluster (Connected)
>
> Hostname: 10.238.0.49
> Uuid: aacd1fa1-940c-4cec-9b04-1fb49348e764
> State: Peer in Cluster (Connected)
>
> Hostname: 10.238.0.18
> Uuid: 5c36b282-9842-4b85-8d0f-e5101817dfe1
> State: Peer in Cluster (Connected)
>
> Hostname: 10.238.0.10
> Uuid: a5298a13-144d-46e1-856f-91ade6649840
> State: Peer in Cluster (Connected)
>
> Hostname: 10.238.0.48
> Uuid: 4e7b83bd-367e-419d-aa5b-34947021dbc3
> State: Peer in Cluster (Connected)
>
> Hostname: 10.238.0.47
> Uuid: 6aa7957f-be6f-4bee-a748-32937d3ababd
> State: Peer in Cluster (Connected)
>
> Hostname: 10.238.0.45
> Uuid: 3890ac7d-7959-4565-86de-fc792cc357b0
> State: Peer in Cluster (Disconnected)
>
> Hostname: 10.238.0.32
> Uuid: 4814a743-5b52-44ab-b169-e907082aa229
> State: Peer in Cluster (Connected)
>
> Hostname: 10.238.0.42
> Uuid: cf735cd8-75e3-413b-88c5-46e5b79f7558
> State: Peer in Cluster (Connected)
>
> Hostname: 10.238.0.39
> Uuid: b1fa7e22-2e1b-4d07-966e-3096e58e5c78
> State: Peer in Cluster (Connected)
>
> Hostname: 10.238.0.34
> Uuid: 1459fce8-110c-478f-815e-89507225226e
> State: Peer in Cluster (Connected)
>
> Hostname: 10.238.0.25
> Uuid: a7b21ee9-970b-4d99-9f8f-b7e1cbf4be77
> State: Peer in Cluster (Connected)
>
> Hostname: 10.238.0.43
> Uuid: dab1a271-4244-41bc-b770-7b13bd6e399d
> State: Peer in Cluster (Connected)
>
> Hostname: 10.238.0.41
> Uuid: 5b483c65-0d04-4188-85a9-77dfbbef78cd
> State: Peer in Cluster (Connected)
>
> Hostname: 10.238.0.40
> Uuid: 1b8cb9d8-ce8f-49aa-b958-705dd09db073
> State: Peer in Cluster (Connected)
>
> Hostname: 10.238.0.38
> Uuid: 4b4f85a0-1310-45df-a613-e33c967cc53d
> State: Peer in Cluster (Connected)
>
> Hostname: 10.238.0.33
> Uuid: dab043b8-11ba-4fa6-9b82-baa18b41167d
> State: Peer in Cluster (Disconnected)
>
> Hostname: 10.238.0.30
> Uuid: 06cbc4c2-9d79-4689-9ac6-3dbc2250d903
> State: Peer in Cluster (Connected)
>
> Hostname: 10.238.0.31
> Uuid: f33451c7-e984-495c-8e34-0b2d99a21e1e
> State: Peer in Cluster (Connected)
>
> Hostname: 10.238.0.5
> Uuid: 1873e2ce-1239-4b6d-930f-af14e9c1f13b
> State: Peer in Cluster (Connected)
>
> Hostname: 10.238.0.11
> Uuid: c85de12f-23e6-4797-adb4-d33b7b4eb5fc
> State: Peer in Cluster (Connected)
>
> Hostname: 10.238.0.15
> Uuid: 4147639d-652e-49a8-aa8b-d77327cca9ca
> State: Peer in Cluster (Connected)
>
> Hostname: 10.238.0.22
> Uuid: 07580a32-c558-449d-b454-044fb679c908
> State: Probe Sent to Peer (Connected)
The state indicates that the handshake has not completed yet.

I suggest the following workaround:

Find the file named 07580a32-c558-449d-b454-044fb679c908 in the
/var/lib/glusterd/peers directory on all the nodes (except node .22). This
file contains the details of the peer: its uuid, state, and hostname. Update
the state to 3 and then restart glusterd on all 49 nodes, one by one. Please
note that the handshake will take some time (~10 minutes); after that, you
should be able to see this node back in the cluster.

Let me know if this doesn't work. I'll be happy to assist you further.
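A minimal sketch of the edit described above, demonstrated on a throwaway
sample file rather than the live one (on a real node the file is
/var/lib/glusterd/peers/07580a32-c558-449d-b454-044fb679c908; the state value
in the sample below is only a placeholder, so inspect your actual file before
editing):

```shell
# Stand-in for /var/lib/glusterd/peers/07580a32-c558-449d-b454-044fb679c908.
# The state=5 below is a placeholder; check the real file first.
cat > /tmp/peer_sample <<'EOF'
uuid=07580a32-c558-449d-b454-044fb679c908
state=5
hostname1=10.238.0.22
EOF

# Rewrite the state line to 3, as suggested above (GNU sed in-place edit).
sed -i 's/^state=.*/state=3/' /tmp/peer_sample

grep '^state=' /tmp/peer_sample   # prints: state=3

# On the live cluster you would then restart glusterd on each node, e.g.:
#   service glusterd restart
```

Make the same edit on every node except .22 before starting the rolling
glusterd restarts, so that all peers agree on the new state.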
>
>
>
>
>
> And after staying in this state for about 10 minutes, the .22 node
> disappears from the pool list. Also, during the peer probe, running
> _gluster pool list_ on node .22 hangs; only after a few minutes does it
> release the shell, and it outputs nothing.
>
>
>
> I’ve tried a couple of things to resolve the issue:
>
> 1. Disabled the firewall -> didn’t help
>
> 2. Removed the mgmt directory from .22, restarted the gluster service and
> the glusterfs/d processes -> didn’t help
>
> 3. Tried to probe .22 from another server -> didn’t help
>
> 4. Reset the uuid of .22 -> didn’t help
>
>
>
> I don’t know what more I can do, so I’m asking for your support.
>
>
>
>
>
> The following are the logs during the probe, from .22 and .23:
>
>
>
> 10.238.0.22:
>
>
>
> [2016-05-06 19:45:24.463346] I [glusterd-handler.c:1114:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req
> [2016-05-06 19:46:01.295054] I [glusterd-handler.c:1114:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req
> [2016-05-06 19:46:50.518018] I [glusterd-handshake.c:563:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 30501
> [2016-05-06 19:46:50.521829] I [glusterd-handler.c:2346:__glusterd_handle_probe_query] 0-glusterd: Received probe from uuid: d5140e78-498d-4c63-868d-189554aef7d4
> [2016-05-06 19:47:10.542419] I [glusterd-handler.c:2374:__glusterd_handle_probe_query] 0-glusterd: Unable to find peerinfo for host: 10.238.0.23 (24007)
> [2016-05-06 19:47:10.548116] I [rpc-clnt.c:972:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
> [2016-05-06 19:47:10.548218] I [socket.c:3561:socket_init] 0-management: SSL support is NOT enabled
> [2016-05-06 19:47:10.548239] I [socket.c:3576:socket_init] 0-management: using system polling thread
> [2016-05-06 19:47:10.553769] I [glusterd-handler.c:2912:glusterd_friend_add] 0-management: connect returned 0
> [2016-05-06 19:47:10.553886] I [glusterd-handler.c:2398:__glusterd_handle_probe_query] 0-glusterd: Responded to 10.238.0.23, op_ret: 0, op_errno: 0, ret: 0
> [2016-05-06 19:47:10.554650] I [glusterd-handler.c:2050:__glusterd_handle_incoming_friend_req] 0-glusterd: Received probe from uuid: d5140e78-498d-4c63-868d-189554aef7d4
> [2016-05-06 19:50:50.812036] E [glusterd-utils.c:4692:glusterd_brick_start] 0-management: Could not find peer on which brick 10.238.0.15:/mnt/ram/data resides
>
>
>
>
>
> 10.238.0.23:
>
>
>
> [2016-05-06 19:46:28.982091] I [glusterd-handler.c:1114:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req
> [2016-05-06 19:46:31.930017] E [glusterd-handshake.c:942:__glusterd_mgmt_hndsk_version_ack_cbk] 0-management: Failed to get handshake ack from remote server
> [2016-05-06 19:46:34.934960] E [glusterd-handshake.c:942:__glusterd_mgmt_hndsk_version_ack_cbk] 0-management: Failed to get handshake ack from remote server
> [2016-05-06 19:46:37.916015] E [glusterd-handshake.c:942:__glusterd_mgmt_hndsk_version_ack_cbk] 0-management: Failed to get handshake ack from remote server
> [2016-05-06 19:46:40.947036] E [glusterd-handshake.c:942:__glusterd_mgmt_hndsk_version_ack_cbk] 0-management: Failed to get handshake ack from remote server
> [2016-05-06 19:46:43.950373] E [glusterd-handshake.c:942:__glusterd_mgmt_hndsk_version_ack_cbk] 0-management: Failed to get handshake ack from remote server
> [2016-05-06 19:46:46.961104] E [glusterd-handshake.c:942:__glusterd_mgmt_hndsk_version_ack_cbk] 0-management: Failed to get handshake ack from remote server
> [2016-05-06 19:46:49.966875] E [glusterd-handshake.c:942:__glusterd_mgmt_hndsk_version_ack_cbk] 0-management: Failed to get handshake ack from remote server
> [2016-05-06 19:46:50.497510] I [glusterd-handler.c:918:__glusterd_handle_cli_probe] 0-glusterd: Received CLI probe req 10.238.0.22 24007
> [2016-05-06 19:46:50.502555] I [glusterd-handler.c:2931:glusterd_probe_begin] 0-glusterd: Unable to find peerinfo for host: 10.238.0.22 (24007)
> [2016-05-06 19:46:50.511183] I [rpc-clnt.c:972:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
> [2016-05-06 19:46:50.511279] I [socket.c:3561:socket_init] 0-management: SSL support is NOT enabled
> [2016-05-06 19:46:50.511300] I [socket.c:3576:socket_init] 0-management: using system polling thread
> [2016-05-06 19:46:50.517005] I [glusterd-handler.c:2912:glusterd_friend_add] 0-management: connect returned 0
> [2016-05-06 19:46:52.983838] E [glusterd-handshake.c:942:__glusterd_mgmt_hndsk_version_ack_cbk] 0-management: Failed to get handshake ack from remote server
> [2016-05-06 19:46:55.975533] E [glusterd-handshake.c:942:__glusterd_mgmt_hndsk_version_ack_cbk] 0-management: Failed to get handshake ack from remote server
> [2016-05-06 19:46:58.989536] E [glusterd-handshake.c:942:__glusterd_mgmt_hndsk_version_ack_cbk] 0-management: Failed to get handshake ack from remote server
> [2016-05-06 19:47:01.994423] E [glusterd-handshake.c:942:__glusterd_mgmt_hndsk_version_ack_cbk] 0-management: Failed to get handshake ack from remote server
> [2016-05-06 19:47:04.995025] E [glusterd-handshake.c:942:__glusterd_mgmt_hndsk_version_ack_cbk] 0-management: Failed to get handshake ack from remote server
> [2016-05-06 19:47:07.995849] E [glusterd-handshake.c:942:__glusterd_mgmt_hndsk_version_ack_cbk] 0-management: Failed to get handshake ack from remote server
> [2016-05-06 19:47:10.553738] I [glusterd-rpc-ops.c:234:__glusterd_probe_cbk] 0-glusterd: Received probe resp from uuid: 07580a32-c558-449d-b454-044fb679c908, host: 10.238.0.22
> [2016-05-06 19:47:10.559641] I [glusterd-rpc-ops.c:306:__glusterd_probe_cbk] 0-glusterd: Received resp to probe req
> [2016-05-06 19:47:11.006166] E [glusterd-handshake.c:942:__glusterd_mgmt_hndsk_version_ack_cbk] 0-management: Failed to get handshake ack from remote server
> [2016-05-06 19:47:14.009705] E [glusterd-handshake.c:942:__glusterd_mgmt_hndsk_version_ack_cbk] 0-management: Failed to get handshake ack from remote server
> [2016-05-06 19:47:16.996479] E [glusterd-handshake.c:942:__glusterd_mgmt_hndsk_version_ack_cbk] 0-management: Failed to get handshake ack from remote server
> [2016-05-06 19:47:20.024705] E [glusterd-handshake.c:942:__glusterd_mgmt_hndsk_version_ack_cbk] 0-management: Failed to get handshake ack from remote server
> [2016-05-06 19:47:23.035546] E [glusterd-handshake.c:942:__glusterd_mgmt_hndsk_version_ack_cbk] 0-management: Failed to get handshake ack from remote server
> [2016-05-06 19:47:26.041132] E [glusterd-handshake.c:942:__glusterd_mgmt_hndsk_version_ack_cbk] 0-management: Failed to get handshake ack from remote server
>
>
>
>
>
> Please advise on how to handle this issue.
>
>
>
> Thanks,
>
> Azamat
>
> Phone: 703-667-8922
>
>
>
>
>
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
>