[Gluster-users] Can not add peer

Muminov, Azamat amuminov at idirect.net
Fri May 6 20:09:05 UTC 2016


Hi,

I have a cluster of about 50 nodes with two Gluster volumes: one backed by HDD and the other backed by RAM.

[root@nmIDPP20 ~]# gluster volume info
Volume Name: ram
Type: Distributed-Replicate
Volume ID: a97fa262-276b-41e9-8f59-40f28451f689
Status: Started
Number of Bricks: 5 x 2 = 10
Transport-type: tcp
Bricks:
Brick1: 10.238.0.15:/mnt/ram/data
Brick2: 10.238.0.16:/mnt/ram/data
Brick3: 10.238.0.17:/mnt/ram/data
Brick4: 10.238.0.20:/mnt/ram/data
Brick5: 10.238.0.19:/mnt/ram/data
Brick6: 10.238.0.28:/mnt/ram/data
Brick7: 10.238.0.27:/mnt/ram/data
Brick8: 10.238.0.21:/mnt/ram/data
Brick9: 10.238.0.24:/mnt/ram/data
Brick10: 10.238.0.26:/mnt/ram/data

Volume Name: disk
Type: Replicate
Volume ID: 9607ae5f-0dbf-4164-b260-5d9ce26d4fc7
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 10.238.0.18:/var/cache/gluster/data/options/pp/data
Brick2: 10.238.0.16:/var/cache/gluster/data/options/pp/data
Brick3: 10.238.0.17:/var/cache/gluster/data/options/pp/data


I've re-provisioned one of the servers, 10.238.0.22, from bare metal and am now trying to add it back to the pool. After running gluster peer probe 10.238.0.22, the node shows up in the pool list:

[root@nmIDPP20 ~]# gluster pool list
UUID                                  Hostname     State
baa648a5-ff35-44e0-80ea-a55e43154d12  10.238.0.50  Connected
20bb470a-85da-4e3a-a66b-08a935c189ae  10.238.0.26  Connected
79dffcf8-8c3a-47b5-926a-39be2c1406da  10.238.0.13  Disconnected
7212e375-76a4-46c9-8bac-7470e2e5a910  10.238.0.17  Connected
c6080a14-33d7-4012-8940-2d9232752551  10.238.0.14  Connected
b553ed3c-21f1-4110-808d-4b08e6ded200  10.238.0.28  Connected
5e596931-9151-4f5b-bc57-feb6fe46054f  10.238.0.7   Connected
8e1128ed-df07-4747-812e-dcc280fce5c1  10.238.0.16  Connected
0b5fae30-e169-42ee-8f39-678d6fc93ac2  10.238.0.19  Connected
0f82df55-3994-4561-8a0a-1c1d2e9c3cff  10.238.0.29  Connected
446ea1e4-61b9-4881-9073-6aeb9a154710  10.238.0.24  Connected
bcf84149-415b-4eb7-8dc1-2b284e135307  10.238.0.27  Connected
97dddf9f-0b57-4bb8-86fd-196cb51df4b6  10.238.0.20  Connected
b2bf8b3c-890b-423b-b901-f16f1186c3e6  10.238.0.4   Connected
878ba732-0fea-4734-b1bc-a08ad7a2c97a  10.238.0.9   Connected
51750fb0-c182-4e76-821f-16cee23fdf27  10.238.0.6   Connected
b162e108-4301-47df-875f-92151244b694  10.238.0.8   Connected
25d29db8-0916-4ef4-80d1-34fbf8aa5d26  10.238.0.21  Connected
9acfb879-7df9-4c87-aa1c-eb518b9c668d  10.238.0.12  Connected
aacd1fa1-940c-4cec-9b04-1fb49348e764  10.238.0.49  Connected
5c36b282-9842-4b85-8d0f-e5101817dfe1  10.238.0.18  Connected
a5298a13-144d-46e1-856f-91ade6649840  10.238.0.10  Connected
4e7b83bd-367e-419d-aa5b-34947021dbc3  10.238.0.48  Connected
6aa7957f-be6f-4bee-a748-32937d3ababd  10.238.0.47  Connected
3890ac7d-7959-4565-86de-fc792cc357b0  10.238.0.45  Disconnected
4814a743-5b52-44ab-b169-e907082aa229  10.238.0.32  Connected
cf735cd8-75e3-413b-88c5-46e5b79f7558  10.238.0.42  Connected
b1fa7e22-2e1b-4d07-966e-3096e58e5c78  10.238.0.39  Connected
1459fce8-110c-478f-815e-89507225226e  10.238.0.34  Connected
a7b21ee9-970b-4d99-9f8f-b7e1cbf4be77  10.238.0.25  Connected
dab1a271-4244-41bc-b770-7b13bd6e399d  10.238.0.43  Connected
5b483c65-0d04-4188-85a9-77dfbbef78cd  10.238.0.41  Connected
1b8cb9d8-ce8f-49aa-b958-705dd09db073  10.238.0.40  Connected
4b4f85a0-1310-45df-a613-e33c967cc53d  10.238.0.38  Connected
dab043b8-11ba-4fa6-9b82-baa18b41167d  10.238.0.33  Disconnected
06cbc4c2-9d79-4689-9ac6-3dbc2250d903  10.238.0.30  Connected
f33451c7-e984-495c-8e34-0b2d99a21e1e  10.238.0.31  Connected
1873e2ce-1239-4b6d-930f-af14e9c1f13b  10.238.0.5   Connected
c85de12f-23e6-4797-adb4-d33b7b4eb5fc  10.238.0.11  Connected
4147639d-652e-49a8-aa8b-d77327cca9ca  10.238.0.15  Connected
07580a32-c558-449d-b454-044fb679c908  10.238.0.22  Connected
d5140e78-498d-4c63-868d-189554aef7d4  localhost    Connected


However, gluster peer status gives the following output:

[root@nmIDPP20 ~]# gluster peer status
Number of Peers: 41

Hostname: 10.238.0.50
Uuid: baa648a5-ff35-44e0-80ea-a55e43154d12
State: Peer in Cluster (Connected)

Hostname: 10.238.0.26
Uuid: 20bb470a-85da-4e3a-a66b-08a935c189ae
State: Peer in Cluster (Connected)

Hostname: 10.238.0.13
Uuid: 79dffcf8-8c3a-47b5-926a-39be2c1406da
State: Peer in Cluster (Disconnected)

Hostname: 10.238.0.17
Uuid: 7212e375-76a4-46c9-8bac-7470e2e5a910
State: Peer in Cluster (Connected)

Hostname: 10.238.0.14
Uuid: c6080a14-33d7-4012-8940-2d9232752551
State: Peer in Cluster (Connected)

Hostname: 10.238.0.28
Uuid: b553ed3c-21f1-4110-808d-4b08e6ded200
State: Peer in Cluster (Connected)

Hostname: 10.238.0.7
Uuid: 5e596931-9151-4f5b-bc57-feb6fe46054f
State: Peer in Cluster (Connected)

Hostname: 10.238.0.16
Uuid: 8e1128ed-df07-4747-812e-dcc280fce5c1
State: Peer in Cluster (Connected)

Hostname: 10.238.0.19
Uuid: 0b5fae30-e169-42ee-8f39-678d6fc93ac2
State: Peer in Cluster (Connected)

Hostname: 10.238.0.29
Uuid: 0f82df55-3994-4561-8a0a-1c1d2e9c3cff
State: Peer in Cluster (Connected)

Hostname: 10.238.0.24
Uuid: 446ea1e4-61b9-4881-9073-6aeb9a154710
State: Peer in Cluster (Connected)

Hostname: 10.238.0.27
Uuid: bcf84149-415b-4eb7-8dc1-2b284e135307
State: Peer in Cluster (Connected)

Hostname: 10.238.0.20
Uuid: 97dddf9f-0b57-4bb8-86fd-196cb51df4b6
State: Peer in Cluster (Connected)

Hostname: 10.238.0.4
Uuid: b2bf8b3c-890b-423b-b901-f16f1186c3e6
State: Peer in Cluster (Connected)

Hostname: 10.238.0.9
Uuid: 878ba732-0fea-4734-b1bc-a08ad7a2c97a
State: Peer in Cluster (Connected)

Hostname: 10.238.0.6
Uuid: 51750fb0-c182-4e76-821f-16cee23fdf27
State: Peer in Cluster (Connected)

Hostname: 10.238.0.8
Uuid: b162e108-4301-47df-875f-92151244b694
State: Peer in Cluster (Connected)

Hostname: 10.238.0.21
Uuid: 25d29db8-0916-4ef4-80d1-34fbf8aa5d26
State: Peer in Cluster (Connected)

Hostname: 10.238.0.12
Uuid: 9acfb879-7df9-4c87-aa1c-eb518b9c668d
State: Peer in Cluster (Connected)

Hostname: 10.238.0.49
Uuid: aacd1fa1-940c-4cec-9b04-1fb49348e764
State: Peer in Cluster (Connected)

Hostname: 10.238.0.18
Uuid: 5c36b282-9842-4b85-8d0f-e5101817dfe1
State: Peer in Cluster (Connected)

Hostname: 10.238.0.10
Uuid: a5298a13-144d-46e1-856f-91ade6649840
State: Peer in Cluster (Connected)

Hostname: 10.238.0.48
Uuid: 4e7b83bd-367e-419d-aa5b-34947021dbc3
State: Peer in Cluster (Connected)

Hostname: 10.238.0.47
Uuid: 6aa7957f-be6f-4bee-a748-32937d3ababd
State: Peer in Cluster (Connected)

Hostname: 10.238.0.45
Uuid: 3890ac7d-7959-4565-86de-fc792cc357b0
State: Peer in Cluster (Disconnected)

Hostname: 10.238.0.32
Uuid: 4814a743-5b52-44ab-b169-e907082aa229
State: Peer in Cluster (Connected)

Hostname: 10.238.0.42
Uuid: cf735cd8-75e3-413b-88c5-46e5b79f7558
State: Peer in Cluster (Connected)

Hostname: 10.238.0.39
Uuid: b1fa7e22-2e1b-4d07-966e-3096e58e5c78
State: Peer in Cluster (Connected)

Hostname: 10.238.0.34
Uuid: 1459fce8-110c-478f-815e-89507225226e
State: Peer in Cluster (Connected)

Hostname: 10.238.0.25
Uuid: a7b21ee9-970b-4d99-9f8f-b7e1cbf4be77
State: Peer in Cluster (Connected)

Hostname: 10.238.0.43
Uuid: dab1a271-4244-41bc-b770-7b13bd6e399d
State: Peer in Cluster (Connected)

Hostname: 10.238.0.41
Uuid: 5b483c65-0d04-4188-85a9-77dfbbef78cd
State: Peer in Cluster (Connected)

Hostname: 10.238.0.40
Uuid: 1b8cb9d8-ce8f-49aa-b958-705dd09db073
State: Peer in Cluster (Connected)

Hostname: 10.238.0.38
Uuid: 4b4f85a0-1310-45df-a613-e33c967cc53d
State: Peer in Cluster (Connected)

Hostname: 10.238.0.33
Uuid: dab043b8-11ba-4fa6-9b82-baa18b41167d
State: Peer in Cluster (Disconnected)

Hostname: 10.238.0.30
Uuid: 06cbc4c2-9d79-4689-9ac6-3dbc2250d903
State: Peer in Cluster (Connected)

Hostname: 10.238.0.31
Uuid: f33451c7-e984-495c-8e34-0b2d99a21e1e
State: Peer in Cluster (Connected)

Hostname: 10.238.0.5
Uuid: 1873e2ce-1239-4b6d-930f-af14e9c1f13b
State: Peer in Cluster (Connected)

Hostname: 10.238.0.11
Uuid: c85de12f-23e6-4797-adb4-d33b7b4eb5fc
State: Peer in Cluster (Connected)

Hostname: 10.238.0.15
Uuid: 4147639d-652e-49a8-aa8b-d77327cca9ca
State: Peer in Cluster (Connected)

Hostname: 10.238.0.22
Uuid: 07580a32-c558-449d-b454-044fb679c908
State: Probe Sent to Peer (Connected)


After staying in this state for about 10 minutes, node .22 disappears from the pool list. Also, while the peer probe is in progress, running gluster pool list on node .22 hangs. Only after a few minutes does it return to the shell, and it prints nothing.
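
With around 40 peers, the one entry that is stuck is easy to miss in the peer status output. The following is a minimal sketch, not a gluster tool, that picks out every peer whose state is not "Peer in Cluster". It assumes the three-line Hostname/Uuid/State layout shown above; the function names and the trimmed SAMPLE excerpt are only illustrative.

```python
import re

# Matches e.g. "State: Peer in Cluster (Connected)".
# The layout is assumed from the output pasted above, not from gluster docs.
STATE_RE = re.compile(r"State:\s*(.+?)\s*\((\w+)\)\s*$")


def parse_peer_status(text):
    """Return a list of (hostname, uuid, state, connection) tuples."""
    peers = []
    current = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Hostname:"):
            current = {"hostname": line.split(":", 1)[1].strip()}
        elif line.startswith("Uuid:"):
            current["uuid"] = line.split(":", 1)[1].strip()
        elif line.startswith("State:"):
            match = STATE_RE.match(line)
            if match and "hostname" in current:
                peers.append((current["hostname"], current.get("uuid"),
                              match.group(1), match.group(2)))
            current = {}
    return peers


def not_in_cluster(peers):
    """Peers whose state is anything other than 'Peer in Cluster'."""
    return [p for p in peers if p[2] != "Peer in Cluster"]


# Trimmed excerpt of the output above, used only as sample input.
SAMPLE = """\
Hostname: 10.238.0.50
Uuid: baa648a5-ff35-44e0-80ea-a55e43154d12
State: Peer in Cluster (Connected)

Hostname: 10.238.0.13
Uuid: 79dffcf8-8c3a-47b5-926a-39be2c1406da
State: Peer in Cluster (Disconnected)

Hostname: 10.238.0.22
Uuid: 07580a32-c558-449d-b454-044fb679c908
State: Probe Sent to Peer (Connected)
"""

for host, uuid, state, conn in not_in_cluster(parse_peer_status(SAMPLE)):
    print(host, uuid, state, conn)
```

In practice the text would come from the saved output of gluster peer status rather than from SAMPLE.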

I've tried a few things to resolve the issue:

1. Disabled the firewall -> didn't help
2. Removed the mgmt directory on .22, then restarted the gluster service and the glusterfs/d processes -> didn't help
3. Tried to probe .22 from another server -> didn't help
4. Reset the UUID of .22 -> didn't help

I don't know what else to try, so I'm asking for your help.
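
For step 1, one way to double-check that the firewall is really out of the picture is to test plain TCP reachability of the management port from both sides. This is a minimal sketch using only the Python standard library; port 24007 is taken from the probe log lines quoted in this message, and the function names are hypothetical. A successful connect only shows that the port is reachable. It says nothing about whether the glusterd handshake itself completes.

```python
import socket


def can_connect(host, port=24007, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def check_hosts(hosts, port=24007, timeout=3.0):
    """Map each host to True/False depending on TCP reachability."""
    return {host: can_connect(host, port, timeout) for host in hosts}
```

For example, calling check_hosts(["10.238.0.22"]) on .23, and check_hosts(["10.238.0.23"]) on .22, would show whether the port is open in both directions.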


The following are the logs captured on .22 and .23 during the probe:

10.238.0.22:

[2016-05-06 19:45:24.463346] I [glusterd-handler.c:1114:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req
[2016-05-06 19:46:01.295054] I [glusterd-handler.c:1114:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req
[2016-05-06 19:46:50.518018] I [glusterd-handshake.c:563:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 30501
[2016-05-06 19:46:50.521829] I [glusterd-handler.c:2346:__glusterd_handle_probe_query] 0-glusterd: Received probe from uuid: d5140e78-498d-4c63-868d-189554aef7d4
[2016-05-06 19:47:10.542419] I [glusterd-handler.c:2374:__glusterd_handle_probe_query] 0-glusterd: Unable to find peerinfo for host: 10.238.0.23 (24007)
[2016-05-06 19:47:10.548116] I [rpc-clnt.c:972:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2016-05-06 19:47:10.548218] I [socket.c:3561:socket_init] 0-management: SSL support is NOT enabled
[2016-05-06 19:47:10.548239] I [socket.c:3576:socket_init] 0-management: using system polling thread
[2016-05-06 19:47:10.553769] I [glusterd-handler.c:2912:glusterd_friend_add] 0-management: connect returned 0
[2016-05-06 19:47:10.553886] I [glusterd-handler.c:2398:__glusterd_handle_probe_query] 0-glusterd: Responded to 10.238.0.23, op_ret: 0, op_errno: 0, ret: 0
[2016-05-06 19:47:10.554650] I [glusterd-handler.c:2050:__glusterd_handle_incoming_friend_req] 0-glusterd: Received probe from uuid: d5140e78-498d-4c63-868d-189554aef7d4
[2016-05-06 19:50:50.812036] E [glusterd-utils.c:4692:glusterd_brick_start] 0-management: Could not find peer on which brick 10.238.0.15:/mnt/ram/data resides


10.238.0.23:

[2016-05-06 19:46:28.982091] I [glusterd-handler.c:1114:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req
[2016-05-06 19:46:31.930017] E [glusterd-handshake.c:942:__glusterd_mgmt_hndsk_version_ack_cbk] 0-management: Failed to get handshake ack from remote server
[2016-05-06 19:46:34.934960] E [glusterd-handshake.c:942:__glusterd_mgmt_hndsk_version_ack_cbk] 0-management: Failed to get handshake ack from remote server
[2016-05-06 19:46:37.916015] E [glusterd-handshake.c:942:__glusterd_mgmt_hndsk_version_ack_cbk] 0-management: Failed to get handshake ack from remote server
[2016-05-06 19:46:40.947036] E [glusterd-handshake.c:942:__glusterd_mgmt_hndsk_version_ack_cbk] 0-management: Failed to get handshake ack from remote server
[2016-05-06 19:46:43.950373] E [glusterd-handshake.c:942:__glusterd_mgmt_hndsk_version_ack_cbk] 0-management: Failed to get handshake ack from remote server
[2016-05-06 19:46:46.961104] E [glusterd-handshake.c:942:__glusterd_mgmt_hndsk_version_ack_cbk] 0-management: Failed to get handshake ack from remote server
[2016-05-06 19:46:49.966875] E [glusterd-handshake.c:942:__glusterd_mgmt_hndsk_version_ack_cbk] 0-management: Failed to get handshake ack from remote server
[2016-05-06 19:46:50.497510] I [glusterd-handler.c:918:__glusterd_handle_cli_probe] 0-glusterd: Received CLI probe req 10.238.0.22 24007
[2016-05-06 19:46:50.502555] I [glusterd-handler.c:2931:glusterd_probe_begin] 0-glusterd: Unable to find peerinfo for host: 10.238.0.22 (24007)
[2016-05-06 19:46:50.511183] I [rpc-clnt.c:972:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2016-05-06 19:46:50.511279] I [socket.c:3561:socket_init] 0-management: SSL support is NOT enabled
[2016-05-06 19:46:50.511300] I [socket.c:3576:socket_init] 0-management: using system polling thread
[2016-05-06 19:46:50.517005] I [glusterd-handler.c:2912:glusterd_friend_add] 0-management: connect returned 0
[2016-05-06 19:46:52.983838] E [glusterd-handshake.c:942:__glusterd_mgmt_hndsk_version_ack_cbk] 0-management: Failed to get handshake ack from remote server
[2016-05-06 19:46:55.975533] E [glusterd-handshake.c:942:__glusterd_mgmt_hndsk_version_ack_cbk] 0-management: Failed to get handshake ack from remote server
[2016-05-06 19:46:58.989536] E [glusterd-handshake.c:942:__glusterd_mgmt_hndsk_version_ack_cbk] 0-management: Failed to get handshake ack from remote server
[2016-05-06 19:47:01.994423] E [glusterd-handshake.c:942:__glusterd_mgmt_hndsk_version_ack_cbk] 0-management: Failed to get handshake ack from remote server
[2016-05-06 19:47:04.995025] E [glusterd-handshake.c:942:__glusterd_mgmt_hndsk_version_ack_cbk] 0-management: Failed to get handshake ack from remote server
[2016-05-06 19:47:07.995849] E [glusterd-handshake.c:942:__glusterd_mgmt_hndsk_version_ack_cbk] 0-management: Failed to get handshake ack from remote server
[2016-05-06 19:47:10.553738] I [glusterd-rpc-ops.c:234:__glusterd_probe_cbk] 0-glusterd: Received probe resp from uuid: 07580a32-c558-449d-b454-044fb679c908, host: 10.238.0.22
[2016-05-06 19:47:10.559641] I [glusterd-rpc-ops.c:306:__glusterd_probe_cbk] 0-glusterd: Received resp to probe req
[2016-05-06 19:47:11.006166] E [glusterd-handshake.c:942:__glusterd_mgmt_hndsk_version_ack_cbk] 0-management: Failed to get handshake ack from remote server
[2016-05-06 19:47:14.009705] E [glusterd-handshake.c:942:__glusterd_mgmt_hndsk_version_ack_cbk] 0-management: Failed to get handshake ack from remote server
[2016-05-06 19:47:16.996479] E [glusterd-handshake.c:942:__glusterd_mgmt_hndsk_version_ack_cbk] 0-management: Failed to get handshake ack from remote server
[2016-05-06 19:47:20.024705] E [glusterd-handshake.c:942:__glusterd_mgmt_hndsk_version_ack_cbk] 0-management: Failed to get handshake ack from remote server
[2016-05-06 19:47:23.035546] E [glusterd-handshake.c:942:__glusterd_mgmt_hndsk_version_ack_cbk] 0-management: Failed to get handshake ack from remote server
[2016-05-06 19:47:26.041132] E [glusterd-handshake.c:942:__glusterd_mgmt_hndsk_version_ack_cbk] 0-management: Failed to get handshake ack from remote server
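
The handshake errors in the .23 log repeat at a regular pace, which a small parser can quantify. The sketch below assumes the line layout seen in the excerpts above (timestamp, level letter, source location, domain, message). The regular expression is inferred from those lines rather than from any glusterd specification, and the function names are hypothetical.

```python
import re
from datetime import datetime

# Layout assumed from the log lines quoted above:
# [timestamp] LEVEL [file:line:function] domain: message
LOG_RE = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\]\s+"
    r"(?P<level>[A-Z])\s+"
    r"\[(?P<src>[^\]]+)\]\s+"
    r"(?P<domain>\S+):\s+(?P<msg>.*)$"
)


def parse_line(line):
    """Parse one log line into a dict, or return None if it does not match."""
    match = LOG_RE.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    record["ts"] = datetime.strptime(record["ts"], "%Y-%m-%d %H:%M:%S.%f")
    return record


def error_intervals(lines, needle):
    """Seconds between consecutive E-level lines whose message contains needle."""
    records = (parse_line(line) for line in lines)
    times = [r["ts"] for r in records
             if r and r["level"] == "E" and needle in r["msg"]]
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]


# First four lines of the .23 log above, used as sample input.
SAMPLE = [
    "[2016-05-06 19:46:28.982091] I [glusterd-handler.c:1114:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req",
    "[2016-05-06 19:46:31.930017] E [glusterd-handshake.c:942:__glusterd_mgmt_hndsk_version_ack_cbk] 0-management: Failed to get handshake ack from remote server",
    "[2016-05-06 19:46:34.934960] E [glusterd-handshake.c:942:__glusterd_mgmt_hndsk_version_ack_cbk] 0-management: Failed to get handshake ack from remote server",
    "[2016-05-06 19:46:37.916015] E [glusterd-handshake.c:942:__glusterd_mgmt_hndsk_version_ack_cbk] 0-management: Failed to get handshake ack from remote server",
]

print(error_intervals(SAMPLE, "Failed to get handshake ack"))
```

On these sample lines the gaps come out at roughly three seconds each. Note that the errors begin before the probe request at 19:46:50, so they are not necessarily caused by the probe of .22.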


Please advise on how to resolve this issue.

Thanks,
Azamat
Phone: 703-667-8922

