[Gluster-users] Help, peer probe seems to get stuck on large cluster.

Yiping Peng barius.cn at gmail.com
Mon Aug 31 09:47:35 UTC 2015


The set of nodes shown as "Disconnected" changes at random, so I picked an
arbitrary node and tailed the last several lines
of /var/log/glusterfs/etc-glusterfs-glusterd.vol.log (is that the right log
file?).

I can still access the cluster from servers already in the pool; both
reading and writing work fine.

The log shows a lot of "Failed to set keep-alive: Protocol not available"
errors; an excerpt follows.

Thanks.

[2015-08-31 09:38:25.586073] I [MSGID: 106502]
[glusterd-handler.c:2751:__glusterd_handle_friend_update] 0-management:
Received my uuid as Friend
[2015-08-31 09:38:27.193523] I [MSGID: 106492]
[glusterd-handler.c:2706:__glusterd_handle_friend_update] 0-glusterd:
Received friend update from uuid: 8ed2d6cf-9758-4adf-8ed2-2d87f76491cf
[2015-08-31 09:38:27.209085] I [MSGID: 106502]
[glusterd-handler.c:2751:__glusterd_handle_friend_update] 0-management:
Received my uuid as Friend
[2015-08-31 09:38:27.370367] C
[rpc-clnt-ping.c:161:rpc_clnt_ping_timer_expired] 0-management: server
10.88.153.23:24007 has not responded in the last 30 seconds, disconnecting.
[2015-08-31 09:38:28.803311] I [MSGID: 106492]
[glusterd-handler.c:2706:__glusterd_handle_friend_update] 0-glusterd:
Received friend update from uuid: 05885701-9a7c-4d2a-b18a-b5d9de2ccd57
[2015-08-31 09:38:28.818834] I [MSGID: 106502]
[glusterd-handler.c:2751:__glusterd_handle_friend_update] 0-management:
Received my uuid as Friend
The message "I [MSGID: 106492]
[glusterd-handler.c:2706:__glusterd_handle_friend_update] 0-glusterd:
Received friend update from uuid: f7de5463-080d-4547-9601-0e9541dea928"
repeated 4 times between [2015-08-31 09:36:30.776194] and [2015-08-31
09:38:06.162677]
The message "I [MSGID: 106492]
[glusterd-handler.c:2706:__glusterd_handle_friend_update] 0-glusterd:
Received friend update from uuid: 62eb172c-58ac-47c8-931e-05e5ad5a3133"
repeated 4 times between [2015-08-31 09:36:32.404743] and [2015-08-31
09:38:07.779594]
[2015-08-31 09:38:30.419141] I [MSGID: 106004]
[glusterd-handler.c:5051:__glusterd_peer_rpc_notify] 0-management: Peer <
server62.yq01.local.net> (<3d354922-4bcd-4469-9e2e-559067882217>), in state
<Peer in Cluster>, has disconnected from glusterd.
[2015-08-31 09:38:30.419188] I [MSGID: 106004]
[glusterd-handler.c:5051:__glusterd_peer_rpc_notify] 0-management: Peer <
server52.yq01.local.net> (<6466759d-05eb-406e-9ede-a36dbf26c216>), in state
<Peer in Cluster>, has disconnected from glusterd.
[2015-08-31 09:38:30.419299] I [MSGID: 106492]
[glusterd-handler.c:2706:__glusterd_handle_friend_update] 0-glusterd:
Received friend update from uuid: 62eb172c-58ac-47c8-931e-05e5ad5a3133
[2015-08-31 09:38:30.434835] I [MSGID: 106502]
[glusterd-handler.c:2751:__glusterd_handle_friend_update] 0-management:
Received my uuid as Friend
[2015-08-31 09:38:32.035177] I [MSGID: 106492]
[glusterd-handler.c:2706:__glusterd_handle_friend_update] 0-glusterd:
Received friend update from uuid: 4db788d9-d372-4f57-a0f4-ba11d480013d
[2015-08-31 09:38:33.373803] W [socket.c:923:__socket_keepalive] 0-socket:
failed to set TCP_USER_TIMEOUT -1000 on socket 69, Protocol not available
[2015-08-31 09:38:33.373821] E [socket.c:3019:socket_connect] 0-management:
Failed to set keep-alive: Protocol not available
[2015-08-31 09:38:33.376719] W [socket.c:923:__socket_keepalive] 0-socket:
failed to set TCP_USER_TIMEOUT -1000 on socket 70, Protocol not available
[2015-08-31 09:38:33.376735] E [socket.c:3019:socket_connect] 0-management:
Failed to set keep-alive: Protocol not available
[2015-08-31 09:38:32.050834] I [MSGID: 106502]
[glusterd-handler.c:2751:__glusterd_handle_friend_update] 0-management:
Received my uuid as Friend
[2015-08-31 09:38:33.651240] I [MSGID: 106492]
[glusterd-handler.c:2706:__glusterd_handle_friend_update] 0-glusterd:
Received friend update from uuid: 9a291ec2-8f75-47fa-b4f4-c3edc02e9ce8
[2015-08-31 09:38:33.666825] I [MSGID: 106502]
[glusterd-handler.c:2751:__glusterd_handle_friend_update] 0-management:
Received my uuid as Friend
[2015-08-31 09:38:35.267184] I [MSGID: 106004]
[glusterd-handler.c:5051:__glusterd_peer_rpc_notify] 0-management: Peer <
server75.yq01.local.net> (<aeb43c67-1dd3-45e9-abbf-cc0037472724>), in state
<Peer in Cluster>, has disconnected from glusterd.
[2015-08-31 09:38:35.267237] W [socket.c:642:__socket_rwv] 0-nfs: readv on
/var/run/gluster/7abc6dc0317b0f84408f0bc69917073c.socket failed (Invalid
argument)
[2015-08-31 09:38:35.267253] I [MSGID: 106006]
[glusterd-svc-mgmt.c:319:glusterd_svc_common_rpc_notify] 0-management: nfs
has disconnected from glusterd.
[2015-08-31 09:38:35.267352] I [MSGID: 106492]
[glusterd-handler.c:2706:__glusterd_handle_friend_update] 0-glusterd:
Received friend update from uuid: df2686ca-e020-4593-97d8-bd50de4b2775
[2015-08-31 09:38:35.282829] I [MSGID: 106502]
[glusterd-handler.c:2751:__glusterd_handle_friend_update] 0-management:
Received my uuid as Friend
[2015-08-31 09:38:36.877526] E [rpc-clnt.c:362:saved_frames_unwind] (-->
/usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1eb)[0x7fb93d7b465b] (-->
/usr/lib64/libgfrpc.so.0(saved_frames_unwind+0x1e7)[0x7fb93d5801b7] (-->
/usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fb93d5802ce] (-->
/usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xab)[0x7fb93d58039b]
(--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x15f)[0x7fb93d58095f] )))))
0-management: forced unwinding frame type(GLUSTERD-DUMP) op(DUMP(1)) called
at 2015-08-31 09:37:43.506542 (xid=0x1535)
[2015-08-31 09:38:36.877553] E [MSGID: 106167]
[glusterd-handshake.c:2078:__glusterd_peer_dump_version_cbk] 0-management:
Error through RPC layer, retry again later
[2015-08-31 09:38:36.877643] E [rpc-clnt.c:362:saved_frames_unwind] (-->
/usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1eb)[0x7fb93d7b465b] (-->
/usr/lib64/libgfrpc.so.0(saved_frames_unwind+0x1e7)[0x7fb93d5801b7] (-->
/usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fb93d5802ce] (-->
/usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xab)[0x7fb93d58039b]
(--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x15f)[0x7fb93d58095f] )))))
0-management: forced unwinding frame type(GF-DUMP) op(NULL(2)) called at
2015-08-31 09:37:43.506554 (xid=0x1536)
[2015-08-31 09:38:36.877659] W [rpc-clnt-ping.c:204:rpc_clnt_ping_cbk]
0-management: socket disconnected
[2015-08-31 09:38:36.877676] I [MSGID: 106004]
[glusterd-handler.c:5051:__glusterd_peer_rpc_notify] 0-management: Peer <
server6.yq01.local.net> (<eb491a24-3edd-494a-90c0-b4280bd6995e>), in state
<Peer in Cluster>, has disconnected from glusterd.
[2015-08-31 09:38:36.877823] W
[glusterd-locks.c:677:glusterd_mgmt_v3_unlock] (-->
/usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1eb)[0x7fb93d7b465b] (-->
/usr/lib64/glusterfs/3.7.3/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x551)[0x7fb93316a111]
(-->
/usr/lib64/glusterfs/3.7.3/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x2f0)[0x7fb9330d0300]
(-->
/usr/lib64/glusterfs/3.7.3/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x60)[0x7fb9330b3a50]
(--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x1a3)[0x7fb93d5809a3] )))))
0-management: Lock for vol speech0 not held
[2015-08-31 09:38:36.877840] W [MSGID: 106118]
[glusterd-handler.c:5073:__glusterd_peer_rpc_notify] 0-management: Lock not
released for speech0
[2015-08-31 09:38:36.877889] I [MSGID: 106004]
[glusterd-handler.c:5051:__glusterd_peer_rpc_notify] 0-management: Peer <
server48.yq01.local.net> (<372c820d-003e-4885-870c-547ca17f6770>), in state
<Peer in Cluster>, has disconnected from glusterd.
[2015-08-31 09:38:36.878012] I [MSGID: 106492]
[glusterd-handler.c:2706:__glusterd_handle_friend_update] 0-glusterd:
Received friend update from uuid: d903d2f1-458d-43ae-a057-3f4999d3123a
[2015-08-31 09:38:36.893088] I [MSGID: 106502]
[glusterd-handler.c:2751:__glusterd_handle_friend_update] 0-management:
Received my uuid as Friend
[2015-08-31 09:38:37.380052] W [socket.c:923:__socket_keepalive] 0-socket:
failed to set TCP_USER_TIMEOUT -1000 on socket 12, Protocol not available
[2015-08-31 09:38:37.380071] E [socket.c:3019:socket_connect] 0-management:
Failed to set keep-alive: Protocol not available
[2015-08-31 09:38:38.492491] W [socket.c:642:__socket_rwv]
0-socket.management: writev on 10.88.155.28:65379 failed (Broken pipe)
[2015-08-31 09:38:38.492510] I [socket.c:2409:socket_event_handler]
0-transport: disconnecting now
[2015-08-31 09:38:38.492565] W [socket.c:923:__socket_keepalive] 0-socket:
failed to set TCP_USER_TIMEOUT 0 on socket 5, Protocol not available
[2015-08-31 09:38:38.492576] W [socket.c:2673:socket_server_event_handler]
0-socket.management: Failed to set keep-alive: Protocol not available
[2015-08-31 09:38:38.492669] I [MSGID: 106004]
[glusterd-handler.c:5051:__glusterd_peer_rpc_notify] 0-management: Peer <
worker09.yq01.local.net> (<c0f4eab2-9cdd-4ba8-a002-259456288fd3>), in state
<Peer in Cluster>, has disconnected from glusterd.
[2015-08-31 09:38:38.492715] I [MSGID: 106004]
[glusterd-handler.c:5051:__glusterd_peer_rpc_notify] 0-management: Peer <
server53.yq01.local.net> (<b1f15cce-36e4-4ef4-a22f-70bafb0bf8d3>), in state
<Peer in Cluster>, has disconnected from glusterd.
[2015-08-31 09:38:38.492786] I [MSGID: 106492]
[glusterd-handler.c:2706:__glusterd_handle_friend_update] 0-glusterd:
Received friend update from uuid: 96aa9f85-f979-42a8-ac0a-1136384fbc14
[2015-08-31 09:38:38.508078] I [MSGID: 106502]
[glusterd-handler.c:2751:__glusterd_handle_friend_update] 0-management:
Received my uuid as Friend
[2015-08-31 09:38:39.383260] W [socket.c:923:__socket_keepalive] 0-socket:
failed to set TCP_USER_TIMEOUT -1000 on socket 27, Protocol not available
[2015-08-31 09:38:39.383280] E [socket.c:3019:socket_connect] 0-management:
Failed to set keep-alive: Protocol not available
[2015-08-31 09:38:40.108404] I [MSGID: 106492]
[glusterd-handler.c:2706:__glusterd_handle_friend_update] 0-glusterd:
Received friend update from uuid: 72e2074f-921d-45d6-9601-deee653075a9
[2015-08-31 09:38:40.124073] I [MSGID: 106502]
[glusterd-handler.c:2751:__glusterd_handle_friend_update] 0-management:
Received my uuid as Friend
[2015-08-31 09:38:41.386485] W [socket.c:923:__socket_keepalive] 0-socket:
failed to set TCP_USER_TIMEOUT -1000 on socket 23, Protocol not available
[2015-08-31 09:38:41.386506] E [socket.c:3019:socket_connect] 0-management:
Failed to set keep-alive: Protocol not available
[2015-08-31 09:38:41.389473] W [socket.c:923:__socket_keepalive] 0-socket:
failed to set TCP_USER_TIMEOUT -1000 on socket 30, Protocol not available
[2015-08-31 09:38:41.389486] E [socket.c:3019:socket_connect] 0-management:
Failed to set keep-alive: Protocol not available
[2015-08-31 09:38:41.733507] I [MSGID: 106492]
[glusterd-handler.c:2706:__glusterd_handle_friend_update] 0-glusterd:
Received friend update from uuid: f1c1b3d9-326d-4730-b1b0-788690da2ce1
[2015-08-31 09:38:41.749079] I [MSGID: 106502]
[glusterd-handler.c:2751:__glusterd_handle_friend_update] 0-management:
Received my uuid as Friend
[2015-08-31 09:38:43.348570] I [MSGID: 106492]
[glusterd-handler.c:2706:__glusterd_handle_friend_update] 0-glusterd:
Received friend update from uuid: 455da276-9ef5-46ab-90f9-457a70432224
[2015-08-31 09:38:43.364074] I [MSGID: 106502]
[glusterd-handler.c:2751:__glusterd_handle_friend_update] 0-management:
Received my uuid as Friend
[2015-08-31 09:38:44.964456] I [MSGID: 106004]
[glusterd-handler.c:5051:__glusterd_peer_rpc_notify] 0-management: Peer <
server43.yq01.local.net> (<76cb46d9-5669-47db-b264-68b55d4c37f0>), in state
<Peer in Cluster>, has disconnected from glusterd.
[2015-08-31 09:38:44.964578] I [MSGID: 106492]
[glusterd-handler.c:2706:__glusterd_handle_friend_update] 0-glusterd:
Received friend update from uuid: 00d5caae-b647-4dae-8d3e-df1e7f08941f
[2015-08-31 09:38:44.980073] I [MSGID: 106502]
[glusterd-handler.c:2751:__glusterd_handle_friend_update] 0-management:
Received my uuid as Friend
[2015-08-31 09:38:45.392805] W [socket.c:923:__socket_keepalive] 0-socket:
failed to set TCP_USER_TIMEOUT -1000 on socket 38, Protocol not available
[2015-08-31 09:38:45.392825] E [socket.c:3019:socket_connect] 0-management:
Failed to set keep-alive: Protocol not available
[2015-08-31 09:38:46.393009] C
[rpc-clnt-ping.c:161:rpc_clnt_ping_timer_expired] 0-management: server
10.88.155.15:24007 has not responded in the last 30 seconds, disconnecting.
[2015-08-31 09:38:46.584515] I [MSGID: 106492]
[glusterd-handler.c:2706:__glusterd_handle_friend_update] 0-glusterd:
Received friend update from uuid: e204bc20-9c4f-449c-9dfc-f6e54b96bf8c
[2015-08-31 09:38:46.600079] I [MSGID: 106502]
[glusterd-handler.c:2751:__glusterd_handle_friend_update] 0-management:
Received my uuid as Friend
[2015-08-31 09:38:47.396000] W [socket.c:923:__socket_keepalive] 0-socket:
failed to set TCP_USER_TIMEOUT -1000 on socket 35, Protocol not available
[2015-08-31 09:38:47.396019] E [socket.c:3019:socket_connect] 0-management:
Failed to set keep-alive: Protocol not available
[2015-08-31 09:38:48.198525] I [MSGID: 106492]
[glusterd-handler.c:2706:__glusterd_handle_friend_update] 0-glusterd:
Received friend update from uuid: 607e3f7a-65e6-423a-9226-5f763f9838e8
[2015-08-31 09:38:48.214089] I [MSGID: 106502]
[glusterd-handler.c:2751:__glusterd_handle_friend_update] 0-management:
Received my uuid as Friend
[2015-08-31 09:38:49.815541] I [MSGID: 106492]
[glusterd-handler.c:2706:__glusterd_handle_friend_update] 0-glusterd:
Received friend update from uuid: e2322b18-2e5f-4c3c-8cc2-84b137fa7328
[2015-08-31 09:38:49.831078] I [MSGID: 106502]
[glusterd-handler.c:2751:__glusterd_handle_friend_update] 0-management:
Received my uuid as Friend
[2015-08-31 09:38:51.434550] E [rpc-clnt.c:362:saved_frames_unwind] (-->
/usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1eb)[0x7fb93d7b465b] (-->
/usr/lib64/libgfrpc.so.0(saved_frames_unwind+0x1e7)[0x7fb93d5801b7] (-->
/usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fb93d5802ce] (-->
/usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xab)[0x7fb93d58039b]
(--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x15f)[0x7fb93d58095f] )))))
0-management: forced unwinding frame type(GLUSTERD-DUMP) op(DUMP(1)) called
at 2015-08-31 09:37:56.464514 (xid=0x1315)
[2015-08-31 09:38:51.434579] E [MSGID: 106167]
[glusterd-handshake.c:2078:__glusterd_peer_dump_version_cbk] 0-management:
Error through RPC layer, retry again later
[2015-08-31 09:38:51.434669] E [rpc-clnt.c:362:saved_frames_unwind] (-->
/usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1eb)[0x7fb93d7b465b] (-->
/usr/lib64/libgfrpc.so.0(saved_frames_unwind+0x1e7)[0x7fb93d5801b7] (-->
/usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fb93d5802ce] (-->
/usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xab)[0x7fb93d58039b]
(--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x15f)[0x7fb93d58095f] )))))
0-management: forced unwinding frame type(GF-DUMP) op(NULL(2)) called at
2015-08-31 09:37:56.464526 (xid=0x1316)
[2015-08-31 09:38:51.434685] W [rpc-clnt-ping.c:204:rpc_clnt_ping_cbk]
0-management: socket disconnected
[2015-08-31 09:38:51.434704] I [MSGID: 106004]
[glusterd-handler.c:5051:__glusterd_peer_rpc_notify] 0-management: Peer <
server42.yq01.local.net> (<0b24198f-dfad-4259-bc22-9f3736f53824>), in state
<Peer in Cluster>, has disconnected from glusterd.
[2015-08-31 09:38:51.434850] W
[glusterd-locks.c:677:glusterd_mgmt_v3_unlock] (-->
/usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1eb)[0x7fb93d7b465b] (-->
/usr/lib64/glusterfs/3.7.3/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x551)[0x7fb93316a111]
(-->
/usr/lib64/glusterfs/3.7.3/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x2f0)[0x7fb9330d0300]
(-->
/usr/lib64/glusterfs/3.7.3/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x60)[0x7fb9330b3a50]
(--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x1a3)[0x7fb93d5809a3] )))))
0-management: Lock for vol speech0 not held
[2015-08-31 09:38:51.434867] W [MSGID: 106118]
[glusterd-handler.c:5073:__glusterd_peer_rpc_notify] 0-management: Lock not
released for speech0
[2015-08-31 09:38:51.434994] I [MSGID: 106492]
[glusterd-handler.c:2706:__glusterd_handle_friend_update] 0-glusterd:
Received friend update from uuid: 8ed2d6cf-9758-4adf-8ed2-2d87f76491cf
[2015-08-31 09:38:51.450075] I [MSGID: 106502]
[glusterd-handler.c:2751:__glusterd_handle_friend_update] 0-management:
Received my uuid as Friend
[2015-08-31 09:38:53.049543] I [MSGID: 106492]
[glusterd-handler.c:2706:__glusterd_handle_friend_update] 0-glusterd:
Received friend update from uuid: f7de5463-080d-4547-9601-0e9541dea928
[2015-08-31 09:38:53.065083] I [MSGID: 106502]
[glusterd-handler.c:2751:__glusterd_handle_friend_update] 0-management:
Received my uuid as Friend
[2015-08-31 09:38:54.666534] I [MSGID: 106492]
[glusterd-handler.c:2706:__glusterd_handle_friend_update] 0-glusterd:
Received friend update from uuid: 05885701-9a7c-4d2a-b18a-b5d9de2ccd57
[2015-08-31 09:38:54.682066] I [MSGID: 106502]
[glusterd-handler.c:2751:__glusterd_handle_friend_update] 0-management:
Received my uuid as Friend
[2015-08-31 09:38:57.399884] W [socket.c:923:__socket_keepalive] 0-socket:
failed to set TCP_USER_TIMEOUT -1000 on socket 45, Protocol not available
[2015-08-31 09:38:57.399906] E [socket.c:3019:socket_connect] 0-management:
Failed to set keep-alive: Protocol not available
[2015-08-31 09:38:57.402816] W [socket.c:923:__socket_keepalive] 0-socket:
failed to set TCP_USER_TIMEOUT -1000 on socket 69, Protocol not available
[2015-08-31 09:38:57.402830] E [socket.c:3019:socket_connect] 0-management:
Failed to set keep-alive: Protocol not available
[2015-08-31 09:38:56.301076] I [MSGID: 106502]
[glusterd-handler.c:2751:__glusterd_handle_friend_update] 0-management:
Received my uuid as Friend
[2015-08-31 09:38:57.897551] I [MSGID: 106492]
[glusterd-handler.c:2706:__glusterd_handle_friend_update] 0-glusterd:
Received friend update from uuid: 9a291ec2-8f75-47fa-b4f4-c3edc02e9ce8
[2015-08-31 09:38:57.913072] I [MSGID: 106502]
[glusterd-handler.c:2751:__glusterd_handle_friend_update] 0-management:
Received my uuid as Friend
[2015-08-31 09:38:59.513520] I [MSGID: 106492]
[glusterd-handler.c:2706:__glusterd_handle_friend_update] 0-glusterd:
Received friend update from uuid: df2686ca-e020-4593-97d8-bd50de4b2775
[2015-08-31 09:38:59.529073] I [MSGID: 106502]
[glusterd-handler.c:2751:__glusterd_handle_friend_update] 0-management:
Received my uuid as Friend
[2015-08-31 09:39:01.129419] I [MSGID: 106004]
[glusterd-handler.c:5051:__glusterd_peer_rpc_notify] 0-management: Peer <
server75.yq01.local.net> (<aeb43c67-1dd3-45e9-abbf-cc0037472724>), in state
<Peer in Cluster>, has disconnected from glusterd.
[2015-08-31 09:39:01.129469] W [socket.c:642:__socket_rwv] 0-nfs: readv on
/var/run/gluster/7abc6dc0317b0f84408f0bc69917073c.socket failed (Invalid
argument)
[2015-08-31 09:39:01.129484] I [MSGID: 106006]
[glusterd-svc-mgmt.c:319:glusterd_svc_common_rpc_notify] 0-management: nfs
has disconnected from glusterd.
[2015-08-31 09:39:01.129587] I [MSGID: 106492]
[glusterd-handler.c:2706:__glusterd_handle_friend_update] 0-glusterd:
Received friend update from uuid: d903d2f1-458d-43ae-a057-3f4999d3123a
[2015-08-31 09:39:01.145074] I [MSGID: 106502]
[glusterd-handler.c:2751:__glusterd_handle_friend_update] 0-management:
Received my uuid as Friend
[2015-08-31 09:39:01.406146] W [socket.c:923:__socket_keepalive] 0-socket:
failed to set TCP_USER_TIMEOUT -1000 on socket 12, Protocol not available
[2015-08-31 09:39:01.406168] E [socket.c:3019:socket_connect] 0-management:
Failed to set keep-alive: Protocol not available
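
To summarize an excerpt like the one above instead of reading it line by
line, something along these lines might help. It is only a minimal sketch:
it assumes the log has been line-wrapped by the mail client as shown, and the
function names (join_entries, summarize) are made up for illustration and are
not part of GlusterFS. It re-joins wrapped lines into entries, then counts
entries per log level, disconnect notices per peer, and keep-alive failures.

```python
import re
from collections import Counter

# An entry starts with "[YYYY-MM-DD HH:MM:SS.ffffff] LEVEL"; anything else is
# treated as a continuation of the previous (mail-wrapped) entry.
ENTRY_START = re.compile(r"^\[\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+\] ([A-Z])\b")
PEER_DOWN = re.compile(r"Peer <\s*([^>\s]+)\s*>.*has disconnected from glusterd")


def join_entries(lines):
    """Re-join wrapped lines into one string per log entry."""
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if ENTRY_START.match(line) or not entries:
            entries.append(line)
        else:
            # Continuation line (also covers "The message ... repeated" notes,
            # which simply get appended to the preceding entry).
            entries[-1] += " " + line
    return entries


def summarize(lines):
    """Return (entries per level, disconnects per peer, keep-alive failures)."""
    levels, peers, keepalive = Counter(), Counter(), 0
    for entry in join_entries(lines):
        start = ENTRY_START.match(entry)
        if start:
            levels[start.group(1)] += 1
        down = PEER_DOWN.search(entry)
        if down:
            peers[down.group(1)] += 1
        if "Failed to set keep-alive" in entry:
            keepalive += 1
    return levels, peers, keepalive


# Two entries copied from the excerpt above, wrapped as in the mail.
SAMPLE = """
[2015-08-31 09:38:30.419141] I [MSGID: 106004]
[glusterd-handler.c:5051:__glusterd_peer_rpc_notify] 0-management: Peer <
server62.yq01.local.net> (<3d354922-4bcd-4469-9e2e-559067882217>), in state
<Peer in Cluster>, has disconnected from glusterd.
[2015-08-31 09:38:33.373821] E [socket.c:3019:socket_connect] 0-management:
Failed to set keep-alive: Protocol not available
"""

if __name__ == "__main__":
    levels, peers, keepalive = summarize(SAMPLE.splitlines())
    print("entries per level:", dict(levels))
    print("disconnects per peer:", dict(peers))
    print("keep-alive failures:", keepalive)
```

On a real node the same summarize() call could be fed the lines of the
glusterd log file directly; unwrapped lines are handled the same way.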



2015-08-31 16:54 GMT+08:00 Atin Mukherjee <amukherj at redhat.com>:

>
>
> On 08/31/2015 01:10 PM, Yiping Peng wrote:
> > Hi guys,
> >
> >
> > I've been running GlusterFS for a couple of days and it's been nice and
> > steady, except for a minor problem: peer probing on my relatively large
> > cluster seems to be stuck for a long time.
> >
> >
> > Last time atinm told me on IRC (I was barius.2333 on IRC) that a cluster
> > as large as 50+ nodes might take a long time to peer probe (O(n^2) time),
> > and now my cluster has expanded to 90+ nodes.
> >
> >
> > The peer probing process was started 4 days ago, when my cluster had ~50
> > nodes. I probed ~40 nodes at once using subprocesses in bash, and the
> > commands all returned successfully almost immediately (no time-outs).
> >
> >
> > However, glusterd has kept writing to /var/lib/glusterd/peers/ during the
> > last 4 days, and all commands related to newly added nodes, e.g.
> > add-brick and mount, time out and fail. Also, running “gluster peer
> > status” on my nodes shows “Disconnected” nodes that vary over time.
> Peer status should not show nodes in the Disconnected state even if the
> peer handshake takes longer; if it does, then something is wrong. Could
> you check which node is disconnected and what the glusterd log file on
> that node indicates?
> >
> >
> > What should I do in this situation? Do I need to wait for the whole peer
> > probing process to complete, or can I simply kill glusterd and restart
> > it?
> >
> >
> > Regards,
> >
> > Yiping Peng
> >
> >
> >
> >
>
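
To put the O(n^2) estimate quoted above into numbers, here is a rough
counting sketch. It assumes, purely for illustration, that the quadratic
cost comes from every node having to handshake with every other node once;
it is not a model of what glusterd actually does internally, and peer_pairs
is a made-up helper name.

```python
def peer_pairs(n):
    """Number of distinct node pairs in a pool of n nodes (assumed full mesh)."""
    return n * (n - 1) // 2


if __name__ == "__main__":
    for nodes in (50, 90):
        print(nodes, "nodes ->", peer_pairs(nodes), "pairs")
```

Under that assumption, growing the pool from 50 to 90 nodes raises the pair
count from 1225 to 4005, i.e. by more than a factor of three.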