[Gluster-users] peer probe failures
Atin Mukherjee
amukherj at redhat.com
Fri Jun 16 06:57:19 UTC 2017
can you please share the glusterd log file?
On Thu, Jun 15, 2017 at 5:18 PM, Guy Cukierman <guyc at elminda.com> wrote:
> Hi,
>
> I’m having a similar issue, were you able to solve it?
>
> Thanks.
>
>
>
>
>
>
>
> Hey all,
>
>
>
> I've got a strange problem going on here. I've installed glusterfs-server
>
> on ubuntu 16.04:
>
> glusterfs-client/xenial,now 3.7.6-1ubuntu1 amd64 [installed,automatic]
>
> glusterfs-common/xenial,now 3.7.6-1ubuntu1 amd64 [installed,automatic]
>
> glusterfs-server/xenial,now 3.7.6-1ubuntu1 amd64 [installed]
>
>
>
> I can successfully probe another peer at this point. Then, after installing
>
> kubernetes via kargo, peer probing begins failing with a timeout. I've
>
> tried stopping all kubernetes related services, and flushing all iptables
>
> rules, however I don't see any packets leaving any interface when
>
> attempting to peer probe.
>
>
>
> from cli.log:
>
> [2017-04-03 22:20:24.704900] I [MSGID: 101190]
>
> [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread
>
> with index 1
>
> [2017-04-03 22:20:24.704973] T [cli.c:273:cli_rpc_notify] 0-glusterfs: got
>
> RPC_CLNT_CONNECT
>
> [2017-04-03 22:20:24.705001] T [cli-quotad-client.c:94:cli_quotad_notify]
>
> 0-glusterfs: got RPC_CLNT_CONNECT
>
> [2017-04-03 22:20:24.705014] I [socket.c:2355:socket_event_handler]
>
> 0-transport: disconnecting now
>
> [2017-04-03 22:20:24.705204] T [rpc-clnt.c:1404:rpc_clnt_record]
>
> 0-glusterfs: Auth Info: pid: 0, uid: 0, gid: 0, owner:
>
> [2017-04-03 22:20:24.705256] T
>
> [rpc-clnt.c:1261:rpc_clnt_record_build_header] 0-rpc-clnt: Request fraglen
>
> 156, payload: 92, rpc hdr: 64
>
> [2017-04-03 22:20:24.705662] T [socket.c:2879:socket_connect] (-->
>
> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x1a3)[
> 0x7f012fd21953]
>
> (--> /usr/lib/x86_64-linux-gnu
>
> /libgfrpc.so.0(rpc_clnt_remove_ping_timer_locked+0x84)[0x7f012f69add4]
> (-->
>
> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x55)[
> 0x7f012f697af5]
>
> (--> /usr/lib/x8
>
> 6_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7f012f698338] (-->
>
> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_transport_
> notify+0x23)[0x7f012f6945b3]
>
> ))))) 0-glusterfs: connect
>
> () called on transport already connected
>
> [2017-04-03 22:20:24.705680] D
>
> [rpc-clnt-ping.c:98:rpc_clnt_remove_ping_timer_locked] (-->
>
> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x1a3)[
> 0x7f012fd21953]
>
> (--> /
>
> usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_remove_
> ping_timer_locked+0x84)[0x7f012f69add4]
>
> (-->
>
> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x55)[
> 0x7f012f
>
> 697af5] (-->
>
> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+
> 0x88)[0x7f012f698338]
>
> (-->
>
> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_transport_
> notify+0x23)[0x7f012f6945b3]
>
> )))
>
> )) 0-: /var/run/gluster/quotad.socket: ping timer event already removed
>
> [2017-04-03 22:20:24.705710] T [cli-quotad-client.c:100:cli_quotad_notify]
>
> 0-glusterfs: got RPC_CLNT_DISCONNECT
>
> [2017-04-03 22:20:24.705718] T [rpc-clnt.c:1598:rpc_clnt_submit]
>
> 0-rpc-clnt: submitted request (XID: 0x1 Program: Gluster CLI, ProgVers: 2,
>
> Proc: 1) to rpc-transport (glusterfs)
>
> [2017-04-03 22:20:24.705739] D [rpc-clnt-ping.c:281:rpc_clnt_start_ping]
>
> 0-glusterfs: ping timeout is 0, returning
>
> [2017-04-03 22:20:24.705723] D [MSGID: 0]
>
> [event-epoll.c:591:event_dispatch_epoll_handler] 0-epoll: generation
> bumped
>
> on idx=1 from gen=1 to slot->gen=2, fd=7, slot->fd=7
>
> [2017-04-03 22:20:27.614881] T [rpc-clnt.c:418:rpc_clnt_reconnect]
>
> 0-glusterfs: attempting reconnect
>
> [2017-04-03 22:20:27.615151] T [socket.c:2879:socket_connect] (-->
>
> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x1a3)[
> 0x7f012fd21953]
>
> (-->
>
> /usr/lib/x86_64-linux-gnu/glusterfs/3.7.6/rpc-transport/
> socket.so(+0x6c1b)[0x7f012a697c1b]
>
> (-->
>
> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_reconnect+0xb9)[
> 0x7f012f695999]
>
> (-->
>
> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(gf_timer_
> proc+0xfc)[0x7f012fd3d70c]
>
> (--> /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba)[0x7f012f0b86ba] )))))
>
> 0-glusterfs: connect () called on transport already connected
>
>
>
> it then repeats the following:
>
> [2017-04-03 22:20:27.615177] T [rpc-clnt.c:418:rpc_clnt_reconnect]
>
> 0-glusterfs: attempting reconnect
>
> [2017-04-03 22:20:27.615188] T [socket.c:2887:socket_connect] 0-glusterfs:
>
> connecting 0x25d3550, state=0 gen=0 sock=-1
>
> [2017-04-03 22:20:27.615200] T
>
> [name.c:295:af_unix_client_get_remote_sockaddr] 0-glusterfs: using
>
> connect-path /var/run/gluster/quotad.socket
>
> [2017-04-03 22:20:27.615218] T [name.c:111:af_unix_client_bind]
>
> 0-glusterfs: bind-path not specified for unix socket, letting connect to
>
> assign default value
>
> [2017-04-03 22:20:27.615329] T [cli-quotad-client.c:94:cli_quotad_notify]
>
> 0-glusterfs: got RPC_CLNT_CONNECT
>
> [2017-04-03 22:20:27.615355] I [socket.c:2355:socket_event_handler]
>
> 0-transport: disconnecting now
>
> [2017-04-03 22:20:27.615567] D
>
> [rpc-clnt-ping.c:98:rpc_clnt_remove_ping_timer_locked] (-->
>
> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x1a3)[
> 0x7f012fd21953]
>
> (-->
>
> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_remove_
> ping_timer_locked+0x84)[0x7f012f69add4]
>
> (-->
>
> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x55)[
> 0x7f012f697af5]
>
> (-->
>
> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+
> 0x88)[0x7f012f698338]
>
> (-->
>
> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_transport_
> notify+0x23)[0x7f012f6945b3]
>
> ))))) 0-: /var/run/gluster/quotad.socket: ping timer event already removed
>
> [2017-04-03 22:20:27.615598] T [cli-quotad-client.c:100:cli_quotad_notify]
>
> 0-glusterfs: got RPC_CLNT_DISCONNECT
>
> [2017-04-03 22:20:27.615612] D [MSGID: 0]
>
> [event-epoll.c:591:event_dispatch_epoll_handler] 0-epoll: generation
> bumped
>
> on idx=1 from gen=4 to slot->gen=5, fd=7, slot->fd=7
>
> [2017-04-03 22:20:30.615864] T [rpc-clnt.c:418:rpc_clnt_reconnect]
>
> 0-glusterfs: attempting reconnect
>
> [2017-04-03 22:20:30.616113] T [socket.c:2879:socket_connect] (-->
>
> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x1a3)[
> 0x7f012fd21953]
>
> (-->
>
> /usr/lib/x86_64-linux-gnu/glusterfs/3.7.6/rpc-transport/
> socket.so(+0x6c1b)[0x7f012a697c1b]
>
> (-->
>
> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_reconnect+0xb9)[
> 0x7f012f695999]
>
> (-->
>
> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(gf_timer_
> proc+0xfc)[0x7f012fd3d70c]
>
> (--> /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba)[0x7f012f0b86ba] )))))
>
> 0-glusterfs: connect () called on transport already connected
>
>
>
> and from etc-glusterfs-glusterd.vol.log i see the probe request and then:
>
> [2017-04-03 22:20:24.707258] D [MSGID: 0]
>
> [glusterd-peer-utils.c:133:glusterd_peerinfo_find_by_hostname]
>
> 0-management: Unable to find friend: 10.1.0.7
>
> [2017-04-03 22:20:24.707260] D [MSGID: 0]
>
> [glusterd-peer-utils.c:247:glusterd_peerinfo_find] 0-management: Unable to
>
> find hostname: 10.1.0.7
>
> [2017-04-03 22:20:24.707293] I [MSGID: 106129]
>
> [glusterd-handler.c:3600:glusterd_probe_begin] 0-glusterd: Unable to find
>
> peerinfo for host: 10.1.0.7 (24007)
>
> [2017-04-03 22:20:24.707314] D [MSGID: 0]
>
> [glusterd-peer-utils.c:458:glusterd_peer_hostname_new] 0-glusterd:
>
> Returning 0
>
> [2017-04-03 22:20:24.707329] D [MSGID: 0]
>
> [glusterd-utils.c:5974:glusterd_sm_tr_log_init] 0-glusterd: returning 0
>
> [2017-04-03 22:20:24.707353] D [MSGID: 0]
>
> [glusterd-store.c:3802:glusterd_store_create_peer_dir] 0-glusterd:
>
> Returning with 0
>
> [2017-04-03 22:20:24.728550] D [MSGID: 0] [store.c:421:gf_store_handle_
> new]
>
> 0-: Returning 0
>
> [2017-04-03 22:20:24.728679] D [MSGID: 0] [store.c:375:gf_store_save_
> value]
>
> 0-management: returning: 0
>
> [2017-04-03 22:20:24.728720] D [logging.c:1952:_gf_msg_internal]
>
> 0-logging-infra: Buffer overflow of a buffer whose size limit is 5. About
>
> to flush least recently used log message to disk
>
> The message "D [MSGID: 0] [store.c:375:gf_store_save_value] 0-management:
>
> returning: 0" repeated 2 times between [2017-04-03 22:20:24.728679] and
>
> [2017-04-03 22:20:24.728718]
>
> [2017-04-03 22:20:24.728720] D [MSGID: 0]
>
> [glusterd-store.c:3931:glusterd_store_peer_write] 0-glusterd: Returning
>
> with 0
>
> [2017-04-03 22:20:24.786973] D [MSGID: 0]
>
> [glusterd-store.c:3957:glusterd_store_perform_peer_store] 0-glusterd:
>
> Returning 0
>
> [2017-04-03 22:20:24.787007] D [MSGID: 0]
>
> [glusterd-store.c:3978:glusterd_store_peerinfo] 0-glusterd: Returning
> with 0
>
> [2017-04-03 22:20:24.787046] D [MSGID: 0]
>
> [glusterd-handler.c:3391:glusterd_transport_inet_options_build]
> 0-glusterd:
>
> Returning 0
>
> [2017-04-03 22:20:24.787077] I [rpc-clnt.c:984:rpc_clnt_connection_init]
>
> 0-management: setting frame-timeout to 600
>
> [2017-04-03 22:20:24.787090] D [rpc-clnt.c:996:rpc_clnt_connection_init]
>
> 0-management: setting ping-timeout to 30
>
> [2017-04-03 22:20:24.787106] D [rpc-transport.c:288:rpc_transport_load]
>
> 0-rpc-transport: attempt to load file
>
> /usr/lib/x86_64-linux-gnu/glusterfs/3.7.6/rpc-transport/socket.so
>
> [2017-04-03 22:20:24.787147] T [MSGID: 0]
>
> [options.c:91:xlator_option_validate_int] 0-management: no range check
>
> required for 'option transport.tcp-user-timeout -1'
>
> [2017-04-03 22:20:24.787166] T [MSGID: 0]
>
> [options.c:91:xlator_option_validate_int] 0-management: no range check
>
> required for 'option transport.socket.keepalive-time 10'
>
> [2017-04-03 22:20:24.787182] T [MSGID: 0]
>
> [options.c:91:xlator_option_validate_int] 0-management: no range check
>
> required for 'option transport.socket.keepalive-interval 2'
>
> [2017-04-03 22:20:24.787203] T [MSGID: 0]
>
> [options.c:91:xlator_option_validate_int] 0-management: no range check
>
> required for 'option remote-port 24007'
>
> [2017-04-03 22:20:24.787221] D [socket.c:3845:socket_init] 0-management:
>
> Configued transport.tcp-user-timeout=-1
>
> [2017-04-03 22:20:24.787233] D [socket.c:3928:socket_init] 0-management:
>
> SSL support on the I/O path is NOT enabled
>
> [2017-04-03 22:20:24.787242] D [socket.c:3931:socket_init] 0-management:
>
> SSL support for glusterd is NOT enabled
>
> [2017-04-03 22:20:24.787251] D [socket.c:3948:socket_init] 0-management:
>
> using system polling thread
>
> [2017-04-03 22:20:24.787263] T [rpc-clnt.c:418:rpc_clnt_reconnect]
>
> 0-management: attempting reconnect
>
> [2017-04-03 22:20:24.787273] T [socket.c:2887:socket_connect] 0-management:
>
> connecting 0x7ff7fc005640, state=0 gen=0 sock=-1
>
> [2017-04-03 22:20:24.787288] T [MSGID: 0]
>
> [common-utils.c:290:gf_resolve_ip6] 0-resolver: DNS cache not present,
>
> freshly probing hostname: 10.1.0.7
>
> [2017-04-03 22:20:24.787843] D [MSGID: 0]
>
> [common-utils.c:333:gf_resolve_ip6] 0-resolver: returning ip-10.1.0.7
>
> (port-24007) for hostname: 10.1.0.7 and port: 24007
>
> [2017-04-03 22:20:24.787863] D [socket.c:2833:socket_fix_ssl_opts]
>
> 0-management: disabling SSL for portmapper connection
>
> [2017-04-03 22:20:24.787883] T [socket.c:805:__socket_nodelay]
>
> 0-management: NODELAY enabled for socket 8
>
> [2017-04-03 22:20:24.787899] W [socket.c:869:__socket_keepalive] 0-socket:
>
> failed to set TCP_USER_TIMEOUT -1000 on socket 8, Invalid argument
>
> [2017-04-03 22:20:24.787910] E [socket.c:2965:socket_connect] 0-management:
>
> Failed to set keep-alive: Invalid argument
>
> [2017-04-03 22:20:24.788092] D [MSGID: 0]
>
> [common-utils.c:2887:gf_ports_reserved] 0-glusterfs: lower: 30000, higher:
>
> 32767
>
> [2017-04-03 22:21:52.911012] D [logging.c:1828:gf_log_flush_timeout_cbk]
>
> 0-logging-infra: Log timer timed out. About to flush outstanding messages
>
> if present
>
> [2017-04-03 22:21:52.911131] D [logging.c:1790:__gf_log_
> inject_timer_event]
>
> 0-logging-infra: Starting timer now. Timeout = 120, current buf size = 5
>
> [2017-04-03 22:22:24.707002] D [socket.c:558:__socket_rwv]
>
> 0-socket.management: EOF on socket
>
> [2017-04-03 22:22:24.707032] D [socket.c:2355:socket_event_handler]
>
> 0-transport: disconnecting now
>
>
>
> I've debugged this to the best of my ability but haven't made any progress
>
> narrowing it down, nor has google turned up anything useful.
>
>
>
> Thanks for the help!
>
>
>
> Ken
>
>
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.gluster.org/pipermail/gluster-users/attachments/20170616/40afa1d2/attachment.html>
More information about the Gluster-users
mailing list