[Gluster-users] peer probe failures

Kenneth Talley k.talley at gmail.com
Mon Apr 3 22:35:17 UTC 2017


Hey all,

I've got a strange problem going on here. I've installed glusterfs-server
on ubuntu 16.04:
glusterfs-client/xenial,now 3.7.6-1ubuntu1 amd64 [installed,automatic]
glusterfs-common/xenial,now 3.7.6-1ubuntu1 amd64 [installed,automatic]
glusterfs-server/xenial,now 3.7.6-1ubuntu1 amd64 [installed]

I can successfully probe another peer at this point. Then, after installing
kubernetes via kargo, peer probing begins failing with a timeout. I've
tried stopping all kubernetes related services, and flushing all iptables
rules, however I don't see any packets leaving any interface when
attempting to peer probe.

from cli.log:
[2017-04-03 22:20:24.704900] I [MSGID: 101190]
[event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread
with index 1
[2017-04-03 22:20:24.704973] T [cli.c:273:cli_rpc_notify] 0-glusterfs: got
RPC_CLNT_CONNECT
[2017-04-03 22:20:24.705001] T [cli-quotad-client.c:94:cli_quotad_notify]
0-glusterfs: got RPC_CLNT_CONNECT
[2017-04-03 22:20:24.705014] I [socket.c:2355:socket_event_handler]
0-transport: disconnecting now
[2017-04-03 22:20:24.705204] T [rpc-clnt.c:1404:rpc_clnt_record]
0-glusterfs: Auth Info: pid: 0, uid: 0, gid: 0, owner:
[2017-04-03 22:20:24.705256] T
[rpc-clnt.c:1261:rpc_clnt_record_build_header] 0-rpc-clnt: Request fraglen
156, payload: 92, rpc hdr: 64
[2017-04-03 22:20:24.705662] T [socket.c:2879:socket_connect] (-->
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x1a3)[0x7f012fd21953]
(--> /usr/lib/x86_64-linux-gnu
/libgfrpc.so.0(rpc_clnt_remove_ping_timer_locked+0x84)[0x7f012f69add4] (-->
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x55)[0x7f012f697af5]
(--> /usr/lib/x8
6_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7f012f698338] (-->
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7f012f6945b3]
))))) 0-glusterfs: connect
() called on transport already connected
[2017-04-03 22:20:24.705680] D
[rpc-clnt-ping.c:98:rpc_clnt_remove_ping_timer_locked] (-->
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x1a3)[0x7f012fd21953]
(--> /
usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_remove_ping_timer_locked+0x84)[0x7f012f69add4]
(-->
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x55)[0x7f012f
697af5] (-->
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7f012f698338]
(-->
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7f012f6945b3]
)))
)) 0-: /var/run/gluster/quotad.socket: ping timer event already removed
[2017-04-03 22:20:24.705710] T [cli-quotad-client.c:100:cli_quotad_notify]
0-glusterfs: got RPC_CLNT_DISCONNECT
[2017-04-03 22:20:24.705718] T [rpc-clnt.c:1598:rpc_clnt_submit]
0-rpc-clnt: submitted request (XID: 0x1 Program: Gluster CLI, ProgVers: 2,
Proc: 1) to rpc-transport (glusterfs)
[2017-04-03 22:20:24.705739] D [rpc-clnt-ping.c:281:rpc_clnt_start_ping]
0-glusterfs: ping timeout is 0, returning
[2017-04-03 22:20:24.705723] D [MSGID: 0]
[event-epoll.c:591:event_dispatch_epoll_handler] 0-epoll: generation bumped
on idx=1 from gen=1 to slot->gen=2, fd=7, slot->fd=7
[2017-04-03 22:20:27.614881] T [rpc-clnt.c:418:rpc_clnt_reconnect]
0-glusterfs: attempting reconnect
[2017-04-03 22:20:27.615151] T [socket.c:2879:socket_connect] (-->
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x1a3)[0x7f012fd21953]
(-->
/usr/lib/x86_64-linux-gnu/glusterfs/3.7.6/rpc-transport/socket.so(+0x6c1b)[0x7f012a697c1b]
(-->
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_reconnect+0xb9)[0x7f012f695999]
(-->
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(gf_timer_proc+0xfc)[0x7f012fd3d70c]
(--> /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba)[0x7f012f0b86ba] )))))
0-glusterfs: connect () called on transport already connected

it then repeats the following:
[2017-04-03 22:20:27.615177] T [rpc-clnt.c:418:rpc_clnt_reconnect]
0-glusterfs: attempting reconnect
[2017-04-03 22:20:27.615188] T [socket.c:2887:socket_connect] 0-glusterfs:
connecting 0x25d3550, state=0 gen=0 sock=-1
[2017-04-03 22:20:27.615200] T
[name.c:295:af_unix_client_get_remote_sockaddr] 0-glusterfs: using
connect-path /var/run/gluster/quotad.socket
[2017-04-03 22:20:27.615218] T [name.c:111:af_unix_client_bind]
0-glusterfs: bind-path not specified for unix socket, letting connect to
assign default value
[2017-04-03 22:20:27.615329] T [cli-quotad-client.c:94:cli_quotad_notify]
0-glusterfs: got RPC_CLNT_CONNECT
[2017-04-03 22:20:27.615355] I [socket.c:2355:socket_event_handler]
0-transport: disconnecting now
[2017-04-03 22:20:27.615567] D
[rpc-clnt-ping.c:98:rpc_clnt_remove_ping_timer_locked] (-->
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x1a3)[0x7f012fd21953]
(-->
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_remove_ping_timer_locked+0x84)[0x7f012f69add4]
(-->
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x55)[0x7f012f697af5]
(-->
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7f012f698338]
(-->
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7f012f6945b3]
))))) 0-: /var/run/gluster/quotad.socket: ping timer event already removed
[2017-04-03 22:20:27.615598] T [cli-quotad-client.c:100:cli_quotad_notify]
0-glusterfs: got RPC_CLNT_DISCONNECT
[2017-04-03 22:20:27.615612] D [MSGID: 0]
[event-epoll.c:591:event_dispatch_epoll_handler] 0-epoll: generation bumped
on idx=1 from gen=4 to slot->gen=5, fd=7, slot->fd=7
[2017-04-03 22:20:30.615864] T [rpc-clnt.c:418:rpc_clnt_reconnect]
0-glusterfs: attempting reconnect
[2017-04-03 22:20:30.616113] T [socket.c:2879:socket_connect] (-->
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x1a3)[0x7f012fd21953]
(-->
/usr/lib/x86_64-linux-gnu/glusterfs/3.7.6/rpc-transport/socket.so(+0x6c1b)[0x7f012a697c1b]
(-->
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_reconnect+0xb9)[0x7f012f695999]
(-->
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(gf_timer_proc+0xfc)[0x7f012fd3d70c]
(--> /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba)[0x7f012f0b86ba] )))))
0-glusterfs: connect () called on transport already connected

and from etc-glusterfs-glusterd.vol.log i see the probe request and then:
[2017-04-03 22:20:24.707258] D [MSGID: 0]
[glusterd-peer-utils.c:133:glusterd_peerinfo_find_by_hostname]
0-management: Unable to find friend: 10.1.0.7
[2017-04-03 22:20:24.707260] D [MSGID: 0]
[glusterd-peer-utils.c:247:glusterd_peerinfo_find] 0-management: Unable to
find hostname: 10.1.0.7
[2017-04-03 22:20:24.707293] I [MSGID: 106129]
[glusterd-handler.c:3600:glusterd_probe_begin] 0-glusterd: Unable to find
peerinfo for host: 10.1.0.7 (24007)
[2017-04-03 22:20:24.707314] D [MSGID: 0]
[glusterd-peer-utils.c:458:glusterd_peer_hostname_new] 0-glusterd:
Returning 0
[2017-04-03 22:20:24.707329] D [MSGID: 0]
[glusterd-utils.c:5974:glusterd_sm_tr_log_init] 0-glusterd: returning 0
[2017-04-03 22:20:24.707353] D [MSGID: 0]
[glusterd-store.c:3802:glusterd_store_create_peer_dir] 0-glusterd:
Returning with 0
[2017-04-03 22:20:24.728550] D [MSGID: 0] [store.c:421:gf_store_handle_new]
0-: Returning 0
[2017-04-03 22:20:24.728679] D [MSGID: 0] [store.c:375:gf_store_save_value]
0-management: returning: 0
[2017-04-03 22:20:24.728720] D [logging.c:1952:_gf_msg_internal]
0-logging-infra: Buffer overflow of a buffer whose size limit is 5. About
to flush least recently used log message to disk
The message "D [MSGID: 0] [store.c:375:gf_store_save_value] 0-management:
returning: 0" repeated 2 times between [2017-04-03 22:20:24.728679] and
[2017-04-03 22:20:24.728718]
[2017-04-03 22:20:24.728720] D [MSGID: 0]
[glusterd-store.c:3931:glusterd_store_peer_write] 0-glusterd: Returning
with 0
[2017-04-03 22:20:24.786973] D [MSGID: 0]
[glusterd-store.c:3957:glusterd_store_perform_peer_store] 0-glusterd:
Returning 0
[2017-04-03 22:20:24.787007] D [MSGID: 0]
[glusterd-store.c:3978:glusterd_store_peerinfo] 0-glusterd: Returning with 0
[2017-04-03 22:20:24.787046] D [MSGID: 0]
[glusterd-handler.c:3391:glusterd_transport_inet_options_build] 0-glusterd:
Returning 0
[2017-04-03 22:20:24.787077] I [rpc-clnt.c:984:rpc_clnt_connection_init]
0-management: setting frame-timeout to 600
[2017-04-03 22:20:24.787090] D [rpc-clnt.c:996:rpc_clnt_connection_init]
0-management: setting ping-timeout to 30
[2017-04-03 22:20:24.787106] D [rpc-transport.c:288:rpc_transport_load]
0-rpc-transport: attempt to load file
/usr/lib/x86_64-linux-gnu/glusterfs/3.7.6/rpc-transport/socket.so
[2017-04-03 22:20:24.787147] T [MSGID: 0]
[options.c:91:xlator_option_validate_int] 0-management: no range check
required for 'option transport.tcp-user-timeout -1'
[2017-04-03 22:20:24.787166] T [MSGID: 0]
[options.c:91:xlator_option_validate_int] 0-management: no range check
required for 'option transport.socket.keepalive-time 10'
[2017-04-03 22:20:24.787182] T [MSGID: 0]
[options.c:91:xlator_option_validate_int] 0-management: no range check
required for 'option transport.socket.keepalive-interval 2'
[2017-04-03 22:20:24.787203] T [MSGID: 0]
[options.c:91:xlator_option_validate_int] 0-management: no range check
required for 'option remote-port 24007'
[2017-04-03 22:20:24.787221] D [socket.c:3845:socket_init] 0-management:
Configued transport.tcp-user-timeout=-1
[2017-04-03 22:20:24.787233] D [socket.c:3928:socket_init] 0-management:
SSL support on the I/O path is NOT enabled
[2017-04-03 22:20:24.787242] D [socket.c:3931:socket_init] 0-management:
SSL support for glusterd is NOT enabled
[2017-04-03 22:20:24.787251] D [socket.c:3948:socket_init] 0-management:
using system polling thread
[2017-04-03 22:20:24.787263] T [rpc-clnt.c:418:rpc_clnt_reconnect]
0-management: attempting reconnect
[2017-04-03 22:20:24.787273] T [socket.c:2887:socket_connect] 0-management:
connecting 0x7ff7fc005640, state=0 gen=0 sock=-1
[2017-04-03 22:20:24.787288] T [MSGID: 0]
[common-utils.c:290:gf_resolve_ip6] 0-resolver: DNS cache not present,
freshly probing hostname: 10.1.0.7
[2017-04-03 22:20:24.787843] D [MSGID: 0]
[common-utils.c:333:gf_resolve_ip6] 0-resolver: returning ip-10.1.0.7
(port-24007) for hostname: 10.1.0.7 and port: 24007
[2017-04-03 22:20:24.787863] D [socket.c:2833:socket_fix_ssl_opts]
0-management: disabling SSL for portmapper connection
[2017-04-03 22:20:24.787883] T [socket.c:805:__socket_nodelay]
0-management: NODELAY enabled for socket 8
[2017-04-03 22:20:24.787899] W [socket.c:869:__socket_keepalive] 0-socket:
failed to set TCP_USER_TIMEOUT -1000 on socket 8, Invalid argument
[2017-04-03 22:20:24.787910] E [socket.c:2965:socket_connect] 0-management:
Failed to set keep-alive: Invalid argument
[2017-04-03 22:20:24.788092] D [MSGID: 0]
[common-utils.c:2887:gf_ports_reserved] 0-glusterfs: lower: 30000, higher:
32767
[2017-04-03 22:21:52.911012] D [logging.c:1828:gf_log_flush_timeout_cbk]
0-logging-infra: Log timer timed out. About to flush outstanding messages
if present
[2017-04-03 22:21:52.911131] D [logging.c:1790:__gf_log_inject_timer_event]
0-logging-infra: Starting timer now. Timeout = 120, current buf size = 5
[2017-04-03 22:22:24.707002] D [socket.c:558:__socket_rwv]
0-socket.management: EOF on socket
[2017-04-03 22:22:24.707032] D [socket.c:2355:socket_event_handler]
0-transport: disconnecting now

I've debugged this to the best of my ability but haven't made any progress
narrowing it down, nor has google turned up anything useful.

Thanks for the help!

Ken
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.gluster.org/pipermail/gluster-users/attachments/20170403/533e9adf/attachment.html>


More information about the Gluster-users mailing list