[Gluster-users] gluster peer probe failing
Guy Cukierman
guyc at elminda.com
Tue Jun 20 09:06:35 UTC 2017
Thanks Gaurav!
1. Is there an estimate of when this fix will be released?
2. Any recommended workaround?
Best,
Guy.
From: Gaurav Yadav [mailto:gyadav at redhat.com]
Sent: Tuesday, June 20, 2017 9:46 AM
To: Guy Cukierman <guyc at elminda.com>
Cc: Atin Mukherjee <amukherj at redhat.com>; gluster-users at gluster.org
Subject: Re: [Gluster-users] gluster peer probe failing
Hi,
I am able to recreate the issue and here is my RCA.
The maximum value, i.e. 32767, overflows while it is being manipulated, and this case was previously not handled properly.
Hence glusterd was crashing with SIGSEGV.
The issue is being fixed under "https://bugzilla.redhat.com/show_bug.cgi?id=1454418" and the fix is being backported as well.
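The RCA points at 32767, which is SHRT_MAX, the largest value a signed 16-bit integer can hold. The C sketch below is a hypothetical illustration of that class of bug, not glusterd's code and not the patch under review. It marks a reserved range such as 30000-32767 in a one-bit-per-port bitmap, and its comments explain how a 16-bit loop counter would wrap at 32767 and hand negative ports to the bitmap, the kind of out-of-bounds write that ends in SIGSEGV. The names `reserve_range`, `bit_set`, `bit_test` and `PORT_MAX` are made up for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Illustration only: not the glusterd source and not the actual fix. */
#define PORT_MAX 65535
#define BITMAP_BYTES ((PORT_MAX + 1) / 8)   /* one bit per port 0..65535 */

static void bit_set(unsigned char *map, int32_t port)
{
    /* A negative or too-large index here would be an out-of-bounds
     * write, the kind of bug that ends in SIGSEGV. */
    assert(port >= 0 && port <= PORT_MAX);
    map[port / 8] |= (unsigned char)(1u << (port % 8));
}

static int bit_test(const unsigned char *map, int32_t port)
{
    return (map[port / 8] >> (port % 8)) & 1;
}

/* Marks every port in a "lo-hi" spec such as "30000-32767".
 * Returns 0 on success, -1 on malformed or out-of-range input. */
static int reserve_range(unsigned char *map, const char *spec)
{
    char *end = NULL;
    long lo = strtol(spec, &end, 10);
    if (end == spec || *end != '-')
        return -1;

    const char *hi_start = end + 1;
    long hi = strtol(hi_start, &end, 10);
    if (end == hi_start || (*end != '\0' && *end != '\n'))
        return -1;
    if (lo < 0 || hi > PORT_MAX || lo > hi)
        return -1;

    /* The counter is deliberately 32-bit. Written as
     *     for (int16_t p = lo; p <= hi; p++)
     * with hi == 32767, "p <= hi" can never become false: the increment
     * past 32767 wraps to -32768 on common compilers, so the loop never
     * exits and bit_set() is handed negative ports. */
    for (int32_t p = (int32_t)lo; p <= (int32_t)hi; p++)
        bit_set(map, p);
    return 0;
}
```

Keeping both the parsed bounds and the loop counter in a type wider than the largest port value, and checking bounds before every write, avoids both the endless loop and the negative index.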
Thanks
Gaurav
On Tue, Jun 20, 2017 at 6:43 AM, Gaurav Yadav <gyadav at redhat.com> wrote:
Hi,
I tried on my host with the corresponding reserved ports set, but could not reproduce the issue locally.
However, the logs you sent make it pretty clear that the issue is related to the reserved ports.
I will try to reproduce it on another machine and will update you as soon as possible.
Thanks
Gaurav
On Sun, Jun 18, 2017 at 12:37 PM, Guy Cukierman <guyc at elminda.com> wrote:
Hi,
Below please find the reserved ports and log, thanks.
sysctl net.ipv4.ip_local_reserved_ports:
net.ipv4.ip_local_reserved_ports = 30000-32767
glusterd.log:
[2017-06-18 07:04:17.853162] I [MSGID: 106487] [glusterd-handler.c:1242:__glusterd_handle_cli_probe] 0-glusterd: Received CLI probe req 192.168.1.17 24007
[2017-06-18 07:04:17.853237] D [MSGID: 0] [common-utils.c:3361:gf_is_local_addr] 0-management: 192.168.1.17
[2017-06-18 07:04:17.854093] D [logging.c:1952:_gf_msg_internal] 0-logging-infra: Buffer overflow of a buffer whose size limit is 5. About to flush least recently used log message to disk
The message "D [MSGID: 0] [common-utils.c:3361:gf_is_local_addr] 0-management: 192.168.1.17 " repeated 2 times between [2017-06-18 07:04:17.853237] and [2017-06-18 07:04:17.853869]
[2017-06-18 07:04:17.854093] D [MSGID: 0] [common-utils.c:3377:gf_is_local_addr] 0-management: 192.168.1.17 is not local
[2017-06-18 07:04:17.854221] D [MSGID: 0] [glusterd-peer-utils.c:132:glusterd_peerinfo_find_by_hostname] 0-management: Unable to find friend: 192.168.1.17
[2017-06-18 07:04:17.854271] D [logging.c:1952:_gf_msg_internal] 0-logging-infra: Buffer overflow of a buffer whose size limit is 5. About to flush least recently used log message to disk
[2017-06-18 07:04:17.854269] D [MSGID: 0] [glusterd-peer-utils.c:132:glusterd_peerinfo_find_by_hostname] 0-management: Unable to find friend: 192.168.1.17
[2017-06-18 07:04:17.854271] D [MSGID: 0] [glusterd-peer-utils.c:246:glusterd_peerinfo_find] 0-management: Unable to find hostname: 192.168.1.17
[2017-06-18 07:04:17.854306] I [MSGID: 106129] [glusterd-handler.c:3690:glusterd_probe_begin] 0-glusterd: Unable to find peerinfo for host: 192.168.1.17 (24007)
[2017-06-18 07:04:17.854343] D [MSGID: 0] [glusterd-peer-utils.c:486:glusterd_peer_hostname_new] 0-glusterd: Returning 0
[2017-06-18 07:04:17.854367] D [MSGID: 0] [glusterd-utils.c:7060:glusterd_sm_tr_log_init] 0-glusterd: returning 0
[2017-06-18 07:04:17.854387] D [MSGID: 0] [glusterd-store.c:4092:glusterd_store_create_peer_dir] 0-glusterd: Returning with 0
[2017-06-18 07:04:17.854918] D [MSGID: 0] [store.c:420:gf_store_handle_new] 0-: Returning 0
[2017-06-18 07:04:17.855083] D [MSGID: 0] [store.c:374:gf_store_save_value] 0-management: returning: 0
[2017-06-18 07:04:17.855130] D [logging.c:1952:_gf_msg_internal] 0-logging-infra: Buffer overflow of a buffer whose size limit is 5. About to flush least recently used log message to disk
The message "D [MSGID: 0] [store.c:374:gf_store_save_value] 0-management: returning: 0" repeated 2 times between [2017-06-18 07:04:17.855083] and [2017-06-18 07:04:17.855128]
[2017-06-18 07:04:17.855129] D [MSGID: 0] [glusterd-store.c:4221:glusterd_store_peer_write] 0-glusterd: Returning with 0
[2017-06-18 07:04:17.856294] D [MSGID: 0] [glusterd-store.c:4247:glusterd_store_perform_peer_store] 0-glusterd: Returning 0
[2017-06-18 07:04:17.856332] D [MSGID: 0] [glusterd-store.c:4268:glusterd_store_peerinfo] 0-glusterd: Returning with 0
[2017-06-18 07:04:17.856365] W [MSGID: 106062] [glusterd-handler.c:3466:glusterd_transport_inet_options_build] 0-glusterd: Failed to get tcp-user-timeout
[2017-06-18 07:04:17.856387] D [MSGID: 0] [glusterd-handler.c:3474:glusterd_transport_inet_options_build] 0-glusterd: Returning 0
[2017-06-18 07:04:17.856409] I [rpc-clnt.c:1059:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2017-06-18 07:04:17.856421] D [rpc-clnt.c:1071:rpc_clnt_connection_init] 0-management: setting ping-timeout to 30
[2017-06-18 07:04:17.856434] D [rpc-transport.c:279:rpc_transport_load] 0-rpc-transport: attempt to load file /usr/lib64/glusterfs/3.10.3/rpc-transport/socket.so
[2017-06-18 07:04:17.856580] D [socket.c:4082:socket_init] 0-management: Configued transport.tcp-user-timeout=-1
[2017-06-18 07:04:17.856594] D [socket.c:4165:socket_init] 0-management: SSL support on the I/O path is NOT enabled
[2017-06-18 07:04:17.856625] D [socket.c:4168:socket_init] 0-management: SSL support for glusterd is NOT enabled
[2017-06-18 07:04:17.856634] D [socket.c:4185:socket_init] 0-management: using system polling thread
[2017-06-18 07:04:17.856664] D [name.c:168:client_fill_address_family] 0-management: address-family not specified, marking it as unspec for getaddrinfo to resolve from (remote-host: 192.168.1.17)
[2017-06-18 07:04:17.861800] D [MSGID: 0] [common-utils.c:334:gf_resolve_ip6] 0-resolver: returning ip-192.168.1.17 (port-24007) for hostname: 192.168.1.17 and port: 24007
[2017-06-18 07:04:17.861830] D [socket.c:2982:socket_fix_ssl_opts] 0-management: disabling SSL for portmapper connection
[2017-06-18 07:04:17.861885] D [MSGID: 0] [common-utils.c:3106:gf_ports_reserved] 0-glusterfs: lower: 30000, higher: 32767
[2017-06-18 07:04:17.861920] D [logging.c:1764:gf_log_flush_extra_msgs] 0-logging-infra: Log buffer size reduced. About to flush 5 extra log messages
[2017-06-18 07:04:17.861933] D [logging.c:1767:gf_log_flush_extra_msgs] 0-logging-infra: Just flushed 5 extra log messages
pending frames:
frame : type(0) op(0)
patchset: git://git.gluster.org/glusterfs.git
signal received: 11
time of crash:
2017-06-18 07:04:17
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.10.3
/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xa0)[0x7fbdf7c964d0]
/lib64/libglusterfs.so.0(gf_print_trace+0x324)[0x7fbdf7c9fdd4]
/lib64/libc.so.6(+0x35250)[0x7fbdf637a250]
/lib64/libglusterfs.so.0(gf_ports_reserved+0x15c)[0x7fbdf7ca044c]
/lib64/libglusterfs.so.0(gf_process_reserved_ports+0xbe)[0x7fbdf7ca070e]
/usr/lib64/glusterfs/3.10.3/rpc-transport/socket.so(+0xd158)[0x7fbde9c24158]
/usr/lib64/glusterfs/3.10.3/rpc-transport/socket.so(client_bind+0x93)[0x7fbde9c245a3]
/usr/lib64/glusterfs/3.10.3/rpc-transport/socket.so(+0xa875)[0x7fbde9c21875]
/lib64/libgfrpc.so.0(rpc_clnt_reconnect+0xc9)[0x7fbdf7a5ff89]
/lib64/libgfrpc.so.0(rpc_clnt_start+0x39)[0x7fbdf7a60049]
/usr/lib64/glusterfs/3.10.3/xlator/mgmt/glusterd.so(+0x24218)[0x7fbdec7b5218]
/usr/lib64/glusterfs/3.10.3/xlator/mgmt/glusterd.so(+0x24843)[0x7fbdec7b5843]
/usr/lib64/glusterfs/3.10.3/xlator/mgmt/glusterd.so(+0x24ae0)[0x7fbdec7b5ae0]
/usr/lib64/glusterfs/3.10.3/xlator/mgmt/glusterd.so(+0x27890)[0x7fbdec7b8890]
/usr/lib64/glusterfs/3.10.3/xlator/mgmt/glusterd.so(+0x27e20)[0x7fbdec7b8e20]
/usr/lib64/glusterfs/3.10.3/xlator/mgmt/glusterd.so(+0x20f5e)[0x7fbdec7b1f5e]
/lib64/libglusterfs.so.0(synctask_wrap+0x10)[0x7fbdf7ccd750]
/lib64/libc.so.6(+0x46cf0)[0x7fbdf638bcf0]
---------
From: Gaurav Yadav [mailto:gyadav at redhat.com]
Sent: Friday, June 16, 2017 5:47 AM
To: Atin Mukherjee <amukherj at redhat.com>
Cc: Guy Cukierman <guyc at elminda.com>; gluster-users at gluster.org
Subject: Re: [Gluster-users] gluster peer probe failing
Could you please send me the output of the command "sysctl net.ipv4.ip_local_reserved_ports"?
Along with that output, please send the logs so I can look into the issue.
Thanks
Gaurav
On Thu, Jun 15, 2017 at 4:28 PM, Atin Mukherjee <amukherj at redhat.com> wrote:
+Gaurav (the author of the patch), can you please comment here?
On Thu, Jun 15, 2017 at 3:28 PM, Guy Cukierman <guyc at elminda.com> wrote:
Thanks, but my current settings are:
net.ipv4.ip_local_reserved_ports = 30000-32767
net.ipv4.ip_local_port_range = 32768 60999
meaning the reserved ports are already within the short int range, so maybe I misunderstood something? Or is it a different issue?
From: Atin Mukherjee [mailto:amukherj at redhat.com]
Sent: Thursday, June 15, 2017 10:56 AM
To: Guy Cukierman <guyc at elminda.com>
Cc: gluster-users at gluster.org
Subject: Re: [Gluster-users] gluster peer probe failing
https://review.gluster.org/#/c/17494/ will fix it, and the next 3.10 update should include the patch.
If net.ipv4.ip_local_reserved_ports has any value beyond the short int range, this is a problem with the current version.
Would you be able to reset the reserved ports temporarily to get this going?
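Atin's condition (a value beyond the short int range) and Gaurav's later RCA (32767 itself overflows) suggest that a reserved-ports value reaching SHRT_MAX is what triggers the crash, and Guy's 30000-32767 range ends exactly there. Assuming that threshold, the hypothetical helper `reserved_ports_reach` below scans a net.ipv4.ip_local_reserved_ports value, which the kernel reads and prints as a comma-separated list of ports and ranges, and reports whether any port reaches the threshold. It is a sketch for deciding whether the temporary workaround applies, not a GlusterFS API.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper, not part of GlusterFS.
 * Returns true if any port in a net.ipv4.ip_local_reserved_ports value
 * (e.g. "8080,30000-32767") is >= threshold. Both ends of a range are
 * checked. Stops and returns false at the first token it cannot parse. */
static bool reserved_ports_reach(const char *value, long threshold)
{
    assert(value != NULL);
    const char *s = value;
    while (*s != '\0' && *s != '\n') {
        char *end = NULL;
        long port = strtol(s, &end, 10);
        if (end == s)
            return false;          /* malformed input: give up */
        if (port >= threshold)
            return true;
        s = end;
        if (*s == '-' || *s == ',')
            s++;                   /* step over a range dash or list comma */
    }
    return false;
}
```

For example, `reserved_ports_reach("30000-32767", SHRT_MAX)` is true for the configuration reported in this thread.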
On Wed, Jun 14, 2017 at 8:32 PM, Guy Cukierman <guyc at elminda.com> wrote:
Hi,
I have a Gluster (version 3.10.2) server running on a 3-node (CentOS 7) cluster.
Firewalld and SELinux are disabled, and I can telnet from each node to the others on port 24007.
When I try to create the first peering by running the following command on node1:
gluster peer probe <node2 ip address>
I get the error:
“Connection failed. Please check if gluster daemon is operational.”
And glusterd.log shows:
[2017-06-14 14:46:09.927510] I [MSGID: 106487] [glusterd-handler.c:1242:__glusterd_handle_cli_probe] 0-glusterd: Received CLI probe req 192.168.1.17 24007
[2017-06-14 14:46:09.928560] I [MSGID: 106129] [glusterd-handler.c:3690:glusterd_probe_begin] 0-glusterd: Unable to find peerinfo for host: 192.168.1.17 (24007)
[2017-06-14 14:46:09.930783] W [MSGID: 106062] [glusterd-handler.c:3466:glusterd_transport_inet_options_build] 0-glusterd: Failed to get tcp-user-timeout
[2017-06-14 14:46:09.930837] I [rpc-clnt.c:1059:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
pending frames:
frame : type(0) op(0)
patchset: git://git.gluster.org/glusterfs.git
signal received: 11
time of crash:
2017-06-14 14:46:09
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.10.3
/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xa0)[0x7f69625da4d0]
/lib64/libglusterfs.so.0(gf_print_trace+0x324)[0x7f69625e3dd4]
/lib64/libc.so.6(+0x35250)[0x7f6960cbe250]
/lib64/libglusterfs.so.0(gf_ports_reserved+0x15c)[0x7f69625e444c]
/lib64/libglusterfs.so.0(gf_process_reserved_ports+0xbe)[0x7f69625e470e]
/usr/lib64/glusterfs/3.10.3/rpc-transport/socket.so(+0xd158)[0x7f6954568158]
/usr/lib64/glusterfs/3.10.3/rpc-transport/socket.so(client_bind+0x93)[0x7f69545685a3]
/usr/lib64/glusterfs/3.10.3/rpc-transport/socket.so(+0xa875)[0x7f6954565875]
/lib64/libgfrpc.so.0(rpc_clnt_reconnect+0xc9)[0x7f69623a3f89]
/lib64/libgfrpc.so.0(rpc_clnt_start+0x39)[0x7f69623a4049]
/usr/lib64/glusterfs/3.10.3/xlator/mgmt/glusterd.so(+0x24218)[0x7f69570f9218]
/usr/lib64/glusterfs/3.10.3/xlator/mgmt/glusterd.so(+0x24843)[0x7f69570f9843]
/usr/lib64/glusterfs/3.10.3/xlator/mgmt/glusterd.so(+0x24ae0)[0x7f69570f9ae0]
/usr/lib64/glusterfs/3.10.3/xlator/mgmt/glusterd.so(+0x27890)[0x7f69570fc890]
/usr/lib64/glusterfs/3.10.3/xlator/mgmt/glusterd.so(+0x27e20)[0x7f69570fce20]
/usr/lib64/glusterfs/3.10.3/xlator/mgmt/glusterd.so(+0x20f5e)[0x7f69570f5f5e]
/lib64/libglusterfs.so.0(synctask_wrap+0x10)[0x7f6962611750]
/lib64/libc.so.6(+0x46cf0)[0x7f6960ccfcf0]
A file is also created under /var/lib/glusterd/peers/<node2 ip address>, containing:
uuid=00000000-0000-0000-0000-000000000000
state=0
hostname1=192.168.1.17
The glusterd daemon then exits, and I cannot restart it until I delete this file from the peers folder.
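The leftover peer file uses simple key=value lines, and its all-zero uuid marks a probe that never completed. Assuming that format, taken only from the file contents shown above, the hypothetical helper `peer_file_is_stale` below flags such an entry from the file's contents. It is a sketch, not a GlusterFS tool, and it does not read or modify /var/lib/glusterd itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper, not part of GlusterFS.
 * Returns true if peer-file contents (key=value lines, as in the
 * /var/lib/glusterd/peers/<node2 ip address> file quoted above) contain
 * a line that is exactly the all-zero uuid. */
static bool peer_file_is_stale(const char *contents)
{
    static const char null_uuid[] =
        "uuid=00000000-0000-0000-0000-000000000000";
    const size_t len = sizeof null_uuid - 1;

    assert(contents != NULL);
    for (const char *line = contents; *line != '\0';) {
        /* Match the whole line, not just a prefix of it. */
        if (strncmp(line, null_uuid, len) == 0 &&
            (line[len] == '\n' || line[len] == '\0'))
            return true;
        const char *nl = strchr(line, '\n');
        if (nl == NULL)
            break;
        line = nl + 1;
    }
    return false;
}
```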
Any idea what is wrong?
thanks!
_______________________________________________
Gluster-users mailing list
Gluster-users at gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users