[Gluster-users] gluster peer probe failing

Gaurav Yadav gyadav at redhat.com
Tue Jun 20 06:46:18 UTC 2017


Hi,

I am able to recreate the issue and here is my RCA.
The maximum reserved-port value, i.e. 32767, overflows while being manipulated,
and that case was previously not handled properly.
Hence glusterd was crashing with SIGSEGV.
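
For illustration only, here is a minimal standalone sketch (not the actual
glusterd code, and the variable names are made up) of the failure pattern:
if the upper bound of the reserved-port range ends up in a signed 16-bit
variable, adding 1 to a value that already holds 32767 wraps around to a
negative number, and a loop bound or table index derived from it can then
run out of range:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* Hypothetical: the upper end of net.ipv4.ip_local_reserved_ports,
     * parsed into a signed 16-bit integer (INT16_MAX is 32767). */
    int16_t high_port = 32767;

    /* 32767 + 1 cannot be represented in int16_t; converting it back is
     * implementation-defined and typically wraps to -32768. */
    int16_t next = (int16_t)(high_port + 1);

    printf("high_port = %d, high_port + 1 (as int16_t) = %d\n",
           high_port, next);

    /* A loop such as "for (p = low; p <= high_port; p++)" where p is an
     * int16_t would similarly wrap after 32767 and never terminate
     * correctly, or an index computed from the wrapped value would run
     * past the end of the ports table and trigger a SIGSEGV. */
    return 0;
}

Running the sketch prints a negative value for high_port + 1, which is the
kind of wrap-around the fix has to guard against.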

The issue is being fixed under "
https://bugzilla.redhat.com/show_bug.cgi?id=1454418" and is being backported
as well.


Thanks
Gaurav


On Tue, Jun 20, 2017 at 6:43 AM, Gaurav Yadav <gyadav at redhat.com> wrote:

> Hi,
>
> I have tried this on my host by setting the corresponding reserved ports,
> but I didn't see the issue on my machine locally.
> However, from the logs you have sent it is pretty clear that the issue is
> related to the ports only.
>
> I will try to reproduce it on some other machine and will update you as soon
> as possible.
>
>
> Thanks
> Gaurav
>
> On Sun, Jun 18, 2017 at 12:37 PM, Guy Cukierman <guyc at elminda.com> wrote:
>
>> Hi,
>>
>> Below please find the reserved ports and log, thanks.
>>
>>
>>
>> sysctl net.ipv4.ip_local_reserved_ports:
>>
>> net.ipv4.ip_local_reserved_ports = 30000-32767
>>
>>
>>
>>
>>
>> glusterd.log:
>>
>> [2017-06-18 07:04:17.853162] I [MSGID: 106487]
>> [glusterd-handler.c:1242:__glusterd_handle_cli_probe] 0-glusterd:
>> Received CLI probe req 192.168.1.17 24007
>>
>> [2017-06-18 07:04:17.853237] D [MSGID: 0] [common-utils.c:3361:gf_is_local_addr]
>> 0-management: 192.168.1.17
>>
>> [2017-06-18 07:04:17.854093] D [logging.c:1952:_gf_msg_internal]
>> 0-logging-infra: Buffer overflow of a buffer whose size limit is 5. About
>> to flush least recently used log message to disk
>>
>> The message "D [MSGID: 0] [common-utils.c:3361:gf_is_local_addr]
>> 0-management: 192.168.1.17 " repeated 2 times between [2017-06-18
>> 07:04:17.853237] and [2017-06-18 07:04:17.853869]
>>
>> [2017-06-18 07:04:17.854093] D [MSGID: 0] [common-utils.c:3377:gf_is_local_addr]
>> 0-management: 192.168.1.17 is not local
>>
>> [2017-06-18 07:04:17.854221] D [MSGID: 0] [glusterd-peer-utils.c:132:glu
>> sterd_peerinfo_find_by_hostname] 0-management: Unable to find friend:
>> 192.168.1.17
>>
>> [2017-06-18 07:04:17.854271] D [logging.c:1952:_gf_msg_internal]
>> 0-logging-infra: Buffer overflow of a buffer whose size limit is 5. About
>> to flush least recently used log message to disk
>>
>> [2017-06-18 07:04:17.854269] D [MSGID: 0] [glusterd-peer-utils.c:132:glu
>> sterd_peerinfo_find_by_hostname] 0-management: Unable to find friend:
>> 192.168.1.17
>>
>> [2017-06-18 07:04:17.854271] D [MSGID: 0] [glusterd-peer-utils.c:246:glusterd_peerinfo_find]
>> 0-management: Unable to find hostname: 192.168.1.17
>>
>> [2017-06-18 07:04:17.854306] I [MSGID: 106129]
>> [glusterd-handler.c:3690:glusterd_probe_begin] 0-glusterd: Unable to
>> find peerinfo for host: 192.168.1.17 (24007)
>>
>> [2017-06-18 07:04:17.854343] D [MSGID: 0] [glusterd-peer-utils.c:486:glusterd_peer_hostname_new]
>> 0-glusterd: Returning 0
>>
>> [2017-06-18 07:04:17.854367] D [MSGID: 0] [glusterd-utils.c:7060:glusterd_sm_tr_log_init]
>> 0-glusterd: returning 0
>>
>> [2017-06-18 07:04:17.854387] D [MSGID: 0] [glusterd-store.c:4092:glusterd_store_create_peer_dir]
>> 0-glusterd: Returning with 0
>>
>> [2017-06-18 07:04:17.854918] D [MSGID: 0] [store.c:420:gf_store_handle_new]
>> 0-: Returning 0
>>
>> [2017-06-18 07:04:17.855083] D [MSGID: 0] [store.c:374:gf_store_save_value]
>> 0-management: returning: 0
>>
>> [2017-06-18 07:04:17.855130] D [logging.c:1952:_gf_msg_internal]
>> 0-logging-infra: Buffer overflow of a buffer whose size limit is 5. About
>> to flush least recently used log message to disk
>>
>> The message "D [MSGID: 0] [store.c:374:gf_store_save_value]
>> 0-management: returning: 0" repeated 2 times between [2017-06-18
>> 07:04:17.855083] and [2017-06-18 07:04:17.855128]
>>
>> [2017-06-18 07:04:17.855129] D [MSGID: 0] [glusterd-store.c:4221:glusterd_store_peer_write]
>> 0-glusterd: Returning with 0
>>
>> [2017-06-18 07:04:17.856294] D [MSGID: 0] [glusterd-store.c:4247:glusterd_store_perform_peer_store]
>> 0-glusterd: Returning 0
>>
>> [2017-06-18 07:04:17.856332] D [MSGID: 0] [glusterd-store.c:4268:glusterd_store_peerinfo]
>> 0-glusterd: Returning with 0
>>
>> [2017-06-18 07:04:17.856365] W [MSGID: 106062]
>> [glusterd-handler.c:3466:glusterd_transport_inet_options_build]
>> 0-glusterd: Failed to get tcp-user-timeout
>>
>> [2017-06-18 07:04:17.856387] D [MSGID: 0] [glusterd-handler.c:3474:glust
>> erd_transport_inet_options_build] 0-glusterd: Returning 0
>>
>> [2017-06-18 07:04:17.856409] I [rpc-clnt.c:1059:rpc_clnt_connection_init]
>> 0-management: setting frame-timeout to 600
>>
>> [2017-06-18 07:04:17.856421] D [rpc-clnt.c:1071:rpc_clnt_connection_init]
>> 0-management: setting ping-timeout to 30
>>
>> [2017-06-18 07:04:17.856434] D [rpc-transport.c:279:rpc_transport_load]
>> 0-rpc-transport: attempt to load file /usr/lib64/glusterfs/3.10.3/rp
>> c-transport/socket.so
>>
>> [2017-06-18 07:04:17.856580] D [socket.c:4082:socket_init] 0-management:
>> Configued transport.tcp-user-timeout=-1
>>
>> [2017-06-18 07:04:17.856594] D [socket.c:4165:socket_init] 0-management:
>> SSL support on the I/O path is NOT enabled
>>
>> [2017-06-18 07:04:17.856625] D [socket.c:4168:socket_init] 0-management:
>> SSL support for glusterd is NOT enabled
>>
>> [2017-06-18 07:04:17.856634] D [socket.c:4185:socket_init] 0-management:
>> using system polling thread
>>
>> [2017-06-18 07:04:17.856664] D [name.c:168:client_fill_address_family]
>> 0-management: address-family not specified, marking it as unspec for
>> getaddrinfo to resolve from (remote-host: 192.168.1.17)
>>
>> [2017-06-18 07:04:17.861800] D [MSGID: 0] [common-utils.c:334:gf_resolve_ip6]
>> 0-resolver: returning ip-192.168.1.17 (port-24007) for hostname:
>> 192.168.1.17 and port: 24007
>>
>> [2017-06-18 07:04:17.861830] D [socket.c:2982:socket_fix_ssl_opts]
>> 0-management: disabling SSL for portmapper connection
>>
>> [2017-06-18 07:04:17.861885] D [MSGID: 0] [common-utils.c:3106:gf_ports_reserved]
>> 0-glusterfs: lower: 30000, higher: 32767
>>
>> [2017-06-18 07:04:17.861920] D [logging.c:1764:gf_log_flush_extra_msgs]
>> 0-logging-infra: Log buffer size reduced. About to flush 5 extra log
>> messages
>>
>> [2017-06-18 07:04:17.861933] D [logging.c:1767:gf_log_flush_extra_msgs]
>> 0-logging-infra: Just flushed 5 extra log messages
>>
>> pending frames:
>>
>> frame : type(0) op(0)
>>
>> patchset: git://git.gluster.org/glusterfs.git
>>
>> signal received: 11
>>
>> time of crash:
>>
>> 2017-06-18 07:04:17
>>
>> configuration details:
>>
>> argp 1
>>
>> backtrace 1
>>
>> dlfcn 1
>>
>> libpthread 1
>>
>> llistxattr 1
>>
>> setfsid 1
>>
>> spinlock 1
>>
>> epoll.h 1
>>
>> xattr.h 1
>>
>> st_atim.tv_nsec 1
>>
>> package-string: glusterfs 3.10.3
>>
>> /lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xa0)[0x7fbdf7c964d0]
>>
>> /lib64/libglusterfs.so.0(gf_print_trace+0x324)[0x7fbdf7c9fdd4]
>>
>> /lib64/libc.so.6(+0x35250)[0x7fbdf637a250]
>>
>> /lib64/libglusterfs.so.0(gf_ports_reserved+0x15c)[0x7fbdf7ca044c]
>>
>> /lib64/libglusterfs.so.0(gf_process_reserved_ports+0xbe)[0x7fbdf7ca070e]
>>
>> /usr/lib64/glusterfs/3.10.3/rpc-transport/socket.so(+0xd158)
>> [0x7fbde9c24158]
>>
>> /usr/lib64/glusterfs/3.10.3/rpc-transport/socket.so(client_
>> bind+0x93)[0x7fbde9c245a3]
>>
>> /usr/lib64/glusterfs/3.10.3/rpc-transport/socket.so(+0xa875)
>> [0x7fbde9c21875]
>>
>> /lib64/libgfrpc.so.0(rpc_clnt_reconnect+0xc9)[0x7fbdf7a5ff89]
>>
>> /lib64/libgfrpc.so.0(rpc_clnt_start+0x39)[0x7fbdf7a60049]
>>
>> /usr/lib64/glusterfs/3.10.3/xlator/mgmt/glusterd.so(+0x24218
>> )[0x7fbdec7b5218]
>>
>> /usr/lib64/glusterfs/3.10.3/xlator/mgmt/glusterd.so(+0x24843
>> )[0x7fbdec7b5843]
>>
>> /usr/lib64/glusterfs/3.10.3/xlator/mgmt/glusterd.so(+0x24ae0
>> )[0x7fbdec7b5ae0]
>>
>> /usr/lib64/glusterfs/3.10.3/xlator/mgmt/glusterd.so(+0x27890
>> )[0x7fbdec7b8890]
>>
>> /usr/lib64/glusterfs/3.10.3/xlator/mgmt/glusterd.so(+0x27e20
>> )[0x7fbdec7b8e20]
>>
>> /usr/lib64/glusterfs/3.10.3/xlator/mgmt/glusterd.so(+0x20f5e
>> )[0x7fbdec7b1f5e]
>>
>> /lib64/libglusterfs.so.0(synctask_wrap+0x10)[0x7fbdf7ccd750]
>>
>> /lib64/libc.so.6(+0x46cf0)[0x7fbdf638bcf0]
>>
>> ---------
>>
>>
>>
>> *From:* Gaurav Yadav [mailto:gyadav at redhat.com]
>> *Sent:* Friday, June 16, 2017 5:47 AM
>> *To:* Atin Mukherjee <amukherj at redhat.com>
>> *Cc:* Guy Cukierman <guyc at elminda.com>; gluster-users at gluster.org
>>
>> *Subject:* Re: [Gluster-users] gluster peer probe failing
>>
>>
>>
>>
>>
>> Could you please send me the output of command "sysctl
>> net.ipv4.ip_local_reserved_ports".
>>
>> Apart from output of command please send the logs to look into the issue.
>>
>> Thanks
>>
>> Gaurav
>>
>>
>>
>>
>>
>> On Thu, Jun 15, 2017 at 4:28 PM, Atin Mukherjee <amukherj at redhat.com>
>> wrote:
>>
>> +Gaurav, he is the author of the patch, can you please comment here?
>>
>>
>>
>> On Thu, Jun 15, 2017 at 3:28 PM, Guy Cukierman <guyc at elminda.com> wrote:
>>
>> Thanks, but my current settings are:
>>
>> net.ipv4.ip_local_reserved_ports = 30000-32767
>>
>> net.ipv4.ip_local_port_range = 32768    60999
>>
>> meaning the reserved ports are already within the short int range, so maybe I
>> misunderstood something? Or is it a different issue?
>>
>>
>>
>> *From:* Atin Mukherjee [mailto:amukherj at redhat.com]
>> *Sent:* Thursday, June 15, 2017 10:56 AM
>> *To:* Guy Cukierman <guyc at elminda.com>
>> *Cc:* gluster-users at gluster.org
>> *Subject:* Re: [Gluster-users] gluster peer probe failing
>>
>>
>>
>> https://review.gluster.org/#/c/17494/ will fix it, and the next update of
>> 3.10 should have this fix.
>>
>> If sysctl net.ipv4.ip_local_reserved_ports has any value beyond the short int range (i.e. above 32767), then this would be a problem with the current version.
>> Would you be able to reset the reserved ports temporarily to get this going?
>>
>>
>>
>>
>> On Wed, Jun 14, 2017 at 8:32 PM, Guy Cukierman <guyc at elminda.com> wrote:
>>
>> Hi,
>>
>> I have a gluster (version 3.10.2) server running on a 3 node (centos7)
>> cluster.
>>
>> Firewalld and SELinux are disabled, and I see I can telnet from each node
>> to the other on port 24007.
>>
>>
>>
>> When I try to create the first peering by running on node1 the command:
>>
>> gluster peer probe <node2 ip address>
>>
>>
>>
>> I get the error:
>>
>> “Connection failed. Please check if gluster daemon is operational.”
>>
>>
>>
>> And Glusterd.log shows:
>>
>>
>>
>> [2017-06-14 14:46:09.927510] I [MSGID: 106487]
>> [glusterd-handler.c:1242:__glusterd_handle_cli_probe] 0-glusterd:
>> Received CLI probe req 192.168.1.17 24007
>>
>> [2017-06-14 14:46:09.928560] I [MSGID: 106129]
>> [glusterd-handler.c:3690:glusterd_probe_begin] 0-glusterd: Unable to
>> find peerinfo for host: 192.168.1.17 (24007)
>>
>> [2017-06-14 14:46:09.930783] W [MSGID: 106062]
>> [glusterd-handler.c:3466:glusterd_transport_inet_options_build]
>> 0-glusterd: Failed to get tcp-user-timeout
>>
>> [2017-06-14 14:46:09.930837] I [rpc-clnt.c:1059:rpc_clnt_connection_init]
>> 0-management: setting frame-timeout to 600
>>
>> pending frames:
>>
>> frame : type(0) op(0)
>>
>> patchset: git://git.gluster.org/glusterfs.git
>>
>> signal received: 11
>>
>> time of crash:
>>
>> 2017-06-14 14:46:09
>>
>> configuration details:
>>
>> argp 1
>>
>> backtrace 1
>>
>> dlfcn 1
>>
>> libpthread 1
>>
>> llistxattr 1
>>
>> setfsid 1
>>
>> spinlock 1
>>
>> epoll.h 1
>>
>> xattr.h 1
>>
>> st_atim.tv_nsec 1
>>
>> package-string: glusterfs 3.10.3
>>
>> /lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xa0)[0x7f69625da4d0]
>>
>> /lib64/libglusterfs.so.0(gf_print_trace+0x324)[0x7f69625e3dd4]
>>
>> /lib64/libc.so.6(+0x35250)[0x7f6960cbe250]
>>
>> /lib64/libglusterfs.so.0(gf_ports_reserved+0x15c)[0x7f69625e444c]
>>
>> /lib64/libglusterfs.so.0(gf_process_reserved_ports+0xbe)[0x7f69625e470e]
>>
>> /usr/lib64/glusterfs/3.10.3/rpc-transport/socket.so(+0xd158)
>> [0x7f6954568158]
>>
>> /usr/lib64/glusterfs/3.10.3/rpc-transport/socket.so(client_
>> bind+0x93)[0x7f69545685a3]
>>
>> /usr/lib64/glusterfs/3.10.3/rpc-transport/socket.so(+0xa875)
>> [0x7f6954565875]
>>
>> /lib64/libgfrpc.so.0(rpc_clnt_reconnect+0xc9)[0x7f69623a3f89]
>>
>> /lib64/libgfrpc.so.0(rpc_clnt_start+0x39)[0x7f69623a4049]
>>
>> /usr/lib64/glusterfs/3.10.3/xlator/mgmt/glusterd.so(+0x24218
>> )[0x7f69570f9218]
>>
>> /usr/lib64/glusterfs/3.10.3/xlator/mgmt/glusterd.so(+0x24843
>> )[0x7f69570f9843]
>>
>> /usr/lib64/glusterfs/3.10.3/xlator/mgmt/glusterd.so(+0x24ae0
>> )[0x7f69570f9ae0]
>>
>> /usr/lib64/glusterfs/3.10.3/xlator/mgmt/glusterd.so(+0x27890
>> )[0x7f69570fc890]
>>
>> /usr/lib64/glusterfs/3.10.3/xlator/mgmt/glusterd.so(+0x27e20
>> )[0x7f69570fce20]
>>
>> /usr/lib64/glusterfs/3.10.3/xlator/mgmt/glusterd.so(+0x20f5e
>> )[0x7f69570f5f5e]
>>
>> /lib64/libglusterfs.so.0(synctask_wrap+0x10)[0x7f6962611750]
>>
>> /lib64/libc.so.6(+0x46cf0)[0x7f6960ccfcf0]
>>
>>
>>
>> And a file is created under /var/lib/glusterd/peers/<node2 ip address>
>> which contains:
>>
>> uuid=00000000-0000-0000-0000-000000000000
>>
>> state=0
>>
>> hostname1=192.168.1.17
>>
>>
>>
>> and the glusterd daemon exits and I cannot restart it until I delete this
>> file from the peers folder.
>>
>>
>>
>> Any idea what is wrong?
>>
>> thanks!
>>
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>
>>
>>
>>
>>
>>
>>
>
>