[Bugs] [Bug 1459760] New: Glusterd segmentation fault in ' _Unwind_Backtrace' while running peer probe

Thu Jun 8 06:32:01 UTC 2017

https://bugzilla.redhat.com/show_bug.cgi?id=1459760

            Bug ID: 1459760
           Summary: Glusterd segmentation fault in ' _Unwind_Backtrace'
                    while running peer probe
           Product: GlusterFS
           Version: 3.10
         Component: glusterd
          Keywords: Triaged
          Severity: urgent
          Priority: medium
          Assignee: bugs at gluster.org
          Reporter: gyadav at redhat.com
                CC: amukherj at redhat.com, anraj at redhat.com, ben at apcera.com,
                    bugs at gluster.org, earl at ruby.org, gyadav at redhat.com,
                    kaushal at redhat.com, kkeithle at redhat.com,
                    ppai at redhat.com, sbairagy at redhat.com,
                    skoduri at redhat.com
        Depends On: 1447523
            Blocks: 1454418, 1459759

Created attachment 1285987
  --> https://bugzilla.redhat.com/attachment.cgi?id=1285987&action=edit
Comments cannot be longer than 65535 characters, hence attaching

+++ This bug was initially created as a clone of Bug #1447523 +++

Description of problem:

Issuing a peer probe results in a glusterd segmentation fault. Once in this
state, if the peer is removed from /var/lib/glusterd/peers, glusterd will
start.  Probing a peer again leads to the same problem.

Problematic peer entry:
cat /var/lib/glusterd/peers/ip-10-0-50-25.us-west-1.compute.internal 
uuid=00000000-0000-0000-0000-000000000000
state=0
hostname1=ip-10-0-50-25.us-west-1.compute.internal
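
A peer file like the one above can be checked for the all-zero UUID with a
short script. This is a minimal Python sketch, not part of glusterd: the
function names are hypothetical, and treating a nil UUID as the sign of a
problematic entry is an assumption based only on this report.

```python
# Hypothetical helper, not glusterd code: flag peer entries with a nil UUID.
NIL_UUID = "00000000-0000-0000-0000-000000000000"


def parse_peer_file(text):
    """Parse key=value lines (as in /var/lib/glusterd/peers/*) into a dict."""
    entry = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # ignore lines that have no '='
            entry[key.strip()] = value.strip()
    return entry


def has_nil_uuid(entry):
    """Return True when the entry's uuid field is the all-zero UUID."""
    return entry.get("uuid") == NIL_UUID


sample = """uuid=00000000-0000-0000-0000-000000000000
state=0
hostname1=ip-10-0-50-25.us-west-1.compute.internal
"""
peer = parse_peer_file(sample)
print(peer["hostname1"], "nil uuid:", has_nil_uuid(peer))
```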

Core was generated by `/usr/sbin/glusterd -p /var/run/glusterd.pid
--log-level=TRACE --log-buf-size=0'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  x86_64_fallback_frame_state (context=0x7ffe5d9a3b50,
context=0x7ffe5d9a3b50, fs=0x7ffe5d9a3c40) at ./md-unwind-support.h:58
58      ./md-unwind-support.h: No such file or directory.
(gdb) bt
#0  x86_64_fallback_frame_state (context=0x7ffe5d9a3b50,
context=0x7ffe5d9a3b50, fs=0x7ffe5d9a3c40) at ./md-unwind-support.h:58
#1  uw_frame_state_for (context=context@entry=0x7ffe5d9a3b50,
fs=fs@entry=0x7ffe5d9a3c40) at ../../../src/libgcc/unwind-dw2.c:1253
#2  0x00007f6371b2f6d8 in _Unwind_Backtrace (trace=0x7f6378bc2440
<backtrace_helper>, trace_argument=0x7ffe5d9a3e00) at
../../../src/libgcc/unwind.inc:290
#3  0x00007f6378bc25b6 in __GI___backtrace (array=array@entry=0x7ffe5d9a3e40,
size=size@entry=200) at ../sysdeps/x86_64/backtrace.c:109
#4  0x00007f63796f3f42 in _gf_msg_backtrace_nomem
(level=level@entry=GF_LOG_ALERT, stacksize=stacksize@entry=200) at
logging.c:1094
#5  0x00007f63796fd494 in gf_print_trace (signum=11, ctx=0x7f637a3ac010) at
common-utils.c:737
#6  <signal handler called>
#7  0x00000001725cc6c8 in ?? ()
#8  0x0000000000000000 in ?? ()

Version-Release number of selected component (if applicable):

$ glusterd --version 
glusterfs 3.8.11

from package glusterfs-server 3.8.11-ubuntu1~trusty1

How reproducible:

1:1

Steps to Reproduce:
1. Install gluster on Ubuntu 14.04
2. sudo /usr/sbin/gluster --log-level=TRACE peer probe
ip-10-0-50-25.us-west-1.compute.internal
Connection failed. Please check if gluster daemon is operational.

Actual results:

Glusterd crashes on peer probe.

Expected results:

Glusterd should not crash on peer probe.

Additional info:

There's another issue which may be related. I noticed that glusterd.info was
not self-populating. As a workaround I issue 'gluster pool list' which triggers
glusterd to generate and store a UUID:

cat /var/lib/glusterd/glusterd.info 
UUID=ad7b8337-ec4d-4917-ad6b-ca0e4d0eba42
operating-version=30800

This looks a lot like https://bugzilla.redhat.com/show_bug.cgi?id=1293594

Gaurav,

I can grant you access to EC2 instances that are in this state. Is that
acceptable? If so, please send me your SSH public key.

Please look at https://bugzilla.redhat.com/attachment.cgi?id=1276539 and check
out Stacktrace, StacktraceSource, and ThreadStacktrace.

--- Additional comment from Kaushal on 2017-05-15 03:46:47 EDT ---

To make it easier to debug, please install the `glusterfs-dbg` package, which
should provide better information in the backtraces. Also, try to start
glusterd with debug logs, either directly by running `glusterd -LDEBUG` or by
modifying the init script.

Doing the above should help get better logs and stacktraces, which will help
you get to the cause faster.

--- Additional comment from Ben Werthmann on 2017-05-15 10:14:04 EDT ---

Kaushal,

'glusterfs-dbg' is already installed and I've already modified the init scripts
(upstart job in this case) to use DEBUG level logging.

--- Additional comment from Gaurav Yadav on 2017-05-17 01:32:38 EDT ---

Ben,

The logs you attached don't help much. I am not able to see a proper
backtrace.

In order to do the RCA I need either a reproducer or access to your host.

Here is my SSH public key

ssh-rsa
AAAAB3NzaC1yc2EAAAADAQABAAABAQDWFZqzFVo7orVZx2ODZyok46VI6EqLg16uP2Z1pkMrEQGu50i3Ye16V5I63UMrHjDwdr4hxtvkW9UfhckQpgBwjsVg9xoyl9tuYt1h9au8G0hH2UL1XYWmbQt82N9VbeYGStg3n0VoefHNZ4LH/VINg0gBWtIK7iTQxWR6XOvs2QqOJnUnM+Fgu5b9kS9vPoDr93BxGLya2ijASkRxsi5dUN4qm7LgFX7Hsyh14G+BBouF5wDZ6frR/UPpqocBVJ5/n4f9OkhwMOShlkWm0m/JDcu6L0phL+Dqm9KxPHBEA/PFW3atjvJW70Iun+j1i72SCcMccQjHSPB6J5QYSeQb
gyadav at dhcp35-39.lab.eng.blr.redhat.com

--- Additional comment from Ben Werthmann on 2017-05-17 12:09:50 EDT ---

Gaurav,

I've provided the connection info to you in a direct email.

--- Additional comment from Ben Werthmann on 2017-05-19 20:37:46 EDT ---

Running this command before peer probe reproduces this problem (leading to the
backtrace handler problem) in all cases:

sysctl net.ipv4.ip_local_reserved_ports="49152-49156"

or

sysctl net.ipv4.ip_local_reserved_ports="24007-24008,32765-32768,49152-49156"

The issue appears to be with parsing the contents of
'/proc/sys/net/ipv4/ip_local_reserved_ports' here:

https://github.com/gluster/glusterfs/blob/master/libglusterfs/src/common-utils.c#L3038

This option appears to defer to the kernel for source port selection. Is there
a known issue with kernel port selection?

https://github.com/gluster/glusterfs/blob/master/configure.ac#L312-L320

I'm going to build and test with the above configure option.

--- Additional comment from Ben Werthmann on 2017-05-22 11:20:27 EDT ---

(In reply to Ben Werthmann from comment #22)
> Running this command before peer probe reproduces this problem (leading to
> the backtrace handler problem) in all cases:
> 
> sysctl net.ipv4.ip_local_reserved_ports="49152-49156"
> 
> or
> 
> sysctl net.ipv4.ip_local_reserved_ports="24007-24008,32765-32768,49152-49156"
> 
> The issue appears to be with parsing the contents of
> '/proc/sys/net/ipv4/ip_local_reserved_ports' here:
> 
> https://github.com/gluster/glusterfs/blob/master/libglusterfs/src/common-utils.c#L3038
> 
> This option appears to defer to the kernel for source port selection. Is
> there a known issue with kernel port selection?
> 
> https://github.com/gluster/glusterfs/blob/master/configure.ac#L312-L320

This option is not in 3.8.

> 
> I'm going to build and test with the above configure option.

--- Additional comment from Gaurav Yadav on 2017-05-22 12:34:54 EDT ---

Thanks, Ben, for providing the additional info. It helped me find the root
cause of the issue.
While parsing the ports we are not handling the MIN/MAX range properly, hence
glusterd is crashing.
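
The kind of range handling described above can be sketched as follows. This
is a minimal Python illustration, not the glusterd fix itself: the function
name, the 65535 upper bound, and the choice to skip malformed tokens are all
assumptions made for the example. It parses a comma-separated list of ports
and ranges, in the format shown for ip_local_reserved_ports earlier in this
bug, and clamps every range to the valid port interval before using it.

```python
# Illustrative sketch only; not the code in libglusterfs/src/common-utils.c.
MAX_PORT = 65535  # assumed upper bound: highest valid TCP/UDP port number


def parse_reserved_ports(text, max_port=MAX_PORT):
    """Return the set of ports named by a list such as '24007-24008,49152'."""
    reserved = set()
    for token in text.strip().split(","):
        token = token.strip()
        if not token:
            continue  # empty file or stray comma
        low_text, sep, high_text = token.partition("-")
        try:
            low = int(low_text)
            high = int(high_text) if sep else low
        except ValueError:
            continue  # assumption: silently skip malformed tokens
        if low > high:
            low, high = high, low  # tolerate reversed ranges
        # Clamp to [0, max_port] so a range can never index out of bounds.
        low = max(low, 0)
        high = min(high, max_port)
        reserved.update(range(low, high + 1))
    return reserved


ports = parse_reserved_ports("24007-24008,32765-32768,49152-49156")
print(sorted(ports))
```

Clamping before the range is expanded is the point of the sketch: a value
above the maximum shrinks the range instead of writing past the end of a
fixed-size table.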

Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1447523
[Bug 1447523] Glusterd segmentation fault in ' _Unwind_Backtrace' while
running peer probe
https://bugzilla.redhat.com/show_bug.cgi?id=1454418
[Bug 1454418] Glusterd segmentation fault in ' _Unwind_Backtrace' while
running peer probe
https://bugzilla.redhat.com/show_bug.cgi?id=1459759
[Bug 1459759] Glusterd segmentation fault in ' _Unwind_Backtrace' while
running peer probe