[Bugs] [Bug 1233044] New: [geo-rep]: Segmentation faults are observed on all the master nodes

bugzilla at redhat.com bugzilla at redhat.com
Thu Jun 18 06:51:00 UTC 2015


https://bugzilla.redhat.com/show_bug.cgi?id=1233044

            Bug ID: 1233044
           Summary: [geo-rep]: Segmentation faults are observed on all the
                    master nodes
           Product: GlusterFS
           Version: 3.7.1
         Component: geo-replication
          Severity: urgent
          Assignee: bugs at gluster.org
          Reporter: khiremat at redhat.com
                CC: aavati at redhat.com, bugs at gluster.org, csaba at redhat.com,
                    gluster-bugs at redhat.com, khiremat at redhat.com,
                    nlevinki at redhat.com, rhinduja at redhat.com,
                    storage-qa-internal at redhat.com
        Depends On: 1232609, 1232666



+++ This bug was initially created as a clone of Bug #1232666 +++

+++ This bug was initially created as a clone of Bug #1232609 +++

Description of problem:
=======================

Ran basic geo-rep cases with changelog, xsync, and history crawl. Found core
dumps from gsyncd on all the master nodes.

Master Node:1
=============
[root@rhsqe-vm01 ~]# ls -lrt /core*
-rw-------. 1 root root 125153280 Jun 16 23:14 /core.16155
-rw-------. 1 root root 133541888 Jun 17 00:57 /core.9695
-rw-------. 1 root root 132493312 Jun 17 02:46 /core.14005
-rw-------. 1 root root 133541888 Jun 17 02:59 /core.8089
-rw-------. 1 root root 133541888 Jun 17 04:04 /core.27626
-rw-------. 1 root root 132493312 Jun 17 07:55 /core.16584
-rw-------. 1 root root 132513792 Jun 17 09:25 /core.29550
-rw-------. 1 root root 123850752 Jun 17 12:07 /core.26792
-rw-------. 1 root root 124919808 Jun 17 13:23 /core.3604
-rw-------. 1 root root 127275008 Jun 17 14:39 /core.22976
-rw-------. 1 root root 133566464 Jun 17 15:13 /core.25537
-rw-------. 1 root root 131469312 Jun 17 15:45 /core.1220
[root@rhsqe-vm01 ~]#

[root@rhsqe-vm01 ~]# file /core*
/core.1220:  ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style,
from 'python /usr/libexec/glusterfs/python/syncdaemon/gsyncd.py
--path=/bricks/brick0'
/core.14005: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style,
from 'python /usr/libexec/glusterfs/python/syncdaemon/gsyncd.py
--path=/bricks/brick0'
/core.16155: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style,
from 'python /usr/libexec/glusterfs/python/syncdaemon/gsyncd.py
--path=/bricks/brick0'
/core.16584: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style,
from 'python /usr/libexec/glusterfs/python/syncdaemon/gsyncd.py
--path=/bricks/brick0'
/core.22976: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style,
from 'python /usr/libexec/glusterfs/python/syncdaemon/gsyncd.py
--path=/bricks/brick0'
/core.25537: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style,
from 'python /usr/libexec/glusterfs/python/syncdaemon/gsyncd.py
--path=/bricks/brick0'
/core.26792: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style,
from 'python /usr/libexec/glusterfs/python/syncdaemon/gsyncd.py
--path=/bricks/brick0'
/core.27626: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style,
from 'python /usr/libexec/glusterfs/python/syncdaemon/gsyncd.py
--path=/bricks/brick0'
/core.29550: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style,
from 'python /usr/libexec/glusterfs/python/syncdaemon/gsyncd.py
--path=/bricks/brick0'
/core.3604:  ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style,
from 'python /usr/libexec/glusterfs/python/syncdaemon/gsyncd.py
--path=/bricks/brick0'
/core.8089:  ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style,
from 'python /usr/libexec/glusterfs/python/syncdaemon/gsyncd.py
--path=/bricks/brick0'
/core.9695:  ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style,
from 'python /usr/libexec/glusterfs/python/syncdaemon/gsyncd.py
--path=/bricks/brick0'
[root@rhsqe-vm01 ~]#

[New LWP 1271]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `python /usr/libexec/glusterfs/python/syncdaemon/gsyncd.py --path=/bricks/brick0'.
Program terminated with signal 11, Segmentation fault.
#0  __GI___pthread_mutex_lock (mutex=mutex@entry=0x0) at pthread_mutex_lock.c:50
50      unsigned int type = PTHREAD_MUTEX_TYPE (mutex);
Missing separate debuginfos, use: debuginfo-install
keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.12.2-14.el7.x86_64
libcom_err-1.42.9-7.el7.x86_64 libffi-3.0.13-11.el7.x86_64
libselinux-2.2.2-6.el7.x86_64 libuuid-2.23.2-21.el7.x86_64
openssl-libs-1.0.1e-42.el7.x86_64 pcre-8.32-14.el7.x86_64
xz-libs-5.1.2-9alpha.el7.x86_64 zlib-1.2.7-13.el7.x86_64
(gdb) bt
#0  __GI___pthread_mutex_lock (mutex=mutex@entry=0x0) at pthread_mutex_lock.c:50
#1  0x00007fd71cbfa6f8 in gf_changelog_process (data=0x7fd7140589a0)
    at gf-changelog-journal-handler.c:649
#2  0x00007fd72ae8adf5 in start_thread (arg=0x7fd6feffd700) at pthread_create.c:308
#3  0x00007fd72a4af1ad in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
(gdb) 



Master Node:2
=============

[root@rhsqe-vm02 ~]# ls -lrt /core*
-rw-------. 1 root root 123850752 Jun 16 23:16 /core.14536
-rw-------. 1 root root 133541888 Jun 17 00:17 /core.19738
-rw-------. 1 root root 125153280 Jun 17 00:44 /core.30244
-rw-------. 1 root root 133562368 Jun 17 01:39 /core.20706
-rw-------. 1 root root 124919808 Jun 17 02:29 /core.5475
-rw-------. 1 root root 131444736 Jun 17 02:47 /core.6491
-rw-------. 1 root root 132493312 Jun 17 03:55 /core.26122
-rw-------. 1 root root 124952576 Jun 17 04:26 /core.28572
-rw-------. 1 root root 131469312 Jun 17 05:41 /core.1853
-rw-------. 1 root root 133541888 Jun 17 08:39 /core.19311
-rw-------. 1 root root 133562368 Jun 17 10:24 /core.29696
-rw-------. 1 root root 123871232 Jun 17 10:56 /core.14069
-rw-------. 1 root root 123056128 Jun 17 11:46 /core.14790
-rw-------. 1 root root 125173760 Jun 17 13:01 /core.23855
-rw-------. 1 root root 124899328 Jun 17 15:51 /core.3993
-rw-------. 1 root root 118661120 Jun 17 16:31 /core.21503
-rw-------. 1 root root 123904000 Jun 17 17:48 /core.31272
[root@rhsqe-vm02 ~]# file /core.*
/core.14069: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style,
from 'python /usr/libexec/glusterfs/python/syncdaemon/gsyncd.py
--path=/bricks/brick0'
/core.14536: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style,
from 'python /usr/libexec/glusterfs/python/syncdaemon/gsyncd.py
--path=/bricks/brick0'
/core.14790: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style,
from 'python /usr/libexec/glusterfs/python/syncdaemon/gsyncd.py
--path=/bricks/brick0'
/core.1853:  ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style,
from 'python /usr/libexec/glusterfs/python/syncdaemon/gsyncd.py
--path=/bricks/brick0'
/core.19311: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style,
from 'python /usr/libexec/glusterfs/python/syncdaemon/gsyncd.py
--path=/bricks/brick0'
/core.19738: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style,
from 'python /usr/libexec/glusterfs/python/syncdaemon/gsyncd.py
--path=/bricks/brick0'
/core.20706: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style,
from 'python /usr/libexec/glusterfs/python/syncdaemon/gsyncd.py
--path=/bricks/brick0'
/core.21503: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style,
from 'python /usr/libexec/glusterfs/python/syncdaemon/gsyncd.py
--path=/bricks/brick0'
/core.23855: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style,
from 'python /usr/libexec/glusterfs/python/syncdaemon/gsyncd.py
--path=/bricks/brick0'
/core.26122: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style,
from 'python /usr/libexec/glusterfs/python/syncdaemon/gsyncd.py
--path=/bricks/brick0'
/core.28572: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style,
from 'python /usr/libexec/glusterfs/python/syncdaemon/gsyncd.py
--path=/bricks/brick0'
/core.29696: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style,
from 'python /usr/libexec/glusterfs/python/syncdaemon/gsyncd.py
--path=/bricks/brick0'
/core.30244: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style,
from 'python /usr/libexec/glusterfs/python/syncdaemon/gsyncd.py
--path=/bricks/brick0'
/core.31272: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style,
from 'python /usr/libexec/glusterfs/python/syncdaemon/gsyncd.py
--path=/bricks/brick0'
/core.3993:  ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style,
from 'python /usr/libexec/glusterfs/python/syncdaemon/gsyncd.py
--path=/bricks/brick0'
/core.5475:  ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style,
from 'python /usr/libexec/glusterfs/python/syncdaemon/gsyncd.py
--path=/bricks/brick0'
/core.6491:  ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style,
from 'python /usr/libexec/glusterfs/python/syncdaemon/gsyncd.py
--path=/bricks/brick0'
[root@rhsqe-vm02 ~]#
[root@rhsqe-vm02 ~]# gdb python /core.31272
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-64.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/bin/python2.7...Reading symbols from
/usr/bin/python2.7...(no debugging symbols found)...done.
(no debugging symbols found)...done.

warning: core file may not match specified executable file.
[New LWP 31324]
[New LWP 31272]
[New LWP 31282]
[New LWP 31284]
[New LWP 31318]
[New LWP 31285]
[New LWP 31316]
[New LWP 31317]
[New LWP 31320]
[New LWP 31321]
[New LWP 31319]
[New LWP 31322]
[New LWP 31323]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `python /usr/libexec/glusterfs/python/syncdaemon/gsyncd.py --path=/bricks/brick0'.
Program terminated with signal 11, Segmentation fault.
#0  0x0000000000000000 in ?? ()
Missing separate debuginfos, use: debuginfo-install python-2.7.5-16.el7.x86_64
(gdb) bt
#0  0x0000000000000000 in ?? ()
#1  0x00007f23e016187c in gf_changelog_callback_invoker (arg=0x7f23cc0587e0)
    at gf-changelog-reborp.c:293
#2  0x00007f23ed3ecdf5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f23eca111ad in clone () from /lib64/libc.so.6
(gdb) quit
[root@rhsqe-vm02 ~]#

--- Additional comment from Anand Avati on 2015-06-17 05:37:32 EDT ---

REVIEW: http://review.gluster.org/11273 (libgfchangelog: Fix crash in
gf_changelog_process) posted (#1) for review on master by Kotresh HR
(khiremat at redhat.com)

--- Additional comment from Anand Avati on 2015-06-17 13:59:14 EDT ---

REVIEW: http://review.gluster.org/11273 (libgfchangelog: Fix crash in
gf_changelog_process) posted (#2) for review on master by Kotresh HR
(khiremat at redhat.com)

--- Additional comment from Anand Avati on 2015-06-18 02:34:15 EDT ---

COMMIT: http://review.gluster.org/11273 committed in master by Venky Shankar
(vshankar at redhat.com) 
------
commit ba7d5d914b2c897aef0616f3d95beb4d17bc51a8
Author: Kotresh HR <khiremat at redhat.com>
Date:   Wed Jun 17 14:39:26 2015 +0530

    libgfchangelog: Fix crash in gf_changelog_process

    Problem:
        Crash observed in gf_changelog_process and
        gf_changelog_callback_invoker.

    Cause:
        Assignments to arguments passed to thread is done
        post thread creation. If the thread created gets
        scheduled before the assignment and access these
        variables, it would crash with segmentation fault.

    Solution:
        Assignments to arguments are done prior to the thread
        creation.

    Change-Id: I6afc8ccedd050cf4b50b967fef8287a0c834177b
    BUG: 1232666
    Signed-off-by: Kotresh HR <khiremat at redhat.com>
    Reviewed-on: http://review.gluster.org/11273
    Tested-by: NetBSD Build System <jenkins at build.gluster.org>
    Reviewed-by: Venky Shankar <vshankar at redhat.com>
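
The ordering problem described in the commit message can be illustrated with
a minimal sketch. This is not the libgfchangelog code: struct worker_args,
bump, worker and run_worker are hypothetical names chosen for illustration,
and the sketch only assumes standard POSIX threads. It shows the corrected
ordering, where every field the thread dereferences is assigned before
pthread_create() is called.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical argument block handed to the thread (illustrative only,
 * not the structure used by libgfchangelog). */
struct worker_args {
    pthread_mutex_t *lock;    /* dereferenced by the thread */
    void (*callback)(int *);  /* invoked by the thread */
    int counter;
};

static void bump(int *counter)
{
    (*counter)++;
}

static void *worker(void *data)
{
    struct worker_args *args = data;

    /* Both fields must already be set. If they were still NULL, the
     * thread would fault in pthread_mutex_lock(NULL) or by jumping to
     * address 0x0, matching the two kinds of backtrace in this report. */
    assert(args->lock != NULL && args->callback != NULL);

    pthread_mutex_lock(args->lock);
    args->callback(&args->counter);
    pthread_mutex_unlock(args->lock);
    return NULL;
}

/* Returns 0 on success or a pthread error code; stores the counter
 * value seen after the thread has finished in *result. */
int run_worker(int *result)
{
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    struct worker_args args = {0};
    pthread_t tid;
    int ret;

    /* Assign everything BEFORE creating the thread: the new thread may
     * be scheduled as soon as pthread_create() is entered. */
    args.lock = &lock;
    args.callback = bump;

    ret = pthread_create(&tid, NULL, worker, &args);
    if (ret != 0)
        return ret;

    /* args lives on this stack frame, so wait for the thread to finish. */
    ret = pthread_join(tid, NULL);
    if (ret != 0)
        return ret;

    *result = args.counter;
    return 0;
}
```

Moving the two assignments below the pthread_create() call would reproduce
the race: the crash would then depend on whether the new thread runs before
the creating thread reaches the assignments, which is consistent with cores
appearing intermittently over many hours.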


Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1232609
[Bug 1232609] [geo-rep]: RHEL7.1 segmentation faults are observed on all
the master nodes
https://bugzilla.redhat.com/show_bug.cgi?id=1232666
[Bug 1232666] [geo-rep]: Segmentation faults are observed on all the master
nodes