[Bugs] [Bug 1239156] New: Glusterd crashed

bugzilla at redhat.com bugzilla at redhat.com
Fri Jul 3 19:24:03 UTC 2015


https://bugzilla.redhat.com/show_bug.cgi?id=1239156

            Bug ID: 1239156
           Summary: Glusterd crashed
           Product: GlusterFS
           Version: mainline
         Component: glusterd
          Keywords: ZStream
          Assignee: bugs at gluster.org
          Reporter: anekkunt at redhat.com
                CC: amukherj at redhat.com, anekkunt at redhat.com,
                    bugs at gluster.org, byarlaga at redhat.com,
                    gluster-bugs at redhat.com, sdharane at redhat.com
        Depends On: 1238067
            Blocks: 1223636



+++ This bug was initially created as a clone of Bug #1238067 +++

Description of problem:
=======================

Seen a glusterd crash. No restarts were done, and there was IO running from
the client. A peer probe to another server failed with error 107 (ENOTCONN,
"Transport endpoint is not connected").

Backtrace:
=========
(gdb) bt
#0  _rcu_read_lock_bp () at urcu/static/urcu-bp.h:199
#1  rcu_read_lock_bp () at urcu-bp.c:271
#2  0x00007fb1d33cd256 in __glusterd_peer_rpc_notify (rpc=0x7fb1df49c8d0, 
    mydata=<value optimized out>, event=RPC_CLNT_DISCONNECT, 
    data=<value optimized out>) at glusterd-handler.c:4996
#3  0x00007fb1d33b0c50 in glusterd_big_locked_notify (rpc=0x7fb1df49c8d0, 
    mydata=0x7fb1df49c250, event=RPC_CLNT_DISCONNECT, data=0x0, 
    notify_fn=0x7fb1d33cd1f0 <__glusterd_peer_rpc_notify>)
    at glusterd-handler.c:71
#4  0x00007fb1de793953 in rpc_clnt_notify (trans=<value optimized out>, 
    mydata=0x7fb1df49c900, event=<value optimized out>, 
    data=<value optimized out>) at rpc-clnt.c:861
#5  0x00007fb1de78ead8 in rpc_transport_notify (this=<value optimized out>, 
    event=<value optimized out>, data=<value optimized out>)
    at rpc-transport.c:543
#6  0x00007fb1d1a53df1 in socket_event_poll_err (fd=<value optimized out>, 
    idx=<value optimized out>, data=0x7fb1df49fa60, 
    poll_in=<value optimized out>, poll_out=0, poll_err=0) at socket.c:1205
#7  socket_event_handler (fd=<value optimized out>, 
    idx=<value optimized out>, data=0x7fb1df49fa60, 
    poll_in=<value optimized out>, poll_out=0, poll_err=0) at socket.c:2410
#8  0x00007fb1dea27970 in event_dispatch_epoll_handler (data=0x7fb1df4edda0)
    at event-epoll.c:575
#9  event_dispatch_epoll_worker (data=0x7fb1df4edda0) at event-epoll.c:678
#10 0x00007fb1ddaaea51 in start_thread () from /lib64/libpthread.so.0
#11 0x00007fb1dd41896d in clone () from /lib64/libc.so.6
(gdb)

(gdb) t a a bt

Thread 7 (Thread 0x7fb1cd6a7700 (LWP 10080)):
#0  0x00007fb1ddab2a0e in pthread_cond_timedwait@@GLIBC_2.3.2 ()
   from /lib64/libpthread.so.0
#1  0x00007fb1dea0acab in syncenv_task (proc=0x7fb1df34bb00) at syncop.c:595
#2  0x00007fb1dea0fba0 in syncenv_processor (thdata=0x7fb1df34bb00)
    at syncop.c:687
#3  0x00007fb1ddaaea51 in start_thread () from /lib64/libpthread.so.0
#4  0x00007fb1dd41896d in clone () from /lib64/libc.so.6

Thread 6 (Thread 0x7fb1d5f03700 (LWP 2914)):
#0  0x00007fb1ddab5fbd in nanosleep () from /lib64/libpthread.so.0
#1  0x00007fb1de9e55ca in gf_timer_proc (ctx=0x7fb1df31d010) at timer.c:205
#2  0x00007fb1ddaaea51 in start_thread () from /lib64/libpthread.so.0
#3  0x00007fb1dd41896d in clone () from /lib64/libc.so.6

Thread 5 (Thread 0x7fb1d4100700 (LWP 2917)):
#0  0x00007fb1ddab2a0e in pthread_cond_timedwait@@GLIBC_2.3.2 ()
   from /lib64/libpthread.so.0
#1  0x00007fb1dea0acab in syncenv_task (proc=0x7fb1df34afc0) at syncop.c:595
#2  0x00007fb1dea0fba0 in syncenv_processor (thdata=0x7fb1df34afc0)
    at syncop.c:687
#3  0x00007fb1ddaaea51 in start_thread () from /lib64/libpthread.so.0
#4  0x00007fb1dd41896d in clone () from /lib64/libc.so.6

Thread 4 (Thread 0x7fb1d02c3700 (LWP 3091)):
#0  0x00007fb1ddab263c in pthread_cond_wait@@GLIBC_2.3.2 ()
   from /lib64/libpthread.so.0
#1  0x00007fb1d3465973 in hooks_worker (args=<value optimized out>)
    at glusterd-hooks.c:534
#2  0x00007fb1ddaaea51 in start_thread () from /lib64/libpthread.so.0
#3  0x00007fb1dd41896d in clone () from /lib64/libc.so.6

Thread 3 (Thread 0x7fb1dee72740 (LWP 2913)):
#0  0x00007fb1ddaaf2ad in pthread_join () from /lib64/libpthread.so.0
#1  0x00007fb1dea2741d in event_dispatch_epoll (event_pool=0x7fb1df33bc90)
    at event-epoll.c:762
#2  0x00007fb1dee8eef1 in main (argc=2, argv=0x7ffdfed58a08)
    at glusterfsd.c:2333

Thread 2 (Thread 0x7fb1d5502700 (LWP 2915)):
#0  0x00007fb1d2c1cf18 in _fini () from /usr/lib64/liburcu-cds.so.1.0.0
#1  0x00007fb1dec72c7c in _dl_fini () from /lib64/ld-linux-x86-64.so.2
#2  0x00007fb1dd365b22 in exit () from /lib64/libc.so.6
#3  0x00007fb1dee8cc03 in cleanup_and_exit (signum=<value optimized out>)
    at glusterfsd.c:1276
#4  0x00007fb1dee8d075 in glusterfs_sigwaiter (arg=<value optimized out>)
    at glusterfsd.c:1997
#5  0x00007fb1ddaaea51 in start_thread () from /lib64/libpthread.so.0
#6  0x00007fb1dd41896d in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7fb1cf8c2700 (LWP 3092)):
#0  _rcu_read_lock_bp () at urcu/static/urcu-bp.h:199
#1  rcu_read_lock_bp () at urcu-bp.c:271
#2  0x00007fb1d33cd256 in __glusterd_peer_rpc_notify (rpc=0x7fb1df49c8d0, 
    mydata=<value optimized out>, event=RPC_CLNT_DISCONNECT, 
    data=<value optimized out>) at glusterd-handler.c:4996
#3  0x00007fb1d33b0c50 in glusterd_big_locked_notify (rpc=0x7fb1df49c8d0, 
    mydata=0x7fb1df49c250, event=RPC_CLNT_DISCONNECT, data=0x0, 
    notify_fn=0x7fb1d33cd1f0 <__glusterd_peer_rpc_notify>)
    at glusterd-handler.c:71
#4  0x00007fb1de793953 in rpc_clnt_notify (trans=<value optimized out>, 
    mydata=0x7fb1df49c900, event=<value optimized out>, 
    data=<value optimized out>) at rpc-clnt.c:861
#5  0x00007fb1de78ead8 in rpc_transport_notify (this=<value optimized out>, 
    event=<value optimized out>, data=<value optimized out>)
    at rpc-transport.c:543
#6  0x00007fb1d1a53df1 in socket_event_poll_err (fd=<value optimized out>, 
    idx=<value optimized out>, data=0x7fb1df49fa60, 
    poll_in=<value optimized out>, poll_out=0, poll_err=0) at socket.c:1205
#7  socket_event_handler (fd=<value optimized out>, 
    idx=<value optimized out>, data=0x7fb1df49fa60, 
    poll_in=<value optimized out>, poll_out=0, poll_err=0) at socket.c:2410
#8  0x00007fb1dea27970 in event_dispatch_epoll_handler (data=0x7fb1df4edda0)
    at event-epoll.c:575
#9  event_dispatch_epoll_worker (data=0x7fb1df4edda0) at event-epoll.c:678
#10 0x00007fb1ddaaea51 in start_thread () from /lib64/libpthread.so.0
#11 0x00007fb1dd41896d in clone () from /lib64/libc.so.6
(gdb)


Version-Release number of selected component (if applicable):
=============================================================

[root@ninja core]# gluster --version
glusterfs 3.7.1 built on Jun 28 2015 11:01:17
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU General
Public License.
[root@ninja core]#

How reproducible:
================
Seen once.

Actual results:


Expected results:


Additional info:

--- Additional comment from Red Hat Bugzilla Rules Engine on 2015-07-01
02:43:54 EDT ---

This bug is automatically being proposed for Red Hat Gluster Storage 3.1.0 by
setting the release flag 'rhgs-3.1.0' to '?'.

If this bug should be proposed for a different release, please manually change
the proposed release flag.

--- Additional comment from Bhaskarakiran on 2015-07-01 02:45:58 EDT ---



--- Additional comment from Bhaskarakiran on 2015-07-01 05:17:37 EDT ---

Copied the sosreports to the rhsqe-repo/sosreports/1238067 folder.

--- Additional comment from Bhaskarakiran on 2015-07-01 05:22:35 EDT ---

Time of crash:

-rw-------. 1 root root 232M Jun 30 16:14 core.2913.1435661084.dump

--- Additional comment from Bhaskarakiran on 2015-07-01 05:27:20 EDT ---

Sosreport:

rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1238067/sosreport-sysreg-prod-20150701140725.tar.xz

--- Additional comment from Atin Mukherjee on 2015-07-01 23:48:18 EDT ---

The crash happened while the glusterd service was going down. It doesn't
impact functionality: the crash is caused by a race between the cleanup
thread and a running thread. The cleanup thread releases the URCU resources
while one of the running threads still tries to access them, resulting in a
crash. Hence this can be deferred to 3.1.z.

--- Additional comment from Rejy M Cyriac on 2015-07-03 01:54:02 EDT ---

Since this BZ is not a Blocker for the RHGS 3.1 release, and the phase for
fixing non-blocker bugs is over for the release, this BZ is being re-proposed
for the RHGS 3.1 Z-stream release.


Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1223636
[Bug 1223636] 3.1 QE Tracker
https://bugzilla.redhat.com/show_bug.cgi?id=1238067
[Bug 1238067] Glusterd crashed