[Bugs] [Bug 1436543] New: Brick Multiplexing: Glusterd crashed when stopping volumes

bugzilla at redhat.com bugzilla at redhat.com
Tue Mar 28 07:13:24 UTC 2017


https://bugzilla.redhat.com/show_bug.cgi?id=1436543

            Bug ID: 1436543
           Summary: Brick Multiplexing: Glusterd crashed when stopping
                    volumes
           Product: GlusterFS
           Version: 3.10
         Component: glusterd
          Severity: urgent
          Assignee: bugs at gluster.org
          Reporter: nchilaka at redhat.com
                CC: bugs at gluster.org



Description of problem:
=====================
I had 42 volumes, as below, on a 3-node setup:
10 volumes of type 2x2
10 volumes of type 2x(4+2)
10 volumes of type 1x3
10 volumes of type 1x2
1 1x2 volume and 1 1x3 volume ===> created before brick multiplexing was enabled

I started stopping all the volumes, one after another.

From another node, I was deleting the volumes that had already been stopped.

After about 20 volumes had been stopped, glusterd crashed on the node where I
was stopping the volumes.


[root@dhcp35-192 ~]# file /core.9140
/core.9140: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style, from
'/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO', real uid: 0,
effective uid: 0, real gid: 0, effective gid: 0, execfn: '/usr/sbin/glusterd',
platform: 'x86_64'
[root@dhcp35-192 ~]# gdb /usr/sbin/glusterd /core.9140
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-94.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/sbin/glusterfsd...Reading symbols from /usr/lib/debug/usr/sbin/glusterfsd.debug...done.
done.

warning: core file may not match specified executable file.
[New LWP 9148]
[New LWP 9143]
[New LWP 9147]
[New LWP 9142]
[New LWP 9141]
[New LWP 9144]
[New LWP 9140]
[New LWP 9145]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO'.
Program terminated with signal 11, Segmentation fault.
#0  list_del_init (old=0x7fb6d4962cf0) at list.h:87
87        old->prev->next = old->next;
Missing separate debuginfos, use: debuginfo-install
bzip2-libs-1.0.6-13.el7.x86_64
device-mapper-event-libs-1.02.135-1.el7_3.3.x86_64
device-mapper-libs-1.02.135-1.el7_3.3.x86_64 elfutils-libelf-0.166-2.el7.x86_64
elfutils-libs-0.166-2.el7.x86_64 glibc-2.17-157.el7_3.1.x86_64
keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.14.1-27.el7_3.x86_64
libattr-2.4.46-12.el7.x86_64 libblkid-2.23.2-33.el7.x86_64
libcap-2.22-8.el7.x86_64 libcom_err-1.42.9-9.el7.x86_64
libgcc-4.8.5-11.el7.x86_64 libselinux-2.5-6.el7.x86_64
libsepol-2.5-6.el7.x86_64 libuuid-2.23.2-33.el7.x86_64
libxml2-2.9.1-6.el7_2.3.x86_64 lvm2-libs-2.02.166-1.el7_3.3.x86_64
openssl-libs-1.0.1e-60.el7_3.1.x86_64 pcre-8.32-15.el7_2.1.x86_64
systemd-libs-219-30.el7_3.7.x86_64 userspace-rcu-0.7.16-3.el7.x86_64
xz-libs-5.2.2-1.el7.x86_64 zlib-1.2.7-17.el7.x86_64
(gdb) bt
#0  list_del_init (old=0x7fb6d4962cf0) at list.h:87
#1  __run (task=task@entry=0x7fb6d4962cf0) at syncop.c:255
#2  0x00007fb70a0538d1 in synctask_wake (task=0x7fb6d4962cf0) at syncop.c:359
#3  0x00007fb6febede66 in _gd_syncop_brick_op_cbk (req=req@entry=0x7fb6e4a87990,
    iov=iov@entry=0x7fb6fa595860, count=count@entry=1, myframe=myframe@entry=0x7fb6e7c232d0)
    at glusterd-syncop.c:937
#4  0x00007fb6feb8862a in glusterd_big_locked_cbk (req=0x7fb6e4a87990, iov=0x7fb6fa595860, count=1,
    myframe=0x7fb6e7c232d0, fn=0x7fb6febedbc0 <_gd_syncop_brick_op_cbk>) at glusterd-rpc-ops.c:222
#5  0x00007fb709de48d5 in saved_frames_unwind (saved_frames=saved_frames@entry=0x7fb6f001bfb0)
    at rpc-clnt.c:369
#6  0x00007fb709de49be in saved_frames_destroy (frames=0x7fb6f001bfb0) at rpc-clnt.c:386
#7  0x00007fb709de6124 in rpc_clnt_connection_cleanup (conn=conn@entry=0x7fb6f4201ff8)
    at rpc-clnt.c:555
#8  0x00007fb709de69ac in rpc_clnt_handle_disconnect (conn=0x7fb6f4201ff8, clnt=0x7fb6f4201fa0)
    at rpc-clnt.c:880
#9  rpc_clnt_notify (trans=<optimized out>, mydata=0x7fb6f4201ff8, event=RPC_TRANSPORT_DISCONNECT,
    data=0x7fb6f42925f0) at rpc-clnt.c:936
#10 0x00007fb709de29e3 in rpc_transport_notify (this=this@entry=0x7fb6f42925f0,
    event=event@entry=RPC_TRANSPORT_DISCONNECT, data=data@entry=0x7fb6f42925f0) at rpc-transport.c:538
#11 0x00007fb6fbfa77b2 in socket_event_poll_err (this=0x7fb6f42925f0) at socket.c:1180
#12 socket_event_handler (fd=<optimized out>, idx=20, data=0x7fb6f42925f0, poll_in=0, poll_out=4,
    poll_err=<optimized out>) at socket.c:2405
#13 0x00007fb70a076fa0 in event_dispatch_epoll_handler (event=0x7fb6fa595e80,
    event_pool=0x7fb70b1e5fe0) at event-epoll.c:572
#14 event_dispatch_epoll_worker (data=0x7fb70b207c10) at event-epoll.c:675
#15 0x00007fb708e7ddc5 in start_thread () from /lib64/libpthread.so.0
#16 0x00007fb7087c273d in clone () from /lib64/libc.so.6
(gdb) quit
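Frame #0 faults on the first statement of list_del_init (list.h:87, `old->prev->next = old->next;`). The sketch below assumes GlusterFS's list.h uses the same kernel-style intrusive doubly linked list as the Linux kernel; it is a minimal illustration written for this report, not the GlusterFS source. It shows that unlinking a node dereferences both `old->prev` and `old->next`, so a segfault at that line means `old` or `old->prev` was not a valid pointer, while a node that was already unlinked and re-initialised can be unlinked again safely. Note also that `old` in frame #0 and `task` in frame #1 have the same address (0x7fb6d4962cf0), which suggests the list linkage sits at the start of the task structure.

```c
#include <assert.h>
#include <stddef.h>

/* Kernel-style intrusive list node (assumption: list.h follows this layout). */
struct list_head {
    struct list_head *next;
    struct list_head *prev;
};

/* An empty list, or an unlinked node, points at itself in both directions. */
static void INIT_LIST_HEAD(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Insert entry immediately after head. */
static void list_add(struct list_head *entry, struct list_head *head)
{
    entry->next = head->next;
    entry->prev = head;
    head->next->prev = entry;
    head->next = entry;
}

/* Unlink old from its list and make it self-referencing again.
 * The first statement corresponds to list.h:87 in frame #0: it reads
 * old->prev and writes through it, so it faults if old->prev is NULL,
 * garbage, or points into memory that has been unmapped. */
static void list_del_init(struct list_head *old)
{
    old->prev->next = old->next;
    old->next->prev = old->prev;
    INIT_LIST_HEAD(old);
}

/* True when head has no other nodes linked to it. */
static int list_empty(const struct list_head *head)
{
    return head->next == head;
}
```

Because `list_del_init` leaves the node pointing at itself, a second call on the same, still-allocated node is a no-op; a crash at this line therefore points at the node's memory or its `prev` link being invalid rather than at a simple double unlink.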
[root@dhcp35-192 ~]# service glusterd status
Redirecting to /bin/systemctl status  glusterd.service
● glusterd.service - GlusterFS, a clustered file-system server
   Loaded: loaded (/usr/lib/systemd/system/glusterd.service; disabled; vendor preset: disabled)
   Active: failed (Result: signal) since Tue 2017-03-28 12:12:25 IST; 23min ago
  Process: 9139 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, status=0/SUCCESS)
 Main PID: 9140 (code=killed, signal=SEGV)
   CGroup: /system.slice/glusterd.service

Mar 28 12:12:25 dhcp35-192.lab.eng.blr.redhat.com glusterd[9140]: setfsid 1
Mar 28 12:12:25 dhcp35-192.lab.eng.blr.redhat.com glusterd[9140]: spinlock 1
Mar 28 12:12:25 dhcp35-192.lab.eng.blr.redhat.com glusterd[9140]: epoll.h 1
Mar 28 12:12:25 dhcp35-192.lab.eng.blr.redhat.com glusterd[9140]: xattr.h 1
Mar 28 12:12:25 dhcp35-192.lab.eng.blr.redhat.com glusterd[9140]: st_atim.tv_nsec 1
Mar 28 12:12:25 dhcp35-192.lab.eng.blr.redhat.com glusterd[9140]: package-string: glusterfs 3.10.0
Mar 28 12:12:25 dhcp35-192.lab.eng.blr.redhat.com glusterd[9140]: ---------
Mar 28 12:12:25 dhcp35-192.lab.eng.blr.redhat.com systemd[1]: glusterd.service: main process exited...GV
Mar 28 12:12:25 dhcp35-192.lab.eng.blr.redhat.com systemd[1]: Unit glusterd.service entered failed ...e.
Mar 28 12:12:25 dhcp35-192.lab.eng.blr.redhat.com systemd[1]: glusterd.service failed.
Hint: Some lines were ellipsized, use -l to show in full.





Version-Release number of selected component (if applicable):
=========
[root@dhcp35-192 ~]# rpm -qa|grep gluster
glusterfs-libs-3.10.0-1.el7.x86_64
glusterfs-api-3.10.0-1.el7.x86_64
glusterfs-debuginfo-3.10.0-1.el7.x86_64
glusterfs-3.10.0-1.el7.x86_64
glusterfs-fuse-3.10.0-1.el7.x86_64
glusterfs-cli-3.10.0-1.el7.x86_64
glusterfs-rdma-3.10.0-1.el7.x86_64
glusterfs-client-xlators-3.10.0-1.el7.x86_64
glusterfs-server-3.10.0-1.el7.x86_64
[root@dhcp35-192 ~]#
