[Gluster-users] glustershd coredump generated while rebooting all 3 sn nodes

Zhou, Cynthia (NSB - CN/Hangzhou) cynthia.zhou at nokia-sbell.com
Thu Oct 11 07:35:37 UTC 2018


Hi,
I have found that when I restart the sn nodes, glustershd sometimes exits and generates a coredump. This has happened twice in my environment. I would like to know your opinion on this issue, thanks!

The glusterfs version I use is glusterfs 3.12.3.

[root@sn-1:/root]
# gluster v info log

Volume Name: log
Type: Replicate
Volume ID: 87bcbaf8-5fa4-4060-9149-23f832befe92
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: sn-0.local:/mnt/bricks/log/brick
Brick2: sn-1.local:/mnt/bricks/log/brick
Brick3: sn-2.local:/mnt/bricks/log/brick
Options Reconfigured:
server.allow-insecure: on
cluster.quorum-type: auto
network.ping-timeout: 42
cluster.consistent-metadata: on
cluster.favorite-child-policy: mtime
cluster.quorum-reads: no
cluster.server-quorum-type: none
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
cluster.server-quorum-ratio: 51%
[root@sn-1:/root]
///////////////////////////////////////////////glustershd coredump////////////////////////////////////////////////////////////////
# lz4 -d core.glusterfs.0.c5f0c5547fbd4e5aa8f350b748e5675e.1812.1537967075000000.lz4
Decoding file core.glusterfs.0.c5f0c5547fbd4e5aa8f350b748e5675e.1812.1537967075000000
core.glusterfs.0.c5f : decoded 263188480 bytes
[root@sn-0:/mnt/export]
# gdb /usr/sbin/glusterfs core.glusterfs.0.c5f0c5547fbd4e5aa8f350b748e5675e.1812.1537967075000000
Reading symbols from /usr/sbin/glusterfs...(no debugging symbols found)...done.

warning: core file may not match specified executable file.
[New LWP 1818]
[New LWP 1812]
[New LWP 1813]
[New LWP 1817]
[New LWP 1966]
[New LWP 1968]
[New LWP 1970]
[New LWP 1974]
[New LWP 1976]
[New LWP 1814]
[New LWP 1815]
[New LWP 1816]
[New LWP 1828]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/sbin/glusterfs -s sn-0.local --volfile-id gluster/glustershd -p /var/run/g'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f1b5e5d7d24 in client3_3_lookup_cbk (req=0x7f1b44002300, iov=0x7f1b44002340, count=1, myframe=0x7f1b4401c850) at client-rpc-fops.c:2802
2802      client-rpc-fops.c: No such file or directory.
[Current thread is 1 (Thread 0x7f1b5f00c700 (LWP 1818))]
Missing separate debuginfos, use: dnf debuginfo-install rcp-pack-glusterfs-1.2.0_1_g54e6196-RCP2.wf29.x86_64
(gdb) bt
#0  0x00007f1b5e5d7d24 in client3_3_lookup_cbk (req=0x7f1b44002300, iov=0x7f1b44002340, count=1, myframe=0x7f1b4401c850) at client-rpc-fops.c:2802
#1  0x00007f1b64553d47 in rpc_clnt_handle_reply (clnt=0x7f1b5808bbb0, pollin=0x7f1b580c6620) at rpc-clnt.c:778
#2  0x00007f1b645542e5 in rpc_clnt_notify (trans=0x7f1b5808bde0, mydata=0x7f1b5808bbe0, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x7f1b580c6620) at rpc-clnt.c:971
#3  0x00007f1b64550319 in rpc_transport_notify (this=0x7f1b5808bde0, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x7f1b580c6620) at rpc-transport.c:538
#4  0x00007f1b5f49734d in socket_event_poll_in (this=0x7f1b5808bde0, notify_handled=_gf_true) at socket.c:2315
#5  0x00007f1b5f497992 in socket_event_handler (fd=25, idx=15, gen=7, data=0x7f1b5808bde0, poll_in=1, poll_out=0, poll_err=0) at socket.c:2471
#6  0x00007f1b647fe5ac in event_dispatch_epoll_handler (event_pool=0x230cb00, event=0x7f1b5f00be84) at event-epoll.c:583
#7  0x00007f1b647fe883 in event_dispatch_epoll_worker (data=0x23543d0) at event-epoll.c:659
#8  0x00007f1b6354a5da in start_thread () from /lib64/libpthread.so.0
#9  0x00007f1b62e20cbf in clone () from /lib64/libc.so.6
(gdb) print *(call_frame_t*)myframe
$1 = {root = 0x100000000, parent = 0x100000005, frames = {next = 0x7f1b4401c8a8, prev = 0x7f1b44010190}, local = 0x0, this = 0x0, ret = 0x0, ref_count = 0, lock = {spinlock = 0, mutex = {__data = {
        __lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x7f1b44010190, __next = 0x0}},
      __size = '\000' <repeats 24 times>, "\220\001\001D\033\177\000\000\000\000\000\000\000\000\000", __align = 0}}, cookie = 0x7f1b4401ccf0, complete = _gf_false, op = GF_FOP_NULL, begin = {
    tv_sec = 139755081730912, tv_usec = 139755081785872}, end = {tv_sec = 448811404, tv_usec = 21474836481}, wind_from = 0x0, wind_to = 0x0, unwind_from = 0x0, unwind_to = 0x0}
(gdb) info thread
  Id   Target Id         Frame
* 1    Thread 0x7f1b5f00c700 (LWP 1818) 0x00007f1b5e5d7d24 in client3_3_lookup_cbk (req=0x7f1b44002300, iov=0x7f1b44002340, count=1, myframe=0x7f1b4401c850) at client-rpc-fops.c:2802
  2    Thread 0x7f1b64c83780 (LWP 1812) 0x00007f1b6354ba3d in __pthread_timedjoin_ex () from /lib64/libpthread.so.0
  3    Thread 0x7f1b61eae700 (LWP 1813) 0x00007f1b63554300 in nanosleep () from /lib64/libpthread.so.0
  4    Thread 0x7f1b5feaa700 (LWP 1817) 0x00007f1b635508ca in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  5    Thread 0x7f1b5ca2b700 (LWP 1966) 0x00007f1b62dee4b0 in nanosleep () from /lib64/libc.so.6
  6    Thread 0x7f1b4f7fe700 (LWP 1968) 0x00007f1b6355050c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  7    Thread 0x7f1b4e7fc700 (LWP 1970) 0x00007f1b6355050c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  8    Thread 0x7f1b4d7fa700 (LWP 1974) 0x00007f1b62dee4b0 in nanosleep () from /lib64/libc.so.6
  9    Thread 0x7f1b33fff700 (LWP 1976) 0x00007f1b62dee4b0 in nanosleep () from /lib64/libc.so.6
  10   Thread 0x7f1b616ad700 (LWP 1814) 0x00007f1b62d5fcbc in sigtimedwait () from /lib64/libc.so.6
  11   Thread 0x7f1b60eac700 (LWP 1815) 0x00007f1b62dee4b0 in nanosleep () from /lib64/libc.so.6
  12   Thread 0x7f1b606ab700 (LWP 1816) 0x00007f1b635508ca in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  13   Thread 0x7f1b5d6d5700 (LWP 1828) 0x00007f1b62e20fe7 in epoll_wait () from /lib64/libc.so.6
(gdb) quit

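Judging from the source code (screenshot below) and the gdb output, the coredump happens because frame->local is NULL: in the frame printed above, local = 0x0 (and this = 0x0, ref_count = 0).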

[image001.jpg: screenshot of the relevant source code (scrubbed attachment, link at the end of this message)]

From the sn-0 journal log:

Sep 26 16:04:40.034577 sn-0 systemd-coredump[2612]: Process 1812 (glusterfs) of user 0 dumped core.

                                                    Stack trace of thread 1818:
                                                    #0  0x00007f1b5e5d7d24 client3_3_lookup_cbk (client.so)
                                                    #1  0x00007f1b64553d47 rpc_clnt_handle_reply (libgfrpc.so.0)
                                                    #2  0x00007f1b645542e5 rpc_clnt_notify (libgfrpc.so.0)
                                                    #3  0x00007f1b64550319 rpc_transport_notify (libgfrpc.so.0)
                                                    #4  0x00007f1b5f49734d socket_event_poll_in (socket.so)
                                                    #5  0x00007f1b5f497992 socket_event_handler (socket.so)
                                                    #6  0x00007f1b647fe5ac event_dispatch_epoll_handler (libglusterfs.so.0)
                                                    #7  0x00007f1b647fe883 event_dispatch_epoll_worker (libglusterfs.so.0)
                                                    #8  0x00007f1b6354a5da start_thread (libpthread.so.0)
                                                    #9  0x00007f1b62e20cbf __clone (libc.so.6)

                                                    Stack trace of thread 1812:
                                                    #0  0x00007f1b6354ba3d __GI___pthread_timedjoin_ex (libpthread.so.0)
                                                    #1  0x00007f1b647feae1 event_dispatch_epoll (libglusterfs.so.0)
                                                    #2  0x00007f1b647c2703 event_dispatch (libglusterfs.so.0)
                                                    #3  0x000000000040ab95 main (glusterfsd)
                                                    #4  0x00007f1b62d4baf7 __libc_start_main (libc.so.6)
                                                    #5  0x000000000040543a _start (glusterfsd)

                                                    Stack trace of thread 1813:
                                                    #0  0x00007f1b63554300 __nanosleep (libpthread.so.0)
                                                    #1  0x00007f1b647a04e5 gf_timer_proc (libglusterfs.so.0)
                                                    #2  0x00007f1b6354a5da start_thread (libpthread.so.0)
                                                    #3  0x00007f1b62e20cbf __clone (libc.so.6)

                                                    Stack trace of thread 1817:
                                                    #0  0x00007f1b635508ca pthread_cond_timedwait@@GLIBC_2.3.2 (libpthread.so.0)
                                                    #1  0x00007f1b647d98e3 syncenv_task (libglusterfs.so.0)
                                                    #2  0x00007f1b647d9b7e syncenv_processor (libglusterfs.so.0)
                                                    #3  0x00007f1b6354a5da start_thread (libpthread.so.0)
                                                    #4  0x00007f1b62e20cbf __clone (libc.so.6)

                                                    Stack trace of thread 1966:
                                                    #0  0x00007f1b62dee4b0 __nanosleep (libc.so.6)
                                                    #1  0x00007f1b62dee38a sleep (libc.so.6)
                                                    #2  0x00007f1b5e36970c afr_shd_index_healer (replicate.so)
                                                    #3  0x00007f1b6354a5da start_thread (libpthread.so.0)
                                                    #4  0x00007f1b62e20cbf __clone (libc.so.6)

                                                    Stack trace of thread 1968:
                                                    #0  0x00007f1b6355050c pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
                                                    #1  0x00007f1b647da993 __syncbarrier_wait (libglusterfs.so.0)
                                                    #2  0x00007f1b647da9e4 syncbarrier_wait (libglusterfs.so.0)
                                                    #3  0x00007f1b5e357cde afr_selfheal_unlocked_discover_on (replicate.so)
                                                    #4  0x00007f1b5e357d80 afr_selfheal_unlocked_discover (replicate.so)
                                                    #5  0x00007f1b5e363bf8 __afr_selfheal_entry_prepare (replicate.so)
                                                    #6  0x00007f1b5e3641c0 afr_selfheal_entry_dirent (replicate.so)
                                                    #7  0x00007f1b5e36488a afr_selfheal_entry_do_subvol (replicate.so)
                                                    #8  0x00007f1b5e365077 afr_selfheal_entry_do (replicate.so)
                                                    #9  0x00007f1b5e3656b6 __afr_selfheal_entry (replicate.so)
                                                    #10 0x00007f1b5e365bba afr_selfheal_entry (replicate.so)
                                                    #11 0x00007f1b5e35d250 afr_selfheal_do (replicate.so)
                                                    #12 0x00007f1b5e35d346 afr_selfheal (replicate.so)
                                                    #13 0x00007f1b5e368c5c afr_shd_selfheal (replicate.so)
                                                    #14 0x00007f1b5e36900b afr_shd_index_heal (replicate.so)
                                                    #15 0x00007f1b647ffed3 syncop_mt_dir_scan (libglusterfs.so.0)
                                                    #16 0x00007f1b5e3692b5 afr_shd_index_sweep (replicate.so)
                                                    #17 0x00007f1b5e369394 afr_shd_index_sweep_all (replicate.so)
                                                    #18 0x00007f1b5e369697 afr_shd_index_healer (replicate.so)
                                                    #19 0x00007f1b6354a5da start_thread (libpthread.so.0)
                                                    #20 0x00007f1b62e20cbf __clone (libc.so.6)

                                                    Stack trace of thread 1970:
                                                    #0  0x00007f1b6355050c pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
                                                    #1  0x00007f1b647da993 __syncbarrier_wait (libglusterfs.so.0)
                                                    #2  0x00007f1b647da9e4 syncbarrier_wait (libglusterfs.so.0)
                                                    #3  0x00007f1b5e357742 afr_selfheal_unlocked_lookup_on (replicate.so)
                                                    #4  0x00007f1b5e364204 afr_selfheal_entry_dirent (replicate.so)
                                                    #5  0x00007f1b5e36488a afr_selfheal_entry_do_subvol (replicate.so)
                                                    #6  0x00007f1b5e365077 afr_selfheal_entry_do (replicate.so)
                                                    #7  0x00007f1b5e3656b6 __afr_selfheal_entry (replicate.so)
                                                    #8  0x00007f1b5e365bba afr_selfheal_entry (replicate.so)
                                                    #9  0x00007f1b5e35d250 afr_selfheal_do (replicate.so)
                                                    #10 0x00007f1b5e35d346 afr_selfheal (replicate.so)
                                                    #11 0x00007f1b5e368c5c afr_shd_selfheal (replicate.so)
                                                    #12 0x00007f1b5e36900b afr_shd_index_heal (replicate.so)
                                                    #13 0x00007f1b647ffed3 syncop_mt_dir_scan (libglusterfs.so.0)
                                                    #14 0x00007f1b5e3692b5 afr_shd_index_sweep (replicate.so)
                                                    #15 0x00007f1b5e369394 afr_shd_index_sweep_all (replicate.so)
                                                    #16 0x00007f1b5e369697 afr_shd_index_healer (replicate.so)
                                                    #17 0x00007f1b6354a5da start_thread (libpthread.so.0)
                                                    #18 0x00007f1b62e20cbf __clone (libc.so.6)

                                                    Stack trace of thread 1974:
                                                    #0  0x00007f1b62dee4b0 __nanosleep (libc.so.6)
                                                    #1  0x00007f1b62dee38a sleep (libc.so.6)
                                                    #2  0x00007f1b5e36970c afr_shd_index_healer (replicate.so)
                                                    #3  0x00007f1b6354a5da start_thread (libpthread.so.0)
                                                    #4  0x00007f1b62e20cbf __clone (libc.so.6)

                                                    Stack trace of thread 1976:
                                                    #0  0x00007f1b62dee4b0 __nanosleep (libc.so.6)
                                                    #1  0x00007f1b62dee38a sleep (libc.so.6)
                                                    #2  0x00007f1b5e36970c afr_shd_index_healer (replicate.so)
                                                    #3  0x00007f1b6354a5da start_thread (libpthread.so.0)
                                                    #4  0x00007f1b62e20cbf __clone (libc.so.6)

                                                    Stack trace of thread 1814:
                                                    #0  0x00007f1b62d5fcbc __sigtimedwait (libc.so.6)
                                                    #1  0x00007f1b63554afc sigwait (libpthread.so.0)
                                                    #2  0x0000000000409ed7 glusterfs_sigwaiter (glusterfsd)
                                                    #3  0x00007f1b6354a5da start_thread (libpthread.so.0)
                                                    #4  0x00007f1b62e20cbf __clone (libc.so.6)

                                                    Stack trace of thread 1815:
                                                    #0  0x00007f1b62dee4b0 __nanosleep (libc.so.6)
                                                    #1  0x00007f1b62dee38a sleep (libc.so.6)
                                                    #2  0x00007f1b647c3f5c pool_sweeper (libglusterfs.so.0)
                                                    #3  0x00007f1b6354a5da start_thread (libpthread.so.0)
                                                    #4  0x00007f1b62e20cbf __clone (libc.so.6)

                                                    Stack trace of thread 1816:
                                                    #0  0x00007f1b635508ca pthread_cond_timedwait@@GLIBC_2.3.2 (libpthread.so.0)
                                                    #1  0x00007f1b647d98e3 syncenv_task (libglusterfs.so.0)
                                                    #2  0x00007f1b647d9b7e syncenv_processor (libglusterfs.so.0)
                                                    #3  0x00007f1b6354a5da start_thread (libpthread.so.0)
                                                    #4  0x00007f1b62e20cbf __clone (libc.so.6)

                                                    Stack trace of thread 1828:
                                                    #0  0x00007f1b62e20fe7 epoll_wait (libc.so.6)
                                                    #1  0x00007f1b647fe855 event_dispatch_epoll_worker (libglusterfs.so.0)
                                                    #2  0x00007f1b6354a5da start_thread (libpthread.so.0)
                                                    #3  0x00007f1b62e20cbf __clone (libc.so.6)

-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.jpg
Type: image/jpeg
Size: 59207 bytes
Desc: image001.jpg
URL: <http://lists.gluster.org/pipermail/gluster-users/attachments/20181011/a9cda9cb/attachment.jpg>

