[Gluster-users] glustershd coredump generated while reboot all 3 sn nodes
Zhou, Cynthia (NSB - CN/Hangzhou)
cynthia.zhou at nokia-sbell.com
Tue Oct 23 08:05:32 UTC 2018
I did some further study of this issue. I think this can happen if frame 0x7f84740116e0 gets freed but is still kept in rpc_clnt's saved-frames list: when a frame is destroyed it goes back onto the hot list and is very likely to be reused for the next request, but by the time it is reused its ret address has been changed, so when the previous request's response comes back it can still retrieve this changed frame, which is wrong!
I find that FRAME_DESTROY does not touch rpc_clnt's saved-frames list at all (when a frame is freed it should never still be in that list). Could we add a check that walks every element of the saved-frames list and makes sure the frame about to be destroyed is not in it? A rough sketch of what I mean is below.
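To make the idea concrete, here is a rough standalone sketch of such a check. The struct and field names below are simplified stand-ins of my own, not the real rpc-clnt.h definitions:

#include <assert.h>
#include <pthread.h>
#include <stdio.h>

/* Simplified stand-ins for rpc-clnt's saved-frames bookkeeping. */
struct saved_frame {
        struct saved_frame *next;
        void               *frame;   /* call frame waiting for its reply */
        unsigned long       xid;
};

struct saved_frames {
        pthread_mutex_t     lock;
        struct saved_frame *head;
};

/* Proposed sanity check: before a frame is recycled in FRAME_DESTROY,
 * make sure no outstanding RPC still references it. */
static int
frame_still_saved (struct saved_frames *frames, void *frame)
{
        int found = 0;

        pthread_mutex_lock (&frames->lock);
        for (struct saved_frame *sf = frames->head; sf; sf = sf->next) {
                if (sf->frame == frame) {
                        fprintf (stderr, "BUG: destroying frame %p still "
                                 "saved for xid %lu\n", frame, sf->xid);
                        found = 1;
                        break;
                }
        }
        pthread_mutex_unlock (&frames->lock);
        return found;
}

int
main (void)
{
        struct saved_frames frames = { PTHREAD_MUTEX_INITIALIZER, NULL };
        int                 dummy_frame;
        struct saved_frame  sf = { NULL, &dummy_frame, 42 };

        frames.head = &sf;
        /* The frame is still in the list, so the check should fire. */
        assert (frame_still_saved (&frames, &dummy_frame) == 1);
        return 0;
}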
Looking forward to your reply!
From: Zhou, Cynthia (NSB - CN/Hangzhou)
Sent: Friday, October 19, 2018 9:59 AM
To: 'Ravishankar N' <ravishankar at redhat.com>
Cc: 'gluster-users' <gluster-users at gluster.org>
Subject: RE: glustershd coredump generated while reboot all 3 sn nodes
Hi,
From one recent coredump I got two interesting thread backtraces, which seem to show that glustershd has two threads polling in messages from the same client connection simultaneously:
Thread 17 (Thread 0x7f8485247700 (LWP 6063)):
#0 0x00007f8489787c80 in pthread_mutex_lock () from /lib64/libpthread.so.0
#1 0x00007f848a9c177e in dict_ref (this=0x18004f0f0) at dict.c:660
#2 0x00007f84845920e4 in afr_selfheal_discover_cbk (frame=0x7f847400bf00, cookie=0x2, this=0x7f84800390b0, op_ret=0, op_errno=0, inode=0x0,
buf=0x7f84740116e0, xdata=0x18004f0f0, parbuf=0x7f8474019e80) at afr-self-heal-common.c:1723
#3 0x00007f848480b96d in client3_3_entrylk_cbk (req=0x7f8474019e40, iov=0x7f8474019e80, count=1, myframe=0x7f84740116e0) at client-rpc-fops.c:1611
#4 0x00007f848a78ed47 in rpc_clnt_handle_reply (clnt=0x7f848004f0c0, pollin=0x7f84800bd6d0) at rpc-clnt.c:778
#5 0x00007f848a78f2e5 in rpc_clnt_notify (trans=0x7f848004f2f0, mydata=0x7f848004f0f0, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x7f84800bd6d0)
at rpc-clnt.c:971
#6 0x00007f848a78b319 in rpc_transport_notify (this=0x7f848004f2f0, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x7f84800bd6d0) at rpc-transport.c:538
#7 0x00007f84856d234d in socket_event_poll_in (this=0x7f848004f2f0, notify_handled=_gf_true) at socket.c:2315
#8 0x00007f84856d2992 in socket_event_handler (fd=15, idx=8, gen=1, data=0x7f848004f2f0, poll_in=1, poll_out=0, poll_err=0) at socket.c:2471
#9 0x00007f848aa395ac in event_dispatch_epoll_handler (event_pool=0x1d40b00, event=0x7f8485246e84) at event-epoll.c:583
#10 0x00007f848aa39883 in event_dispatch_epoll_worker (data=0x1d883d0) at event-epoll.c:659
#11 0x00007f84897855da in start_thread () from /lib64/libpthread.so.0
#12 0x00007f848905bcbf in clone () from /lib64/libc.so.6
And
Thread 1 (Thread 0x7f847f6ff700 (LWP 6083)):
#0 0x00007f8484812d24 in client3_3_lookup_cbk (req=0x7f8474002300, iov=0x7f8474002340, count=1, myframe=0x7f84740116e0) at client-rpc-fops.c:2802
#1 0x00007f848a78ed47 in rpc_clnt_handle_reply (clnt=0x7f848004f0c0, pollin=0x7f847800d9f0) at rpc-clnt.c:778
#2 0x00007f848a78f2e5 in rpc_clnt_notify (trans=0x7f848004f2f0, mydata=0x7f848004f0f0, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x7f847800d9f0)
at rpc-clnt.c:971
#3 0x00007f848a78b319 in rpc_transport_notify (this=0x7f848004f2f0, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x7f847800d9f0) at rpc-transport.c:538
#4 0x00007f84856d234d in socket_event_poll_in (this=0x7f848004f2f0, notify_handled=_gf_true) at socket.c:2315
#5 0x00007f84856d2992 in socket_event_handler (fd=15, idx=8, gen=1, data=0x7f848004f2f0, poll_in=1, poll_out=0, poll_err=0) at socket.c:2471
#6 0x00007f848aa395ac in event_dispatch_epoll_handler (event_pool=0x1d40b00, event=0x7f847f6fee84) at event-epoll.c:583
#7 0x00007f848aa39883 in event_dispatch_epoll_worker (data=0x7f848004ef30) at event-epoll.c:659
#8 0x00007f84897855da in start_thread () from /lib64/libpthread.so.0
#9 0x00007f848905bcbf in clone () from /lib64/libc.so.6
The coredump is generated because in thread 1, myframe->local is 0 (NULL):
(gdb) print *(struct _call_frame*)myframe
$12 = {root = 0x7f8474008180, parent = 0x7f847400bf00, frames = {next = 0x7f8474009230, prev = 0x7f8474008878}, local = 0x0,
this = 0x7f8480036e20, ret = 0x7f8484591e93 <afr_selfheal_discover_cbk>, ref_count = 0, lock = {spinlock = 0, mutex = {__data = {
__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0,
__next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}}, cookie = 0x2, complete = _gf_true, op = GF_FOP_NULL,
begin = {tv_sec = 0, tv_usec = 0}, end = {tv_sec = 0, tv_usec = 0},
wind_from = 0x7f84845cba60 <__FUNCTION__.18726> "afr_selfheal_unlocked_discover_on",
wind_to = 0x7f84845cb090 "__priv->children[__i]->fops->lookup",
unwind_from = 0x7f848483a350 <__FUNCTION__.18496> "client3_3_entrylk_cbk", unwind_to = 0x7f84845cb0b4 "afr_selfheal_discover_cbk"}
[Analysis]
It seems thread 17 is receiving a message reply; it gets call frame 0x7f84740116e0 and calls client3_3_entrylk_cbk. But from the source code, when client3_3_entrylk_cbk does the unwind:
[screenshot: client3_3_entrylk_cbk unwind code, with the relevant parameters highlighted in green (image002.jpg, attached)]
Even if it could find the correct ret address, the parameters passed to it should not be the ones highlighted in green!
Another weird thing: when rpc_clnt_handle_reply finds frame 0x7f84740116e0 it should be removed from the saved-frames list, so why is thread 1 able to retrieve this frame again?
I checked that when client3_3_lookup_cbk does the unwind, the parameters passed to the parent frame are indeed the ones highlighted in green.
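Putting the suspected sequence together in one place, here is a tiny toy model I wrote only to describe the order of events; it is not GlusterFS code and all names in it are invented:

#include <stdio.h>

struct toy_frame {
        const char *ret_fn;   /* which callback the frame will unwind to */
        void       *local;    /* per-operation state                     */
};

struct toy_saved {            /* stale entry left in the saved-frames list */
        struct toy_frame *frame;
};

int
main (void)
{
        struct toy_frame pool_slot;                /* one slot of the "hot list" */
        struct toy_saved stale = { &pool_slot };   /* lookup still waiting here  */
        int              lookup_local;

        /* 1. The frame is used for a lookup; its reply is still outstanding. */
        pool_slot.ret_fn = "lookup callback";
        pool_slot.local  = &lookup_local;

        /* 2. The frame is destroyed, goes back to the hot list, and is reused
         *    for the entrylk with a different ret and a fresh (NULL) local.   */
        pool_slot.ret_fn = "afr_selfheal_discover_cbk";
        pool_slot.local  = NULL;

        /* 3. The old lookup reply now arrives.  The stale saved entry still
         *    resolves to the same address, but the frame's contents belong to
         *    a different operation: wrong ret, NULL local -> crash.           */
        printf ("old reply unwinds frame %p to '%s' with local=%p\n",
                (void *) stale.frame, stale.frame->ret_fn, stale.frame->local);
        return 0;
}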
From: Zhou, Cynthia (NSB - CN/Hangzhou)
Sent: Tuesday, October 16, 2018 5:24 PM
To: Ravishankar N <ravishankar at redhat.com>
Cc: gluster-users <gluster-users at gluster.org>
Subject: RE: glustershd coredump generated while reboot all 3 sn nodes
By the way, no private patches are applied. https://bugzilla.redhat.com/show_bug.cgi?id=1639632 has been created to track this issue, and I attached one coredump to the bug.
There is not much useful info in the glustershd log; the process dumped core suddenly, so the log only shows prints from several seconds before the crash.
cynthia
From: Zhou, Cynthia (NSB - CN/Hangzhou)
Sent: Tuesday, October 16, 2018 2:15 PM
To: Ravishankar N <ravishankar at redhat.com>
Cc: gluster-users <gluster-users at gluster.org>
Subject: RE: glustershd coredump generated while reboot all 3 sn nodes
Hi,
Yes, it is glusterfs 3.12.3.
I will create a BZ and attach the related coredump and glusterfs logs.
cynthia
From: Ravishankar N <ravishankar at redhat.com>
Sent: Tuesday, October 16, 2018 12:23 PM
To: Zhou, Cynthia (NSB - CN/Hangzhou) <cynthia.zhou at nokia-sbell.com>
Cc: gluster-users <gluster-users at gluster.org>
Subject: Re: glustershd coredump generated while reboot all 3 sn nodes
Hi,
- Is this stock glusterfs-3.12.3? Or do you have any patches applied on top of it?
- If it is stock, could you create a BZ and attach the core file and the /var/log/glusterfs/ logs from 3 nodes at the time of crash?
Thanks,
Ravi
On 10/16/2018 08:45 AM, Zhou, Cynthia (NSB - CN/Hangzhou) wrote:
Hi,
This issue has happened twice recently: when glustershd does heal, it occasionally generates a coredump.
I did some debugging and found that sometimes afr_selfheal_unlocked_discover_on does a lookup and the frame is saved in rpc_clnt_submit; when the reply comes, a saved frame is found, but its address is different from the address of the frame that was saved. I think this is wrong, but I cannot find a clue as to how this happened. A tiny illustration of the mismatch is below, followed by the gdb output.
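This is only a standalone toy (none of the names below are real GlusterFS identifiers) showing what I mean by the mismatch: record which frame pointer was handed in at submit time, keyed by xid, and compare it with the frame pointer the reply path later hands to the callback:

#include <stdio.h>

#define MAX_XID 8

/* What a submit path would have recorded, indexed by xid. */
static void *submitted_frame[MAX_XID];

static void
on_submit (unsigned xid, void *frame)
{
        submitted_frame[xid] = frame;
}

static void
on_reply (unsigned xid, void *frame_from_saved_list)
{
        /* In a healthy run these two pointers are always identical. */
        if (submitted_frame[xid] != frame_from_saved_list)
                fprintf (stderr, "xid %u: submitted frame %p but reply "
                         "unwound %p\n", xid, submitted_frame[xid],
                         frame_from_saved_list);
}

int
main (void)
{
        int a, b;   /* stand-ins for two distinct call frames */

        on_submit (1, &a);   /* lookup goes out with frame &a            */
        on_reply  (1, &b);   /* reply resolves to a different address... */
        return 0;
}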
[root at mn-0:/home/robot]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/sbin/glusterfs -s sn-0.local --volfile-id gluster/glustershd -p /var/run/g'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00007fb1a6fd9d24 in client3_3_lookup_cbk (req=0x7fb188010fb0, iov=0x7fb188010ff0, count=1, myframe=0x7fb188215740) at client-rpc-fops.c:2802
2802 client-rpc-fops.c: No such file or directory.
[Current thread is 1 (Thread 0x7fb1a7a0e700 (LWP 8151))]
Missing separate debuginfos, use: dnf debuginfo-install rcp-pack-glusterfs-1.2.0-RCP2.wf29.x86_64
(gdb) bt
#0 0x00007fb1a6fd9d24 in client3_3_lookup_cbk (req=0x7fb188010fb0, iov=0x7fb188010ff0, count=1, myframe=0x7fb188215740) at client-rpc-fops.c:2802
#1 0x00007fb1acf55d47 in rpc_clnt_handle_reply (clnt=0x7fb1a008fff0, pollin=0x7fb1a0843910) at rpc-clnt.c:778
#2 0x00007fb1acf562e5 in rpc_clnt_notify (trans=0x7fb1a00901c0, mydata=0x7fb1a0090020, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x7fb1a0843910) at rpc-clnt.c:971
#3 0x00007fb1acf52319 in rpc_transport_notify (this=0x7fb1a00901c0, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x7fb1a0843910) at rpc-transport.c:538
#4 0x00007fb1a7e9934d in socket_event_poll_in (this=0x7fb1a00901c0, notify_handled=_gf_true) at socket.c:2315
#5 0x00007fb1a7e99992 in socket_event_handler (fd=20, idx=14, gen=103, data=0x7fb1a00901c0, poll_in=1, poll_out=0, poll_err=0) at socket.c:2471
#6 0x00007fb1ad2005ac in event_dispatch_epoll_handler (event_pool=0x175fb00, event=0x7fb1a7a0de84) at event-epoll.c:583
#7 0x00007fb1ad200883 in event_dispatch_epoll_worker (data=0x17a73d0) at event-epoll.c:659
#8 0x00007fb1abf4c5da in start_thread () from /lib64/libpthread.so.0
#9 0x00007fb1ab822cbf in clone () from /lib64/libc.so.6
(gdb) info thread
Id Target Id Frame
* 1 Thread 0x7fb1a7a0e700 (LWP 8151) 0x00007fb1a6fd9d24 in client3_3_lookup_cbk (req=0x7fb188010fb0, iov=0x7fb188010ff0, count=1, myframe=0x7fb188215740) at client-rpc-fops.c:2802
2 Thread 0x7fb1aa0af700 (LWP 8147) 0x00007fb1ab761cbc in sigtimedwait () from /lib64/libc.so.6
3 Thread 0x7fb1a98ae700 (LWP 8148) 0x00007fb1ab7f04b0 in nanosleep () from /lib64/libc.so.6
4 Thread 0x7fb1957fa700 (LWP 8266) 0x00007fb1abf528ca in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
5 Thread 0x7fb1a88ac700 (LWP 8150) 0x00007fb1abf528ca in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
6 Thread 0x7fb17f7fe700 (LWP 8269) 0x00007fb1abf5250c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
7 Thread 0x7fb1aa8b0700 (LWP 8146) 0x00007fb1abf56300 in nanosleep () from /lib64/libpthread.so.0
8 Thread 0x7fb1ad685780 (LWP 8145) 0x00007fb1abf4da3d in __pthread_timedjoin_ex () from /lib64/libpthread.so.0
9 Thread 0x7fb1a542d700 (LWP 8251) 0x00007fb1ab7f04b0 in nanosleep () from /lib64/libc.so.6
10 Thread 0x7fb1a4c2c700 (LWP 8260) 0x00007fb1abf528ca in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
11 Thread 0x7fb196ffd700 (LWP 8263) 0x00007fb1abf528ca in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
12 Thread 0x7fb1a60d7700 (LWP 8247) 0x00007fb1ab822fe7 in epoll_wait () from /lib64/libc.so.6
13 Thread 0x7fb1a90ad700 (LWP 8149) 0x00007fb1abf528ca in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
(gdb) print (call_frame_t*)myframe
$1 = (call_frame_t *) 0x7fb188215740
(gdb) print *(call_frame_t*)myframe
$2 = {root = 0x7fb1a0085090, parent = 0xcd4642c4a3efd678, frames = {next = 0x151e2a92a5ae1bb, prev = 0x0}, local = 0x0, this = 0x0, ret = 0x0, ref_count = 0, lock = {spinlock = 0, mutex = {__data = {
__lock = 0, __count = 0, __owner = 0, __nusers = 4, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x7fb188215798, __next = 0x7fb188215798}},
__size = '\000' <repeats 12 times>, "\004", '\000' <repeats 11 times>, "\230W!\210\261\177\000\000\230W!\210\261\177\000", __align = 0}}, cookie = 0x7fb1882157a8, complete = (unknown: 2283886504),
op = 32689, begin = {tv_sec = 140400469825464, tv_usec = 140400469825464}, end = {tv_sec = 140400878737576, tv_usec = 140400132101048}, wind_from = 0x7fb18801cdc0 "", wind_to = 0x0, unwind_from = 0x0,
unwind_to = 0x0}
(gdb) thread 6
[Switching to thread 6 (Thread 0x7fb17f7fe700 (LWP 8269))]
#0 0x00007fb1abf5250c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
(gdb) bt
#0 0x00007fb1abf5250c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007fb1ad1dc993 in __syncbarrier_wait (barrier=0x7fb188014790, waitfor=3) at syncop.c:1138
#2 0x00007fb1ad1dc9e4 in syncbarrier_wait (barrier=0x7fb188014790, waitfor=3) at syncop.c:1155
#3 0x00007fb1a6d59cde in afr_selfheal_unlocked_discover_on (frame=0x7fb1882162d0, inode=0x7fb188215740, gfid=0x7fb17f7fdb00 "x\326\357\243\304BFͻ\341Z*\251\342Q\001\060\333\177\177\261\177",
replies=0x7fb17f7fcf40, discover_on=0x7fb1a0084cb0 "\001\001\001", <incomplete sequence \360\255\272>) at afr-self-heal-common.c:1809
#4 0x00007fb1a6d59d80 in afr_selfheal_unlocked_discover (frame=0x7fb1882162d0, inode=0x7fb188215740, gfid=0x7fb17f7fdb00 "x\326\357\243\304BFͻ\341Z*\251\342Q\001\060\333\177\177\261\177",
replies=0x7fb17f7fcf40) at afr-self-heal-common.c:1828
#5 0x00007fb1a6d5e51f in afr_selfheal_unlocked_inspect (frame=0x7fb1882162d0, this=0x7fb1a001db40, gfid=0x7fb17f7fdb00 "x\326\357\243\304BFͻ\341Z*\251\342Q\001\060\333\177\177\261\177",
link_inode=0x7fb17f7fd9c8, data_selfheal=0x7fb17f7fd9c4, metadata_selfheal=0x7fb17f7fd9c0, entry_selfheal=0x7fb17f7fd9bc) at afr-self-heal-common.c:2241
#6 0x00007fb1a6d5f19b in afr_selfheal_do (frame=0x7fb1882162d0, this=0x7fb1a001db40, gfid=0x7fb17f7fdb00 "x\326\357\243\304BFͻ\341Z*\251\342Q\001\060\333\177\177\261\177") at afr-self-heal-common.c:2483
#7 0x00007fb1a6d5f346 in afr_selfheal (this=0x7fb1a001db40, gfid=0x7fb17f7fdb00 "x\326\357\243\304BFͻ\341Z*\251\342Q\001\060\333\177\177\261\177") at afr-self-heal-common.c:2543
#8 0x00007fb1a6d6ac5c in afr_shd_selfheal (healer=0x7fb1a0085640, child=0, gfid=0x7fb17f7fdb00 "x\326\357\243\304BFͻ\341Z*\251\342Q\001\060\333\177\177\261\177") at afr-self-heald.c:343
#9 0x00007fb1a6d6b00b in afr_shd_index_heal (subvol=0x7fb1a00171e0, entry=0x7fb1a0714180, parent=0x7fb17f7fddc0, data=0x7fb1a0085640) at afr-self-heald.c:440
#10 0x00007fb1ad201ed3 in syncop_mt_dir_scan (frame=0x7fb1a07a0e90, subvol=0x7fb1a00171e0, loc=0x7fb17f7fddc0, pid=-6, data=0x7fb1a0085640, fn=0x7fb1a6d6aebc <afr_shd_index_heal>, xdata=0x7fb1a07b4ed0,
max_jobs=1, max_qlen=1024) at syncop-utils.c:407
#11 0x00007fb1a6d6b2b5 in afr_shd_index_sweep (healer=0x7fb1a0085640, vgfid=0x7fb1a6d93610 "glusterfs.xattrop_index_gfid") at afr-self-heald.c:494
#12 0x00007fb1a6d6b394 in afr_shd_index_sweep_all (healer=0x7fb1a0085640) at afr-self-heald.c:517
#13 0x00007fb1a6d6b697 in afr_shd_index_healer (data=0x7fb1a0085640) at afr-self-heald.c:597
#14 0x00007fb1abf4c5da in start_thread () from /lib64/libpthread.so.0
#15 0x00007fb1ab822cbf in clone () from /lib64/libc.so.6
From: Zhou, Cynthia (NSB - CN/Hangzhou)
Sent: Thursday, October 11, 2018 3:36 PM
To: Ravishankar N <ravishankar at redhat.com>
Cc: gluster-users <gluster-users at gluster.org>
Subject: glustershd coredump generated while reboot all 3 sn nodes
Hi,
I find that sometimes, when an SN node is restarted, glustershd exits and generates a coredump. It has happened twice in my environment; I would like to know your opinion on this issue, thanks!
The glusterfs version I use is 3.12.3.
[root at sn-1:/root]
# gluster v info log
Volume Name: log
Type: Replicate
Volume ID: 87bcbaf8-5fa4-4060-9149-23f832befe92
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: sn-0.local:/mnt/bricks/log/brick
Brick2: sn-1.local:/mnt/bricks/log/brick
Brick3: sn-2.local:/mnt/bricks/log/brick
Options Reconfigured:
server.allow-insecure: on
cluster.quorum-type: auto
network.ping-timeout: 42
cluster.consistent-metadata: on
cluster.favorite-child-policy: mtime
cluster.quorum-reads: no
cluster.server-quorum-type: none
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
cluster.server-quorum-ratio: 51%
[root at sn-1:/root]
///////////////////////////////////////////////glustershd coredump////////////////////////////////////////////////////////////////
# lz4 -d core.glusterfs.0.c5f0c5547fbd4e5aa8f350b748e5675e.1812.1537967075000000.lz4
Decoding file core.glusterfs.0.c5f0c5547fbd4e5aa8f350b748e5675e.1812.1537967075000000
core.glusterfs.0.c5f : decoded 263188480 bytes
[root at sn-0:/mnt/export]
# gdb /usr/sbin/glusterfs core.glusterfs.0.c5f0c5547fbd4e5aa8f350b748e5675e.1812.1537967075000000
GNU gdb (GDB) Fedora 8.1-14.wf29
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/sbin/glusterfs...(no debugging symbols found)...done.
warning: core file may not match specified executable file.
[New LWP 1818]
[New LWP 1812]
[New LWP 1813]
[New LWP 1817]
[New LWP 1966]
[New LWP 1968]
[New LWP 1970]
[New LWP 1974]
[New LWP 1976]
[New LWP 1814]
[New LWP 1815]
[New LWP 1816]
[New LWP 1828]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/sbin/glusterfs -s sn-0.local --volfile-id gluster/glustershd -p /var/run/g'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00007f1b5e5d7d24 in client3_3_lookup_cbk (req=0x7f1b44002300, iov=0x7f1b44002340, count=1, myframe=0x7f1b4401c850) at client-rpc-fops.c:2802
2802 client-rpc-fops.c: No such file or directory.
[Current thread is 1 (Thread 0x7f1b5f00c700 (LWP 1818))]
Missing separate debuginfos, use: dnf debuginfo-install rcp-pack-glusterfs-1.2.0_1_g54e6196-RCP2.wf29.x86_64
(gdb) bt
#0 0x00007f1b5e5d7d24 in client3_3_lookup_cbk (req=0x7f1b44002300, iov=0x7f1b44002340, count=1, myframe=0x7f1b4401c850) at client-rpc-fops.c:2802
#1 0x00007f1b64553d47 in rpc_clnt_handle_reply (clnt=0x7f1b5808bbb0, pollin=0x7f1b580c6620) at rpc-clnt.c:778
#2 0x00007f1b645542e5 in rpc_clnt_notify (trans=0x7f1b5808bde0, mydata=0x7f1b5808bbe0, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x7f1b580c6620) at rpc-clnt.c:971
#3 0x00007f1b64550319 in rpc_transport_notify (this=0x7f1b5808bde0, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x7f1b580c6620) at rpc-transport.c:538
#4 0x00007f1b5f49734d in socket_event_poll_in (this=0x7f1b5808bde0, notify_handled=_gf_true) at socket.c:2315
#5 0x00007f1b5f497992 in socket_event_handler (fd=25, idx=15, gen=7, data=0x7f1b5808bde0, poll_in=1, poll_out=0, poll_err=0) at socket.c:2471
#6 0x00007f1b647fe5ac in event_dispatch_epoll_handler (event_pool=0x230cb00, event=0x7f1b5f00be84) at event-epoll.c:583
#7 0x00007f1b647fe883 in event_dispatch_epoll_worker (data=0x23543d0) at event-epoll.c:659
#8 0x00007f1b6354a5da in start_thread () from /lib64/libpthread.so.0
#9 0x00007f1b62e20cbf in clone () from /lib64/libc.so.6
(gdb) print *(call_frame_t*)myframe
$1 = {root = 0x100000000, parent = 0x100000005, frames = {next = 0x7f1b4401c8a8, prev = 0x7f1b44010190}, local = 0x0, this = 0x0, ret = 0x0, ref_count = 0, lock = {spinlock = 0, mutex = {__data = {
__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x7f1b44010190, __next = 0x0}},
__size = '\000' <repeats 24 times>, "\220\001\001D\033\177\000\000\000\000\000\000\000\000\000", __align = 0}}, cookie = 0x7f1b4401ccf0, complete = _gf_false, op = GF_FOP_NULL, begin = {
tv_sec = 139755081730912, tv_usec = 139755081785872}, end = {tv_sec = 448811404, tv_usec = 21474836481}, wind_from = 0x0, wind_to = 0x0, unwind_from = 0x0, unwind_to = 0x0}
(gdb) info thread
Id Target Id Frame
* 1 Thread 0x7f1b5f00c700 (LWP 1818) 0x00007f1b5e5d7d24 in client3_3_lookup_cbk (req=0x7f1b44002300, iov=0x7f1b44002340, count=1, myframe=0x7f1b4401c850) at client-rpc-fops.c:2802
2 Thread 0x7f1b64c83780 (LWP 1812) 0x00007f1b6354ba3d in __pthread_timedjoin_ex () from /lib64/libpthread.so.0
3 Thread 0x7f1b61eae700 (LWP 1813) 0x00007f1b63554300 in nanosleep () from /lib64/libpthread.so.0
4 Thread 0x7f1b5feaa700 (LWP 1817) 0x00007f1b635508ca in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
5 Thread 0x7f1b5ca2b700 (LWP 1966) 0x00007f1b62dee4b0 in nanosleep () from /lib64/libc.so.6
6 Thread 0x7f1b4f7fe700 (LWP 1968) 0x00007f1b6355050c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
7 Thread 0x7f1b4e7fc700 (LWP 1970) 0x00007f1b6355050c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
8 Thread 0x7f1b4d7fa700 (LWP 1974) 0x00007f1b62dee4b0 in nanosleep () from /lib64/libc.so.6
9 Thread 0x7f1b33fff700 (LWP 1976) 0x00007f1b62dee4b0 in nanosleep () from /lib64/libc.so.6
10 Thread 0x7f1b616ad700 (LWP 1814) 0x00007f1b62d5fcbc in sigtimedwait () from /lib64/libc.so.6
11 Thread 0x7f1b60eac700 (LWP 1815) 0x00007f1b62dee4b0 in nanosleep () from /lib64/libc.so.6
12 Thread 0x7f1b606ab700 (LWP 1816) 0x00007f1b635508ca in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
13 Thread 0x7f1b5d6d5700 (LWP 1828) 0x00007f1b62e20fe7 in epoll_wait () from /lib64/libc.so.6
(gdb) quit
The source code is as shown below, so from gdb the coredump happens because frame->local is NULL!
[screenshot: client3_3_lookup_cbk source where frame->local is used (image004.jpg, attached)]
From the sn-0 journal log:
Sep 26 16:04:40.034577 sn-0 systemd-coredump[2612]: Process 1812 (glusterfs) of user 0 dumped core.
Stack trace of thread 1818:
#0 0x00007f1b5e5d7d24 client3_3_lookup_cbk (client.so)
#1 0x00007f1b64553d47 rpc_clnt_handle_reply (libgfrpc.so.0)
#2 0x00007f1b645542e5 rpc_clnt_notify (libgfrpc.so.0)
#3 0x00007f1b64550319 rpc_transport_notify (libgfrpc.so.0)
#4 0x00007f1b5f49734d socket_event_poll_in (socket.so)
#5 0x00007f1b5f497992 socket_event_handler (socket.so)
#6 0x00007f1b647fe5ac event_dispatch_epoll_handler (libglusterfs.so.0)
#7 0x00007f1b647fe883 event_dispatch_epoll_worker (libglusterfs.so.0)
#8 0x00007f1b6354a5da start_thread (libpthread.so.0)
#9 0x00007f1b62e20cbf __clone (libc.so.6)
Stack trace of thread 1812:
#0 0x00007f1b6354ba3d __GI___pthread_timedjoin_ex (libpthread.so.0)
#1 0x00007f1b647feae1 event_dispatch_epoll (libglusterfs.so.0)
#2 0x00007f1b647c2703 event_dispatch (libglusterfs.so.0)
#3 0x000000000040ab95 main (glusterfsd)
#4 0x00007f1b62d4baf7 __libc_start_main (libc.so.6)
#5 0x000000000040543a _start (glusterfsd)
Stack trace of thread 1813:
#0 0x00007f1b63554300 __nanosleep (libpthread.so.0)
#1 0x00007f1b647a04e5 gf_timer_proc (libglusterfs.so.0)
#2 0x00007f1b6354a5da start_thread (libpthread.so.0)
#3 0x00007f1b62e20cbf __clone (libc.so.6)
Stack trace of thread 1817:
#0 0x00007f1b635508ca pthread_cond_timedwait@@GLIBC_2.3.2 (libpthread.so.0)
#1 0x00007f1b647d98e3 syncenv_task (libglusterfs.so.0)
#2 0x00007f1b647d9b7e syncenv_processor (libglusterfs.so.0)
#3 0x00007f1b6354a5da start_thread (libpthread.so.0)
#4 0x00007f1b62e20cbf __clone (libc.so.6)
Stack trace of thread 1966:
#0 0x00007f1b62dee4b0 __nanosleep (libc.so.6)
#1 0x00007f1b62dee38a sleep (libc.so.6)
#2 0x00007f1b5e36970c afr_shd_index_healer (replicate.so)
#3 0x00007f1b6354a5da start_thread (libpthread.so.0)
#4 0x00007f1b62e20cbf __clone (libc.so.6)
Stack trace of thread 1968:
#0 0x00007f1b6355050c pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
#1 0x00007f1b647da993 __syncbarrier_wait (libglusterfs.so.0)
#2 0x00007f1b647da9e4 syncbarrier_wait (libglusterfs.so.0)
#3 0x00007f1b5e357cde afr_selfheal_unlocked_discover_on (replicate.so)
#4 0x00007f1b5e357d80 afr_selfheal_unlocked_discover (replicate.so)
#5 0x00007f1b5e363bf8 __afr_selfheal_entry_prepare (replicate.so)
#6 0x00007f1b5e3641c0 afr_selfheal_entry_dirent (replicate.so)
#7 0x00007f1b5e36488a afr_selfheal_entry_do_subvol (replicate.so)
#8 0x00007f1b5e365077 afr_selfheal_entry_do (replicate.so)
#9 0x00007f1b5e3656b6 __afr_selfheal_entry (replicate.so)
#10 0x00007f1b5e365bba afr_selfheal_entry (replicate.so)
#11 0x00007f1b5e35d250 afr_selfheal_do (replicate.so)
#12 0x00007f1b5e35d346 afr_selfheal (replicate.so)
#13 0x00007f1b5e368c5c afr_shd_selfheal (replicate.so)
#14 0x00007f1b5e36900b afr_shd_index_heal (replicate.so)
#15 0x00007f1b647ffed3 syncop_mt_dir_scan (libglusterfs.so.0)
#16 0x00007f1b5e3692b5 afr_shd_index_sweep (replicate.so)
#17 0x00007f1b5e369394 afr_shd_index_sweep_all (replicate.so)
#18 0x00007f1b5e369697 afr_shd_index_healer (replicate.so)
#19 0x00007f1b6354a5da start_thread (libpthread.so.0)
#20 0x00007f1b62e20cbf __clone (libc.so.6)
Stack trace of thread 1970:
#0 0x00007f1b6355050c pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
#1 0x00007f1b647da993 __syncbarrier_wait (libglusterfs.so.0)
#2 0x00007f1b647da9e4 syncbarrier_wait (libglusterfs.so.0)
#3 0x00007f1b5e357742 afr_selfheal_unlocked_lookup_on (replicate.so)
#4 0x00007f1b5e364204 afr_selfheal_entry_dirent (replicate.so)
#5 0x00007f1b5e36488a afr_selfheal_entry_do_subvol (replicate.so)
#6 0x00007f1b5e365077 afr_selfheal_entry_do (replicate.so)
#7 0x00007f1b5e3656b6 __afr_selfheal_entry (replicate.so)
#8 0x00007f1b5e365bba afr_selfheal_entry (replicate.so)
#9 0x00007f1b5e35d250 afr_selfheal_do (replicate.so)
#10 0x00007f1b5e35d346 afr_selfheal (replicate.so)
#11 0x00007f1b5e368c5c afr_shd_selfheal (replicate.so)
#12 0x00007f1b5e36900b afr_shd_index_heal (replicate.so)
#13 0x00007f1b647ffed3 syncop_mt_dir_scan (libglusterfs.so.0)
#14 0x00007f1b5e3692b5 afr_shd_index_sweep (replicate.so)
#15 0x00007f1b5e369394 afr_shd_index_sweep_all (replicate.so)
#16 0x00007f1b5e369697 afr_shd_index_healer (replicate.so)
#17 0x00007f1b6354a5da start_thread (libpthread.so.0)
#18 0x00007f1b62e20cbf __clone (libc.so.6)
Stack trace of thread 1974:
#0 0x00007f1b62dee4b0 __nanosleep (libc.so.6)
#1 0x00007f1b62dee38a sleep (libc.so.6)
#2 0x00007f1b5e36970c afr_shd_index_healer (replicate.so)
#3 0x00007f1b6354a5da start_thread (libpthread.so.0)
#4 0x00007f1b62e20cbf __clone (libc.so.6)
Stack trace of thread 1976:
#0 0x00007f1b62dee4b0 __nanosleep (libc.so.6)
#1 0x00007f1b62dee38a sleep (libc.so.6)
#2 0x00007f1b5e36970c afr_shd_index_healer (replicate.so)
#3 0x00007f1b6354a5da start_thread (libpthread.so.0)
#4 0x00007f1b62e20cbf __clone (libc.so.6)
Stack trace of thread 1814:
#0 0x00007f1b62d5fcbc __sigtimedwait (libc.so.6)
#1 0x00007f1b63554afc sigwait (libpthread.so.0)
#2 0x0000000000409ed7 glusterfs_sigwaiter (glusterfsd)
#3 0x00007f1b6354a5da start_thread (libpthread.so.0)
#4 0x00007f1b62e20cbf __clone (libc.so.6)
Stack trace of thread 1815:
#0 0x00007f1b62dee4b0 __nanosleep (libc.so.6)
#1 0x00007f1b62dee38a sleep (libc.so.6)
#2 0x00007f1b647c3f5c pool_sweeper (libglusterfs.so.0)
#3 0x00007f1b6354a5da start_thread (libpthread.so.0)
#4 0x00007f1b62e20cbf __clone (libc.so.6)
Stack trace of thread 1816:
#0 0x00007f1b635508ca pthread_cond_timedwait@@GLIBC_2.3.2 (libpthread.so.0)
#1 0x00007f1b647d98e3 syncenv_task (libglusterfs.so.0)
#2 0x00007f1b647d9b7e syncenv_processor (libglusterfs.so.0)
#3 0x00007f1b6354a5da start_thread (libpthread.so.0)
#4 0x00007f1b62e20cbf __clone (libc.so.6)
Stack trace of thread 1828:
#0 0x00007f1b62e20fe7 epoll_wait (libc.so.6)
#1 0x00007f1b647fe855 event_dispatch_epoll_worker (libglusterfs.so.0)
#2 0x00007f1b6354a5da start_thread (libpthread.so.0)
#3 0x00007f1b62e20cbf __clone (libc.so.6)
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image002.jpg
Type: image/jpeg
Size: 14795 bytes
Desc: image002.jpg
URL: <http://lists.gluster.org/pipermail/gluster-users/attachments/20181023/4b8fe740/attachment-0002.jpg>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image004.jpg
Type: image/jpeg
Size: 58866 bytes
Desc: image004.jpg
URL: <http://lists.gluster.org/pipermail/gluster-users/attachments/20181023/4b8fe740/attachment-0003.jpg>