[Gluster-users] glustershd coredump generated while reboot all 3 sn nodes

Ravishankar N ravishankar at redhat.com
Tue Oct 23 12:32:37 UTC 2018


Hi,

Sorry, I haven't gotten a chance to look at the bug or your observations
yet, as I am held up with other things. I will get to this soon. Thanks for
your patience.

-Ravi


On 10/23/2018 01:35 PM, Zhou, Cynthia (NSB - CN/Hangzhou) wrote:
>
> I did some further study of this issue. I think this issue can happen
> if frame *0x7f84740116e0* gets freed but is still kept in the rpc_clnt
> saved_frame_list: when a frame is destroyed it is put back onto the
> mem-pool hot list and is very likely to be reused for the next request,
> but by the time it is reused its ret address has been changed, so when
> the previous request's response returns, the reply path can still
> retrieve this now-repurposed frame, which is wrong! A small toy example
> of what I mean follows.
>
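> To make that scenario concrete, here is a tiny stand-alone toy program
> (not glusterfs code, just an illustration of the mechanism I suspect):
> a pooled frame is returned to the pool while a pending-request entry
> still holds a pointer to it, the slot is immediately reused with a
> different ret callback, and the late reply for the old request then
> invokes the new callback:
>
>     /* toy_frame_reuse.c - illustrative only, not glusterfs code */
>     #include <stdio.h>
>
>     /* A "frame" carrying a callback, and a pending-request entry that
>      * remembers the frame by pointer (like saved_frame_list does). */
>     typedef void (*cbk_fn) (void *frame);
>     struct toy_frame { cbk_fn ret; };
>     struct pending   { struct toy_frame *frame; };
>
>     static void lookup_cbk  (void *f) { printf ("lookup_cbk on %p\n",  f); }
>     static void entrylk_cbk (void *f) { printf ("entrylk_cbk on %p\n", f); }
>
>     int
>     main (void)
>     {
>             static struct toy_frame pool_slot;            /* one-slot "hot list" */
>
>             struct toy_frame *frame = &pool_slot;         /* taken from the pool */
>             frame->ret = lookup_cbk;
>             struct pending old_req = { .frame = frame };  /* request A is saved  */
>
>             /* The frame is "destroyed" (returned to the pool) without the
>              * pending entry being removed, then reused for a new request. */
>             frame = &pool_slot;
>             frame->ret = entrylk_cbk;                     /* ret has changed     */
>
>             /* The reply for the OLD request arrives and follows its stale
>              * pointer: it now runs the NEW callback with the wrong context. */
>             old_req.frame->ret (old_req.frame);           /* prints entrylk_cbk  */
>             return 0;
>     }
>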
> I find that FRAME_DESTROY does not touch the rpc_clnt saved_frame_list
> at all (and when a frame is freed it should no longer be on the
> saved_frame_list). Could we add a check that walks the saved_frame_list
> and makes sure the frame about to be destroyed is not still on it? A
> rough sketch of the kind of check I mean is below.
>
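> This is only an untested sketch of the idea; the structure and field
> names (conn, saved_frames, sf.list, frame) are from my reading of
> rpc-clnt.c and may not match the code exactly, so please treat it as
> pseudocode rather than a patch:
>
>     /* Sketch: warn if a frame that is about to be destroyed is still
>      * referenced by a pending request on this client's saved frames.
>      * Field names are assumed, not verified against 3.12.3. */
>     static void
>     rpc_clnt_warn_if_frame_saved (struct rpc_clnt *clnt, call_frame_t *frame)
>     {
>             struct saved_frame *trav = NULL;
>
>             pthread_mutex_lock (&clnt->conn.lock);
>             {
>                     list_for_each_entry (trav,
>                                          &clnt->conn.saved_frames->sf.list,
>                                          list) {
>                             if (trav->frame == frame)
>                                     gf_log ("rpc-clnt", GF_LOG_WARNING,
>                                             "frame %p is being destroyed but "
>                                             "is still in saved_frames", frame);
>                     }
>             }
>             pthread_mutex_unlock (&clnt->conn.lock);
>     }
>
> (Such a check would probably have to live in, or be driven from, the
> protocol/client layer, since FRAME_DESTROY itself lives in stack.h and
> knows nothing about rpc_clnt.)
>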
> Looking forward to your reply!
>
> *From:*Zhou, Cynthia (NSB - CN/Hangzhou)
> *Sent:* Friday, October 19, 2018 9:59 AM
> *To:* 'Ravishankar N' <ravishankar at redhat.com>
> *Cc:* 'gluster-users' <gluster-users at gluster.org>
> *Subject:* RE: glustershd coredump generated while reboot all 3 sn nodes
>
> Hi,
>
> From one recent coredump I got two interesting thread backtraces, which
> suggest that glustershd has two threads polling in messages from the
> same client simultaneously:
>
> Thread 17 (Thread 0x7f8485247700 (LWP 6063)):
>
> #0 0x00007f8489787c80 in pthread_mutex_lock () from /lib64/libpthread.so.0
>
> #1 0x00007f848a9c177e in dict_ref (this=0x18004f0f0) at dict.c:660
>
> #2 0x00007f84845920e4 in afr_selfheal_discover_cbk 
> (frame=*0x7f847400bf00*, *cookie=0x2, this=0x7f84800390b0, op_ret=0, 
> op_errno=0, inode=0x0, *
>
> *    buf=0x7f84740116e0, xdata=0x18004f0f0, parbuf=0x7f8474019e80*) at 
> afr-self-heal-common.c:1723
>
> #3 0x00007f848480b96d in client3_3_entrylk_cbk (req=0x7f8474019e40, 
> iov=0x7f8474019e80, count=1, myframe=*0x7f84740116e0*) at 
> client-rpc-fops.c:1611
>
> #4 0x00007f848a78ed47 in rpc_clnt_handle_reply (clnt=*0x7f848004f0c0*, 
> pollin=0x7f84800bd6d0) at rpc-clnt.c:778
>
> #5 0x00007f848a78f2e5 in rpc_clnt_notify (trans=0x7f848004f2f0, 
> mydata=0x7f848004f0f0, event=RPC_TRANSPORT_MSG_RECEIVED, 
> data=0x7f84800bd6d0)
>
> at rpc-clnt.c:971
>
> #6 0x00007f848a78b319 in rpc_transport_notify (this=0x7f848004f2f0, 
> event=RPC_TRANSPORT_MSG_RECEIVED, data=0x7f84800bd6d0) at 
> rpc-transport.c:538
>
> #7 0x00007f84856d234d in socket_event_poll_in (this=0x7f848004f2f0, 
> notify_handled=_gf_true) at socket.c:2315
>
> #8 0x00007f84856d2992 in socket_event_handler (fd=15, idx=8, gen=1, 
> data=0x7f848004f2f0, poll_in=1, poll_out=0, poll_err=0) at socket.c:2471
>
> #9 0x00007f848aa395ac in event_dispatch_epoll_handler 
> (event_pool=0x1d40b00, event=0x7f8485246e84) at event-epoll.c:583
>
> #10 0x00007f848aa39883 in event_dispatch_epoll_worker (data=0x1d883d0) 
> at event-epoll.c:659
>
> #11 0x00007f84897855da in start_thread () from /lib64/libpthread.so.0
>
> #12 0x00007f848905bcbf in clone () from /lib64/libc.so.6
>
> And
>
> Thread 1 (Thread 0x7f847f6ff700 (LWP 6083)):
>
> #0 0x00007f8484812d24 in client3_3_lookup_cbk (req=0x7f8474002300, 
> iov=0x7f8474002340, count=1, myframe=0x7f84740116e0) at 
> client-rpc-fops.c:2802
>
> #1 0x00007f848a78ed47 in rpc_clnt_handle_reply (clnt=*0x7f848004f0c0*, 
> pollin=0x7f847800d9f0) at rpc-clnt.c:778
>
> #2 0x00007f848a78f2e5 in rpc_clnt_notify (trans=0x7f848004f2f0, 
> mydata=0x7f848004f0f0, event=RPC_TRANSPORT_MSG_RECEIVED, 
> data=0x7f847800d9f0)
>
> at rpc-clnt.c:971
>
> #3 0x00007f848a78b319 in rpc_transport_notify (this=0x7f848004f2f0, 
> event=RPC_TRANSPORT_MSG_RECEIVED, data=0x7f847800d9f0) at 
> rpc-transport.c:538
>
> #4 0x00007f84856d234d in socket_event_poll_in (this=0x7f848004f2f0, 
> notify_handled=_gf_true) at socket.c:2315
>
> #5 0x00007f84856d2992 in socket_event_handler (fd=15, idx=8, gen=1, 
> data=0x7f848004f2f0, poll_in=1, poll_out=0, poll_err=0) at socket.c:2471
>
> #6 0x00007f848aa395ac in event_dispatch_epoll_handler 
> (event_pool=0x1d40b00, event=0x7f847f6fee84) at event-epoll.c:583
>
> #7 0x00007f848aa39883 in event_dispatch_epoll_worker 
> (data=0x7f848004ef30) at event-epoll.c:659
>
> #8 0x00007f84897855da in start_thread () from /lib64/libpthread.so.0
>
> #9 0x00007f848905bcbf in clone () from /lib64/libc.so.6
>
> The coredump is generated because, in thread 1, myframe->local is 0:
>
> (gdb) print *(struct _call_frame*)myframe
>
> $12 = {root = 0x7f8474008180, parent = 0x7f847400bf00, frames = {next 
> = 0x7f8474009230, prev = 0x7f8474008878}, *local = 0x0*,
>
>   this = 0x7f8480036e20, ret = 0x7f8484591e93 
> <afr_selfheal_discover_cbk>, ref_count = 0, lock = {spinlock = 0, 
> mutex = {__data = {
>
> __lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, 
> __spins = 0, __elision = 0, __list = {__prev = 0x0,
>
>           __next = 0x0}}, __size = '\000' <repeats 39 times>, __align 
> = 0}}, cookie = 0x2, complete = _gf_true, op = GF_FOP_NULL,
>
>   begin = {tv_sec = 0, tv_usec = 0}, end = {tv_sec = 0, tv_usec = 0},
>
>   wind_from = 0x7f84845cba60 <__FUNCTION__.18726> 
> "afr_selfheal_unlocked_discover_on",
>
>   wind_to = 0x7f84845cb090 "__priv->children[__i]->fops->lookup",
>
>   unwind_from = 0x7f848483a350 <__FUNCTION__.18496> 
> "client3_3_entrylk_cbk", unwind_to = 0x7f84845cb0b4 
> "afr_selfheal_discover_cbk"}
>
> [Analysis]
>
> It seems thread 17 is receiving a message reply: it gets call frame
> *0x7f84740116e0* and calls client3_3_entrylk_cbk. *But* from the source
> code, when client3_3_entrylk_cbk does the unwind, even if it could find
> the correct ret address, the parameters passed to it should not be the
> ones highlighted above!!
>
> Another weird thing: when rpc_clnt_handle_reply finds frame
> *0x7f84740116e0*, that frame should be removed from the saved-frames
> list, so why could thread 1 retrieve the same frame again??
>
> I checked that when client3_3_lookup_cbk does the unwind, the parameters
> it passes to the parent frame are indeed the highlighted ones.
>
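> One way to double-check this from the core (assuming the rpc_clnt member
> really is conn.saved_frames; I am writing the field names from memory)
> would be something like:
>
>     (gdb) print *((struct rpc_clnt *) 0x7f848004f0c0)->conn.saved_frames
>
> and then walk its list by hand to see whether frame *0x7f84740116e0* is
> still referenced there.
>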
> *From:*Zhou, Cynthia (NSB - CN/Hangzhou)
> *Sent:* Tuesday, October 16, 2018 5:24 PM
> *To:* Ravishankar N <ravishankar at redhat.com 
> <mailto:ravishankar at redhat.com>>
> *Cc:* gluster-users <gluster-users at gluster.org 
> <mailto:gluster-users at gluster.org>>
> *Subject:* RE: glustershd coredump generated while reboot all 3 sn nodes
>
> By the way, no private patches are applied.
> https://bugzilla.redhat.com/show_bug.cgi?id=1639632 has been created to
> follow this issue, and I have attached one coredump to the bug.
>
> There is not much useful information in the glustershd log, because the
> process coredumps suddenly and the log only shows prints from several
> seconds before the crash.
>
> cynthia
>
> *From:*Zhou, Cynthia (NSB - CN/Hangzhou)
> *Sent:* Tuesday, October 16, 2018 2:15 PM
> *To:* Ravishankar N <ravishankar at redhat.com 
> <mailto:ravishankar at redhat.com>>
> *Cc:* gluster-users <gluster-users at gluster.org 
> <mailto:gluster-users at gluster.org>>
> *Subject:* RE: glustershd coredump generated while reboot all 3 sn nodes
>
> Hi,
>
> Yes, it is glusterfs 3.12.3.
>
> I will create a BZ and attach the related coredump and glusterfs logs.
>
> cynthia
>
> *From:*Ravishankar N <ravishankar at redhat.com 
> <mailto:ravishankar at redhat.com>>
> *Sent:* Tuesday, October 16, 2018 12:23 PM
> *To:* Zhou, Cynthia (NSB - CN/Hangzhou) <cynthia.zhou at nokia-sbell.com 
> <mailto:cynthia.zhou at nokia-sbell.com>>
> *Cc:* gluster-users <gluster-users at gluster.org 
> <mailto:gluster-users at gluster.org>>
> *Subject:* Re: glustershd coredump generated while reboot all 3 sn nodes
>
> Hi,
>
> - Is this stock glusterfs-3.12.3? Or do you have any patches applied 
> on top of it?
>
> - If it is stock, could you create a BZ and attach the core file and
> the /var/log/glusterfs/ logs from all 3 nodes at the time of the crash?
>
> Thanks,
> Ravi
>
> On 10/16/2018 08:45 AM, Zhou, Cynthia (NSB - CN/Hangzhou) wrote:
>
>     Hi,
>
>     This issue has happened twice recently: when glustershd does heal, it
>     occasionally generates a coredump.
>
>     I did some debugging and found that sometimes
>     afr_selfheal_unlocked_discover_on does a lookup and the frame is saved
>     in rpc_clnt_submit; when the reply comes, the reply path finds a saved
>     frame, but its address is different from the address of the frame that
>     was saved. I think this is wrong, but I cannot find a clue as to how
>     this happened.
>
>     [root at mn-0:/home/robot]
>
>     [Thread debugging using libthread_db enabled]
>
>     Using host libthread_db library "/lib64/libthread_db.so.1".
>
>     Core was generated by `/usr/sbin/glusterfs -s sn-0.local
>     --volfile-id gluster/glustershd -p /var/run/g'.
>
>     Program terminated with signal SIGSEGV, Segmentation fault.
>
>     #0  0x00007fb1a6fd9d24 in client3_3_lookup_cbk
>     (req=0x7fb188010fb0, iov=0x7fb188010ff0, count=1,
>     myframe=*0x7fb188215740*) at client-rpc-fops.c:2802
>
>     2802 client-rpc-fops.c: No such file or directory.
>
>     [Current thread is 1 (Thread 0x7fb1a7a0e700 (LWP 8151))]
>
>     Missing separate debuginfos, use: dnf debuginfo-install
>     rcp-pack-glusterfs-1.2.0-RCP2.wf29.x86_64
>
>     (gdb) bt
>
>     #0  0x00007fb1a6fd9d24 in client3_3_lookup_cbk
>     (req=0x7fb188010fb0, iov=0x7fb188010ff0, count=1,
>     myframe=0x7fb188215740) at client-rpc-fops.c:2802
>
>     #1  0x00007fb1acf55d47 in rpc_clnt_handle_reply
>     (clnt=0x7fb1a008fff0, pollin=0x7fb1a0843910) at rpc-clnt.c:778
>
>     #2  0x00007fb1acf562e5 in rpc_clnt_notify (trans=0x7fb1a00901c0,
>     mydata=0x7fb1a0090020, event=RPC_TRANSPORT_MSG_RECEIVED,
>     data=0x7fb1a0843910) at rpc-clnt.c:971
>
>     #3  0x00007fb1acf52319 in rpc_transport_notify
>     (this=0x7fb1a00901c0, event=RPC_TRANSPORT_MSG_RECEIVED,
>     data=0x7fb1a0843910) at rpc-transport.c:538
>
>     #4  0x00007fb1a7e9934d in socket_event_poll_in
>     (this=0x7fb1a00901c0, notify_handled=_gf_true) at socket.c:2315
>
>     #5  0x00007fb1a7e99992 in socket_event_handler (fd=20, idx=14,
>     gen=103, data=0x7fb1a00901c0, poll_in=1, poll_out=0, poll_err=0)
>     at socket.c:2471
>
>     #6  0x00007fb1ad2005ac in event_dispatch_epoll_handler
>     (event_pool=0x175fb00, event=0x7fb1a7a0de84) at event-epoll.c:583
>
>     #7  0x00007fb1ad200883 in event_dispatch_epoll_worker
>     (data=0x17a73d0) at event-epoll.c:659
>
>     #8  0x00007fb1abf4c5da in start_thread () from /lib64/libpthread.so.0
>
>     #9  0x00007fb1ab822cbf in clone () from /lib64/libc.so.6
>
>     (gdb) info thread
>
>       Id   Target Id         Frame
>
>     * 1    Thread 0x7fb1a7a0e700 (LWP 8151) 0x00007fb1a6fd9d24 in
>     client3_3_lookup_cbk (req=0x7fb188010fb0, iov=0x7fb188010ff0,
>     count=1, myframe=0x7fb188215740) at client-rpc-fops.c:2802
>
>       2    Thread 0x7fb1aa0af700 (LWP 8147) 0x00007fb1ab761cbc in
>     sigtimedwait () from /lib64/libc.so.6
>
>       3    Thread 0x7fb1a98ae700 (LWP 8148) 0x00007fb1ab7f04b0 in
>     nanosleep () from /lib64/libc.so.6
>
>       4    Thread 0x7fb1957fa700 (LWP 8266) 0x00007fb1abf528ca in
>     pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
>
>       5    Thread 0x7fb1a88ac700 (LWP 8150) 0x00007fb1abf528ca in
>     pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
>
>       6    Thread 0x7fb17f7fe700 (LWP 8269) 0x00007fb1abf5250c in
>     pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
>
>       7    Thread 0x7fb1aa8b0700 (LWP 8146) 0x00007fb1abf56300 in
>     nanosleep () from /lib64/libpthread.so.0
>
>       8    Thread 0x7fb1ad685780 (LWP 8145) 0x00007fb1abf4da3d in
>     __pthread_timedjoin_ex () from /lib64/libpthread.so.0
>
>       9    Thread 0x7fb1a542d700 (LWP 8251) 0x00007fb1ab7f04b0 in
>     nanosleep () from /lib64/libc.so.6
>
>       10   Thread 0x7fb1a4c2c700 (LWP 8260) 0x00007fb1abf528ca in
>     pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
>
>       11   Thread 0x7fb196ffd700 (LWP 8263) 0x00007fb1abf528ca in
>     pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
>
>       12   Thread 0x7fb1a60d7700 (LWP 8247) 0x00007fb1ab822fe7 in
>     epoll_wait () from /lib64/libc.so.6
>
>       13   Thread 0x7fb1a90ad700 (LWP 8149) 0x00007fb1abf528ca in
>     pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
>
>     (gdb) print (call_frame_t*)myframe
>
>     $1 = (call_frame_t *) 0x7fb188215740
>
>     (gdb) print *(call_frame_t*)myframe
>
>     $2 = {root = 0x7fb1a0085090, parent = 0xcd4642c4a3efd678, frames =
>     {next = 0x151e2a92a5ae1bb, prev = 0x0}, *local = 0x0, this = 0x0,
>     ret = 0x0*, ref_count = 0, lock = {spinlock = 0, mutex = {__data = {
>
>             __lock = 0, __count = 0, __owner = 0, __nusers = 4, __kind
>     = 0, __spins = 0, __elision = 0, __list = {__prev =
>     0x7fb188215798, __next = 0x7fb188215798}},
>
>           __size = '\000' <repeats 12 times>, "\004", '\000' <repeats
>     11 times>, "\230W!\210\261\177\000\000\230W!\210\261\177\000",
>     __align = 0}}, cookie = 0x7fb1882157a8, complete = (unknown:
>     2283886504),
>
>       op = 32689, begin = {tv_sec = 140400469825464, tv_usec =
>     140400469825464}, end = {tv_sec = 140400878737576, tv_usec =
>     140400132101048}, wind_from = 0x7fb18801cdc0 "", wind_to = 0x0,
>     unwind_from = 0x0,
>
>       unwind_to = 0x0}
>
>     (gdb) thread 6
>
>     [Switching to thread 6 (Thread 0x7fb17f7fe700 (LWP 8269))]
>
>     #0  0x00007fb1abf5250c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
>
>     (gdb) bt
>
>     #0  0x00007fb1abf5250c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
>
>     #1  0x00007fb1ad1dc993 in __syncbarrier_wait
>     (barrier=0x7fb188014790, waitfor=3) at syncop.c:1138
>
>     #2  0x00007fb1ad1dc9e4 in syncbarrier_wait
>     (barrier=0x7fb188014790, waitfor=3) at syncop.c:1155
>
>     #3  0x00007fb1a6d59cde in afr_selfheal_unlocked_discover_on
>     (*frame=0x7fb1882162d0*, inode=0x7fb188215740, gfid=0x7fb17f7fdb00
>     "x\326\357\243\304BFͻ\341Z*\251\342Q\001\060\333\177\177\261\177",
>
>         replies=0x7fb17f7fcf40, discover_on=0x7fb1a0084cb0
>     "\001\001\001", <incomplete sequence \360\255\272>) at
>     afr-self-heal-common.c:1809
>
>     #4  0x00007fb1a6d59d80 in afr_selfheal_unlocked_discover
>     (*frame=0x7fb1882162d0*, inode=0x7fb188215740, gfid=0x7fb17f7fdb00
>     "x\326\357\243\304BFͻ\341Z*\251\342Q\001\060\333\177\177\261\177",
>
>         replies=0x7fb17f7fcf40) at afr-self-heal-common.c:1828
>
>     #5  0x00007fb1a6d5e51f in afr_selfheal_unlocked_inspect
>     (frame=0x7fb1882162d0, this=0x7fb1a001db40, gfid=0x7fb17f7fdb00
>     "x\326\357\243\304BFͻ\341Z*\251\342Q\001\060\333\177\177\261\177",
>
>         link_inode=0x7fb17f7fd9c8, data_selfheal=0x7fb17f7fd9c4,
>     metadata_selfheal=0x7fb17f7fd9c0, entry_selfheal=0x7fb17f7fd9bc)
>     at afr-self-heal-common.c:2241
>
>     #6  0x00007fb1a6d5f19b in afr_selfheal_do (frame=0x7fb1882162d0,
>     this=0x7fb1a001db40, gfid=0x7fb17f7fdb00
>     "x\326\357\243\304BFͻ\341Z*\251\342Q\001\060\333\177\177\261\177")
>     at afr-self-heal-common.c:2483
>
>     #7  0x00007fb1a6d5f346 in afr_selfheal (this=0x7fb1a001db40,
>     gfid=0x7fb17f7fdb00
>     "x\326\357\243\304BFͻ\341Z*\251\342Q\001\060\333\177\177\261\177")
>     at afr-self-heal-common.c:2543
>
>     #8  0x00007fb1a6d6ac5c in afr_shd_selfheal (healer=0x7fb1a0085640,
>     child=0, gfid=0x7fb17f7fdb00
>     "x\326\357\243\304BFͻ\341Z*\251\342Q\001\060\333\177\177\261\177")
>     at afr-self-heald.c:343
>
>     #9  0x00007fb1a6d6b00b in afr_shd_index_heal
>     (subvol=0x7fb1a00171e0, entry=0x7fb1a0714180,
>     parent=0x7fb17f7fddc0, data=0x7fb1a0085640) at afr-self-heald.c:440
>
>     #10 0x00007fb1ad201ed3 in syncop_mt_dir_scan
>     (frame=0x7fb1a07a0e90, subvol=0x7fb1a00171e0, loc=0x7fb17f7fddc0,
>     pid=-6, data=0x7fb1a0085640, fn=0x7fb1a6d6aebc
>     <afr_shd_index_heal>, xdata=0x7fb1a07b4ed0,
>
>         max_jobs=1, max_qlen=1024) at syncop-utils.c:407
>
>     #11 0x00007fb1a6d6b2b5 in afr_shd_index_sweep
>     (healer=0x7fb1a0085640, vgfid=0x7fb1a6d93610
>     "glusterfs.xattrop_index_gfid") at afr-self-heald.c:494
>
>     #12 0x00007fb1a6d6b394 in afr_shd_index_sweep_all
>     (healer=0x7fb1a0085640) at afr-self-heald.c:517
>
>     #13 0x00007fb1a6d6b697 in afr_shd_index_healer
>     (data=0x7fb1a0085640) at afr-self-heald.c:597
>
>     #14 0x00007fb1abf4c5da in start_thread () from /lib64/libpthread.so.0
>
>     #15 0x00007fb1ab822cbf in clone () from /lib64/libc.so.6
>
>     *From:*Zhou, Cynthia (NSB - CN/Hangzhou)
>     *Sent:* Thursday, October 11, 2018 3:36 PM
>     *To:* Ravishankar N <ravishankar at redhat.com>
>     <mailto:ravishankar at redhat.com>
>     *Cc:* gluster-users <gluster-users at gluster.org>
>     <mailto:gluster-users at gluster.org>
>     *Subject:* glustershd coredump generated while reboot all 3 sn nodes
>
>     Hi,
>
>     I find that sometimes, when the SN nodes are restarted, glustershd
>     exits and generates a coredump. It has happened twice in my
>     environment; I would like to know your opinion on this issue, thanks!
>
>     The glusterfs version I use is glusterfs 3.12.3.
>
>     [root at sn-1:/root]
>
>     # gluster v info log
>
>     Volume Name: log
>
>     Type: Replicate
>
>     Volume ID: 87bcbaf8-5fa4-4060-9149-23f832befe92
>
>     Status: Started
>
>     Snapshot Count: 0
>
>     Number of Bricks: 1 x 3 = 3
>
>     Transport-type: tcp
>
>     Bricks:
>
>     Brick1: sn-0.local:/mnt/bricks/log/brick
>
>     Brick2: sn-1.local:/mnt/bricks/log/brick
>
>     Brick3: sn-2.local:/mnt/bricks/log/brick
>
>     Options Reconfigured:
>
>     server.allow-insecure: on
>
>     cluster.quorum-type: auto
>
>     network.ping-timeout: 42
>
>     cluster.consistent-metadata: on
>
>     cluster.favorite-child-policy: mtime
>
>     cluster.quorum-reads: no
>
>     cluster.server-quorum-type: none
>
>     transport.address-family: inet
>
>     nfs.disable: on
>
>     performance.client-io-threads: off
>
>     cluster.server-quorum-ratio: 51%
>
>     [root at sn-1:/root]
>
>     ///////////////////////////////////////////////glustershd
>     coredump////////////////////////////////////////////////////////////////
>
>     # lz4 -d
>     core.glusterfs.0.c5f0c5547fbd4e5aa8f350b748e5675e.1812.1537967075000000.lz4
>
>     Decoding file
>     core.glusterfs.0.c5f0c5547fbd4e5aa8f350b748e5675e.1812.1537967075000000
>
>
>     core.glusterfs.0.c5f : decoded 263188480 bytes
>
>     [root at sn-0:/mnt/export]
>
>     # gdb /usr/sbin/glusterfs
>     core.glusterfs.0.c5f0c5547fbd4e5aa8f350b748e5675e.1812.1537967075000000
>
>     GNU gdb (GDB) Fedora 8.1-14.wf29
>
>     Copyright (C) 2018 Free Software Foundation, Inc.
>
>     License GPLv3+: GNU GPL version 3 or later
>     <http://gnu.org/licenses/gpl.html>
>
>     This is free software: you are free to change and redistribute it.
>
>     There is NO WARRANTY, to the extent permitted by law.  Type "show
>     copying"
>
>     and "show warranty" for details.
>
>     This GDB was configured as "x86_64-redhat-linux-gnu".
>
>     Type "show configuration" for configuration details.
>
>     For bug reporting instructions, please see:
>
>     <http://www.gnu.org/software/gdb/bugs/>.
>
>     Find the GDB manual and other documentation resources online at:
>
>     <http://www.gnu.org/software/gdb/documentation/>.
>
>     For help, type "help".
>
>     Type "apropos word" to search for commands related to "word"...
>
>     Reading symbols from /usr/sbin/glusterfs...(no debugging symbols
>     found)...done.
>
>     warning: core file may not match specified executable file.
>
>     [New LWP 1818]
>
>     [New LWP 1812]
>
>     [New LWP 1813]
>
>     [New LWP 1817]
>
>     [New LWP 1966]
>
>     [New LWP 1968]
>
>     [New LWP 1970]
>
>     [New LWP 1974]
>
>     [New LWP 1976]
>
>     [New LWP 1814]
>
>     [New LWP 1815]
>
>     [New LWP 1816]
>
>     [New LWP 1828]
>
>     [Thread debugging using libthread_db enabled]
>
>     Using host libthread_db library "/lib64/libthread_db.so.1".
>
>     Core was generated by `/usr/sbin/glusterfs -s sn-0.local
>     --volfile-id gluster/glustershd -p /var/run/g'.
>
>     Program terminated with signal SIGSEGV, Segmentation fault.
>
>     #0  0x00007f1b5e5d7d24 in client3_3_lookup_cbk
>     (req=0x7f1b44002300, iov=0x7f1b44002340, count=1,
>     myframe=0x7f1b4401c850) at client-rpc-fops.c:2802
>
>     2802 client-rpc-fops.c: No such file or directory.
>
>     [Current thread is 1 (Thread 0x7f1b5f00c700 (LWP 1818))]
>
>     Missing separate debuginfos, use: dnf debuginfo-install
>     rcp-pack-glusterfs-1.2.0_1_g54e6196-RCP2.wf29.x86_64
>
>     (gdb) bt
>
>     #0  0x00007f1b5e5d7d24 in client3_3_lookup_cbk
>     (req=0x7f1b44002300, iov=0x7f1b44002340, count=1,
>     myframe=0x7f1b4401c850) at client-rpc-fops.c:2802
>
>     #1  0x00007f1b64553d47 in rpc_clnt_handle_reply
>     (clnt=0x7f1b5808bbb0, pollin=0x7f1b580c6620) at rpc-clnt.c:778
>
>     #2  0x00007f1b645542e5 in rpc_clnt_notify (trans=0x7f1b5808bde0,
>     mydata=0x7f1b5808bbe0, event=RPC_TRANSPORT_MSG_RECEIVED,
>     data=0x7f1b580c6620) at rpc-clnt.c:971
>
>     #3  0x00007f1b64550319 in rpc_transport_notify
>     (this=0x7f1b5808bde0, event=RPC_TRANSPORT_MSG_RECEIVED,
>     data=0x7f1b580c6620) at rpc-transport.c:538
>
>     #4  0x00007f1b5f49734d in socket_event_poll_in
>     (this=0x7f1b5808bde0, notify_handled=_gf_true) at socket.c:2315
>
>     #5  0x00007f1b5f497992 in socket_event_handler (fd=25, idx=15,
>     gen=7, data=0x7f1b5808bde0, poll_in=1, poll_out=0, poll_err=0) at
>     socket.c:2471
>
>     #6  0x00007f1b647fe5ac in event_dispatch_epoll_handler
>     (event_pool=0x230cb00, event=0x7f1b5f00be84) at event-epoll.c:583
>
>     #7  0x00007f1b647fe883 in event_dispatch_epoll_worker
>     (data=0x23543d0) at event-epoll.c:659
>
>     #8  0x00007f1b6354a5da in start_thread () from /lib64/libpthread.so.0
>
>     #9  0x00007f1b62e20cbf in clone () from /lib64/libc.so.6
>
>     *(gdb) print *(call_frame_t*)myframe*
>
>     *$1 = {root = 0x100000000, parent = 0x100000005, frames = {next =
>     0x7f1b4401c8a8, prev = 0x7f1b44010190}, **local = 0x0**, this =
>     0x0, ret = 0x0, ref_count = 0, lock = {spinlock = 0, mutex =
>     {__data = {*
>
>     *__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0,
>     __spins = 0, __elision = 0, __list = {__prev = 0x7f1b44010190,
>     __next = 0x0}}, *
>
>     *      __size = '\000' <repeats 24 times>,
>     "\220\001\001D\033\177\000\000\000\000\000\000\000\000\000",
>     __align = 0}}, cookie = 0x7f1b4401ccf0, complete = _gf_false, op =
>     GF_FOP_NULL, begin = {*
>
>     *tv_sec = 139755081730912, tv_usec = 139755081785872}, end =
>     {tv_sec = 448811404, tv_usec = 21474836481}, wind_from = 0x0,
>     wind_to = 0x0, unwind_from = 0x0, unwind_to = 0x0}*
>
>     (gdb) info thread
>
>       Id   Target Id         Frame
>
>     * 1    Thread 0x7f1b5f00c700 (LWP 1818) 0x00007f1b5e5d7d24 in
>     client3_3_lookup_cbk (req=0x7f1b44002300, iov=0x7f1b44002340,
>     count=1, myframe=0x7f1b4401c850) at client-rpc-fops.c:2802
>
>       2    Thread 0x7f1b64c83780 (LWP 1812) 0x00007f1b6354ba3d in
>     __pthread_timedjoin_ex () from /lib64/libpthread.so.0
>
>       3    Thread 0x7f1b61eae700 (LWP 1813) 0x00007f1b63554300 in
>     nanosleep () from /lib64/libpthread.so.0
>
>       4    Thread 0x7f1b5feaa700 (LWP 1817) 0x00007f1b635508ca in
>     pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
>
>       5    Thread 0x7f1b5ca2b700 (LWP 1966) 0x00007f1b62dee4b0 in
>     nanosleep () from /lib64/libc.so.6
>
>       6    Thread 0x7f1b4f7fe700 (LWP 1968) 0x00007f1b6355050c in
>     pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
>
>       7    Thread 0x7f1b4e7fc700 (LWP 1970) 0x00007f1b6355050c in
>     pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
>
>       8    Thread 0x7f1b4d7fa700 (LWP 1974) 0x00007f1b62dee4b0 in
>     nanosleep () from /lib64/libc.so.6
>
>       9    Thread 0x7f1b33fff700 (LWP 1976) 0x00007f1b62dee4b0 in
>     nanosleep () from /lib64/libc.so.6
>
>       10   Thread 0x7f1b616ad700 (LWP 1814) 0x00007f1b62d5fcbc in
>     sigtimedwait () from /lib64/libc.so.6
>
>       11   Thread 0x7f1b60eac700 (LWP 1815) 0x00007f1b62dee4b0 in
>     nanosleep () from /lib64/libc.so.6
>
>       12   Thread 0x7f1b606ab700 (LWP 1816) 0x00007f1b635508ca in
>     pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
>
>       13   Thread 0x7f1b5d6d5700 (LWP 1828) 0x00007f1b62e20fe7 in
>     epoll_wait () from /lib64/libc.so.6
>
>     (gdb) quit
>
>     The source code is roughly like the excerpt below, so from gdb the
>     coredump happens because frame->local is *NULL*!!
>
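>     For reference, the start of client3_3_lookup_cbk looks roughly like
>     the following (paraphrased from memory, so the exact lines in 3.12.3
>     may differ slightly; the signature matches the backtrace above):
>
>         int
>         client3_3_lookup_cbk (struct rpc_req *req, struct iovec *iov,
>                               int count, void *myframe)
>         {
>                 clnt_local_t *local = NULL;
>                 call_frame_t *frame = NULL;
>                 ...
>                 frame = myframe;
>                 local = frame->local;
>                 /* around client-rpc-fops.c:2802: the first dereference of
>                  * local crashes when frame->local is NULL */
>                 inode = local->loc.inode;
>                 ...
>
>     So if the reply path hands this callback a frame whose local is NULL
>     (for example a recycled frame), it segfaults exactly where the
>     backtraces show.
>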
>     From sn-0 journal log,
>
>     Sep 26 16:04:40.034577 sn-0 systemd-coredump[2612]: Process 1812
>     (glusterfs) of user 0 dumped core.
>
>     Stack trace of thread 1818:
>
>     #0  0x00007f1b5e5d7d24 client3_3_lookup_cbk (client.so)
>
>     #1  0x00007f1b64553d47 rpc_clnt_handle_reply (libgfrpc.so.0)
>
>     #2  0x00007f1b645542e5 rpc_clnt_notify (libgfrpc.so.0)
>
>     #3  0x00007f1b64550319 rpc_transport_notify (libgfrpc.so.0)
>
>     #4  0x00007f1b5f49734d socket_event_poll_in (socket.so)
>
>     #5  0x00007f1b5f497992 socket_event_handler (socket.so)
>
>     #6  0x00007f1b647fe5ac event_dispatch_epoll_handler
>     (libglusterfs.so.0)
>
>     #7  0x00007f1b647fe883 event_dispatch_epoll_worker (libglusterfs.so.0)
>
>     #8  0x00007f1b6354a5da start_thread (libpthread.so.0)
>
>     #9  0x00007f1b62e20cbf __clone (libc.so.6)
>
>     Stack trace of thread 1812:
>
>     #0  0x00007f1b6354ba3d __GI___pthread_timedjoin_ex (libpthread.so.0)
>
>     #1  0x00007f1b647feae1 event_dispatch_epoll (libglusterfs.so.0)
>
>     #2  0x00007f1b647c2703 event_dispatch (libglusterfs.so.0)
>
>     #3  0x000000000040ab95 main (glusterfsd)
>
>     #4  0x00007f1b62d4baf7 __libc_start_main (libc.so.6)
>
>     #5  0x000000000040543a _start (glusterfsd)
>
>     Stack trace of thread 1813:
>
>     #0  0x00007f1b63554300 __nanosleep (libpthread.so.0)
>
>     #1  0x00007f1b647a04e5 gf_timer_proc (libglusterfs.so.0)
>
>     #2  0x00007f1b6354a5da start_thread (libpthread.so.0)
>
>     #3  0x00007f1b62e20cbf __clone (libc.so.6)
>
>     Stack trace of thread 1817:
>
>     #0  0x00007f1b635508ca pthread_cond_timedwait@@GLIBC_2.3.2 (libpthread.so.0)
>
>     #1  0x00007f1b647d98e3 syncenv_task (libglusterfs.so.0)
>
>     #2  0x00007f1b647d9b7e syncenv_processor (libglusterfs.so.0)
>
>     #3  0x00007f1b6354a5da start_thread (libpthread.so.0)
>
>     #4  0x00007f1b62e20cbf __clone (libc.so.6)
>
>     Stack trace of thread 1966:
>
>     #0  0x00007f1b62dee4b0 __nanosleep (libc.so.6)
>
>     #1  0x00007f1b62dee38a sleep (libc.so.6)
>
>     #2  0x00007f1b5e36970c afr_shd_index_healer (replicate.so)
>
>     #3  0x00007f1b6354a5da start_thread (libpthread.so.0)
>
>     #4  0x00007f1b62e20cbf __clone (libc.so.6)
>
>     Stack trace of thread 1968:
>
>     #0  0x00007f1b6355050c pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
>
>     #1  0x00007f1b647da993 __syncbarrier_wait (libglusterfs.so.0)
>
>     #2  0x00007f1b647da9e4 syncbarrier_wait (libglusterfs.so.0)
>
>     #3  0x00007f1b5e357cde afr_selfheal_unlocked_discover_on (replicate.so)
>
>     #4  0x00007f1b5e357d80 afr_selfheal_unlocked_discover (replicate.so)
>
>     #5  0x00007f1b5e363bf8 __afr_selfheal_entry_prepare (replicate.so)
>
>     #6  0x00007f1b5e3641c0 afr_selfheal_entry_dirent (replicate.so)
>
>     #7  0x00007f1b5e36488a afr_selfheal_entry_do_subvol (replicate.so)
>
>     #8  0x00007f1b5e365077 afr_selfheal_entry_do (replicate.so)
>
>     #9  0x00007f1b5e3656b6 __afr_selfheal_entry (replicate.so)
>
>     #10 0x00007f1b5e365bba afr_selfheal_entry (replicate.so)
>
>     #11 0x00007f1b5e35d250 afr_selfheal_do (replicate.so)
>
>     #12 0x00007f1b5e35d346 afr_selfheal (replicate.so)
>
>     #13 0x00007f1b5e368c5c afr_shd_selfheal (replicate.so)
>
>     #14 0x00007f1b5e36900b afr_shd_index_heal (replicate.so)
>
>     #15 0x00007f1b647ffed3 syncop_mt_dir_scan (libglusterfs.so.0)
>
>     #16 0x00007f1b5e3692b5 afr_shd_index_sweep (replicate.so)
>
>     #17 0x00007f1b5e369394 afr_shd_index_sweep_all (replicate.so)
>
>     #18 0x00007f1b5e369697 afr_shd_index_healer (replicate.so)
>
>     #19 0x00007f1b6354a5da start_thread (libpthread.so.0)
>
>     #20 0x00007f1b62e20cbf __clone (libc.so.6)
>
>     Stack trace of thread 1970:
>
>     #0  0x00007f1b6355050c pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
>
>     #1  0x00007f1b647da993 __syncbarrier_wait (libglusterfs.so.0)
>
>     #2  0x00007f1b647da9e4 syncbarrier_wait (libglusterfs.so.0)
>
>     #3  0x00007f1b5e357742 afr_selfheal_unlocked_lookup_on (replicate.so)
>
>     #4  0x00007f1b5e364204 afr_selfheal_entry_dirent (replicate.so)
>
>     #5  0x00007f1b5e36488a afr_selfheal_entry_do_subvol (replicate.so)
>
>     #6  0x00007f1b5e365077 afr_selfheal_entry_do (replicate.so)
>
>     #7  0x00007f1b5e3656b6 __afr_selfheal_entry (replicate.so)
>
>     #8  0x00007f1b5e365bba afr_selfheal_entry (replicate.so)
>
>     #9  0x00007f1b5e35d250 afr_selfheal_do (replicate.so)
>
>     #10 0x00007f1b5e35d346 afr_selfheal (replicate.so)
>
>     #11 0x00007f1b5e368c5c afr_shd_selfheal (replicate.so)
>
>     #12 0x00007f1b5e36900b afr_shd_index_heal (replicate.so)
>
>     #13 0x00007f1b647ffed3 syncop_mt_dir_scan (libglusterfs.so.0)
>
>     #14 0x00007f1b5e3692b5 afr_shd_index_sweep (replicate.so)
>
>     #15 0x00007f1b5e369394 afr_shd_index_sweep_all (replicate.so)
>
>     #16 0x00007f1b5e369697 afr_shd_index_healer (replicate.so)
>
>     #17 0x00007f1b6354a5da start_thread (libpthread.so.0)
>
>     #18 0x00007f1b62e20cbf __clone (libc.so.6)
>
>     Stack trace of thread 1974:
>
>     #0  0x00007f1b62dee4b0 __nanosleep (libc.so.6)
>
>     #1  0x00007f1b62dee38a sleep (libc.so.6)
>
>     #2  0x00007f1b5e36970c afr_shd_index_healer (replicate.so)
>
>     #3  0x00007f1b6354a5da start_thread (libpthread.so.0)
>
>     #4  0x00007f1b62e20cbf __clone (libc.so.6)
>
>     Stack trace of thread 1976:
>
>     #0  0x00007f1b62dee4b0 __nanosleep (libc.so.6)
>
>     #1  0x00007f1b62dee38a sleep (libc.so.6)
>
>     #2  0x00007f1b5e36970c afr_shd_index_healer (replicate.so)
>
>     #3  0x00007f1b6354a5da start_thread (libpthread.so.0)
>
>     #4  0x00007f1b62e20cbf __clone (libc.so.6)
>
>     Stack trace of thread 1814:
>
>     #0  0x00007f1b62d5fcbc __sigtimedwait (libc.so.6)
>
>     #1  0x00007f1b63554afc sigwait (libpthread.so.0)
>
>     #2  0x0000000000409ed7 glusterfs_sigwaiter (glusterfsd)
>
>     #3  0x00007f1b6354a5da start_thread (libpthread.so.0)
>
>     #4  0x00007f1b62e20cbf __clone (libc.so.6)
>
>     Stack trace of thread 1815:
>
>     #0  0x00007f1b62dee4b0 __nanosleep (libc.so.6)
>
>     #1  0x00007f1b62dee38a sleep (libc.so.6)
>
>     #2  0x00007f1b647c3f5c pool_sweeper (libglusterfs.so.0)
>
>     #3  0x00007f1b6354a5da start_thread (libpthread.so.0)
>
>     #4  0x00007f1b62e20cbf __clone (libc.so.6)
>
>     Stack trace of thread 1816:
>
>     #0  0x00007f1b635508ca pthread_cond_timedwait@@GLIBC_2.3.2 (libpthread.so.0)
>
>     #1  0x00007f1b647d98e3 syncenv_task (libglusterfs.so.0)
>
>     #2  0x00007f1b647d9b7e syncenv_processor (libglusterfs.so.0)
>
>     #3  0x00007f1b6354a5da start_thread (libpthread.so.0)
>
>     #4  0x00007f1b62e20cbf __clone (libc.so.6)
>
>     Stack trace of thread 1828:
>
>     #0  0x00007f1b62e20fe7 epoll_wait (libc.so.6)
>
>     #1  0x00007f1b647fe855 event_dispatch_epoll_worker (libglusterfs.so.0)
>
>     #2  0x00007f1b6354a5da start_thread (libpthread.so.0)
>
>     #3  0x00007f1b62e20cbf __clone (libc.so.6)
>
