[Gluster-users] glustershd coredump generated while reboot all 3 sn nodes
Ravishankar N
ravishankar at redhat.com
Tue Oct 23 12:32:37 UTC 2018
Hi,
Sorry, I haven't gotten a chance to look at the bug or your observations
yet, as I am held up with other things. I will get to this soon. Thanks
for your patience.
-Ravi
On 10/23/2018 01:35 PM, Zhou, Cynthia (NSB - CN/Hangzhou) wrote:
>
> I did some further study of this issue. I think this can happen if
> frame *0x7f84740116e0* gets freed but is still kept in the rpc_clnt
> saved_frames list: when a frame is destroyed it is put back on the
> mem-pool hot list and is very likely to be reused for the next request,
> but by the time it is reused its ret address has been changed. When the
> previous request's response finally returns, rpc_clnt can still
> retrieve this now-repurposed frame, which is wrong!
>
> I find that FRAME_DESTROY does not touch the rpc_clnt saved_frames
> list at all (and in fact, a frame that is being freed should never
> still be in that list). Could we add a check, for example walking every
> element of the saved_frames list, to make sure the frame about to be
> destroyed is not present in it?
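>
> Something like the following is what I have in mind (just a rough
> sketch: the field names saved_frames->sf.list and saved_frame->frame
> come from my reading of rpc-clnt.h in 3.12.x and may not be exact, so
> please treat it as pseudocode rather than a tested patch):
>
>     /* Walk the connection's saved_frames and report whether the frame
>      * that is about to be destroyed is still waiting for a reply.
>      * The caller would need to hold conn->lock while calling this. */
>     static gf_boolean_t
>     frame_still_saved (struct saved_frames *frames, call_frame_t *frame)
>     {
>             struct saved_frame *trav  = NULL;
>             gf_boolean_t        found = _gf_false;
>
>             list_for_each_entry (trav, &frames->sf.list, list) {
>                     if (trav->frame == frame) {
>                             /* premature destroy: reply still pending */
>                             found = _gf_true;
>                             break;
>                     }
>             }
>
>             return found;
>     }
>
> FRAME_DESTROY (or whoever is about to free the frame) could log or
> assert when this returns true, which would at least tell us who is
> freeing a frame that rpc_clnt still references.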
>
> Looking forward to your reply!
>
> From: Zhou, Cynthia (NSB - CN/Hangzhou)
> Sent: Friday, October 19, 2018 9:59 AM
> To: 'Ravishankar N' <ravishankar at redhat.com>
> Cc: 'gluster-users' <gluster-users at gluster.org>
> Subject: RE: glustershd coredump generated while reboot all 3 sn nodes
>
> Hi,
>
> From a recent coredump I got two interesting thread backtraces, which
> suggest that glustershd has two threads polling in messages from the
> same client simultaneously:
>
> Thread 17 (Thread 0x7f8485247700 (LWP 6063)):
>
> #0 0x00007f8489787c80 in pthread_mutex_lock () from /lib64/libpthread.so.0
>
> #1 0x00007f848a9c177e in dict_ref (this=0x18004f0f0) at dict.c:660
>
> #2 0x00007f84845920e4 in afr_selfheal_discover_cbk
> (frame=*0x7f847400bf00*, *cookie=0x2, this=0x7f84800390b0, op_ret=0,
> op_errno=0, inode=0x0, *
>
> * buf=0x7f84740116e0, xdata=0x18004f0f0, parbuf=0x7f8474019e80*) at
> afr-self-heal-common.c:1723
>
> #3 0x00007f848480b96d in client3_3_entrylk_cbk (req=0x7f8474019e40,
> iov=0x7f8474019e80, count=1, myframe=*0x7f84740116e0*) at
> client-rpc-fops.c:1611
>
> #4 0x00007f848a78ed47 in rpc_clnt_handle_reply (clnt=*0x7f848004f0c0*,
> pollin=0x7f84800bd6d0) at rpc-clnt.c:778
>
> #5 0x00007f848a78f2e5 in rpc_clnt_notify (trans=0x7f848004f2f0,
> mydata=0x7f848004f0f0, event=RPC_TRANSPORT_MSG_RECEIVED,
> data=0x7f84800bd6d0)
>
> at rpc-clnt.c:971
>
> #6 0x00007f848a78b319 in rpc_transport_notify (this=0x7f848004f2f0,
> event=RPC_TRANSPORT_MSG_RECEIVED, data=0x7f84800bd6d0) at
> rpc-transport.c:538
>
> #7 0x00007f84856d234d in socket_event_poll_in (this=0x7f848004f2f0,
> notify_handled=_gf_true) at socket.c:2315
>
> #8 0x00007f84856d2992 in socket_event_handler (fd=15, idx=8, gen=1,
> data=0x7f848004f2f0, poll_in=1, poll_out=0, poll_err=0) at socket.c:2471
>
> #9 0x00007f848aa395ac in event_dispatch_epoll_handler
> (event_pool=0x1d40b00, event=0x7f8485246e84) at event-epoll.c:583
>
> #10 0x00007f848aa39883 in event_dispatch_epoll_worker (data=0x1d883d0)
> at event-epoll.c:659
>
> #11 0x00007f84897855da in start_thread () from /lib64/libpthread.so.0
>
> #12 0x00007f848905bcbf in clone () from /lib64/libc.so.6
>
> And
>
> Thread 1 (Thread 0x7f847f6ff700 (LWP 6083)):
>
> #0 0x00007f8484812d24 in client3_3_lookup_cbk (req=0x7f8474002300,
> iov=0x7f8474002340, count=1, myframe=0x7f84740116e0) at
> client-rpc-fops.c:2802
>
> #1 0x00007f848a78ed47 in rpc_clnt_handle_reply (clnt=*0x7f848004f0c0*,
> pollin=0x7f847800d9f0) at rpc-clnt.c:778
>
> #2 0x00007f848a78f2e5 in rpc_clnt_notify (trans=0x7f848004f2f0,
> mydata=0x7f848004f0f0, event=RPC_TRANSPORT_MSG_RECEIVED,
> data=0x7f847800d9f0)
>
> at rpc-clnt.c:971
>
> #3 0x00007f848a78b319 in rpc_transport_notify (this=0x7f848004f2f0,
> event=RPC_TRANSPORT_MSG_RECEIVED, data=0x7f847800d9f0) at
> rpc-transport.c:538
>
> #4 0x00007f84856d234d in socket_event_poll_in (this=0x7f848004f2f0,
> notify_handled=_gf_true) at socket.c:2315
>
> #5 0x00007f84856d2992 in socket_event_handler (fd=15, idx=8, gen=1,
> data=0x7f848004f2f0, poll_in=1, poll_out=0, poll_err=0) at socket.c:2471
>
> #6 0x00007f848aa395ac in event_dispatch_epoll_handler
> (event_pool=0x1d40b00, event=0x7f847f6fee84) at event-epoll.c:583
>
> #7 0x00007f848aa39883 in event_dispatch_epoll_worker
> (data=0x7f848004ef30) at event-epoll.c:659
>
> #8 0x00007f84897855da in start_thread () from /lib64/libpthread.so.0
>
> #9 0x00007f848905bcbf in clone () from /lib64/libc.so.6
>
> The coredump is generated because, in thread 1, myframe->local is 0:
>
> (gdb) print *(struct _call_frame*)myframe
>
> $12 = {root = 0x7f8474008180, parent = 0x7f847400bf00, frames = {next
> = 0x7f8474009230, prev = 0x7f8474008878}, *local = 0x0*,
>
> this = 0x7f8480036e20, ret = 0x7f8484591e93
> <afr_selfheal_discover_cbk>, ref_count = 0, lock = {spinlock = 0,
> mutex = {__data = {
>
> __lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0,
> __spins = 0, __elision = 0, __list = {__prev = 0x0,
>
> __next = 0x0}}, __size = '\000' <repeats 39 times>, __align
> = 0}}, cookie = 0x2, complete = _gf_true, op = GF_FOP_NULL,
>
> begin = {tv_sec = 0, tv_usec = 0}, end = {tv_sec = 0, tv_usec = 0},
>
> wind_from = 0x7f84845cba60 <__FUNCTION__.18726>
> "afr_selfheal_unlocked_discover_on",
>
> wind_to = 0x7f84845cb090 "__priv->children[__i]->fops->lookup",
>
> unwind_from = 0x7f848483a350 <__FUNCTION__.18496>
> "client3_3_entrylk_cbk", unwind_to = 0x7f84845cb0b4
> "afr_selfheal_discover_cbk"}
>
> [Analysis]
>
> It seems thread 17 is receiving a message reply: it gets call frame
> *0x7f84740116e0* and calls client3_3_entrylk_cbk. But judging from the
> source code, when client3_3_entrylk_cbk does the unwind, even if it
> could find the correct ret address, the parameters passed to it should
> not be the ones marked in the thread 17 backtrace above!!
>
> Another weird thing: once rpc_clnt_handle_reply finds frame
> *0x7f84740116e0*, it should be removed from the saved-frames list, so
> why can thread 1 retrieve this frame again??
>
> I checked that when client3_3_lookup_cbk does the unwind, the
> parameters it passes to the parent frame are exactly those marked
> parameters.
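>
> For reference, this is roughly what the reply path does (condensed and
> paraphrased from rpc-clnt.c in 3.12.x; the exact names and locking are
> from my reading of the code, not a verbatim quote):
>
>     /* rpc_clnt_handle_reply(): find the pending request by the XID in
>      * the reply header. lookup_frame()/__saved_frame_get() unlink the
>      * matching saved_frame from conn->saved_frames under conn->lock,
>      * so the same entry should never be handed out twice. */
>     xid = ntoh32 (*((uint32_t *) pollin->vector[0].iov_base));
>     saved_frame = lookup_frame (conn, xid);   /* removed from the list here */
>     if (saved_frame == NULL)
>             goto out;                         /* no pending request for xid */
>     req = saved_frame->rpcreq;
>     req->cbkfn (req, req->rsp, req->rspcnt, saved_frame->frame);
>
> So if two threads really ended up unwinding the same frame pointer, it
> suggests the frame was destroyed and recycled for a new request while
> its original reply was still in flight.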
>
> From: Zhou, Cynthia (NSB - CN/Hangzhou)
> Sent: Tuesday, October 16, 2018 5:24 PM
> To: Ravishankar N <ravishankar at redhat.com>
> Cc: gluster-users <gluster-users at gluster.org>
> Subject: RE: glustershd coredump generated while reboot all 3 sn nodes
>
> By the way, no private patches are applied.
> https://bugzilla.redhat.com/show_bug.cgi?id=1639632 has been created to
> track this issue, and I have attached one coredump to the bug.
>
> There is not much useful info in the glustershd log, because the
> process dumps core suddenly and the log only shows prints from several
> seconds before.
>
> cynthia
>
> From: Zhou, Cynthia (NSB - CN/Hangzhou)
> Sent: Tuesday, October 16, 2018 2:15 PM
> To: Ravishankar N <ravishankar at redhat.com>
> Cc: gluster-users <gluster-users at gluster.org>
> Subject: RE: glustershd coredump generated while reboot all 3 sn nodes
>
> Hi,
>
> Yes, it is glusterfs 3.12.3.
>
> I will create a BZ and attach the related coredump and glusterfs logs.
>
> cynthia
>
> From: Ravishankar N <ravishankar at redhat.com>
> Sent: Tuesday, October 16, 2018 12:23 PM
> To: Zhou, Cynthia (NSB - CN/Hangzhou) <cynthia.zhou at nokia-sbell.com>
> Cc: gluster-users <gluster-users at gluster.org>
> Subject: Re: glustershd coredump generated while reboot all 3 sn nodes
>
> Hi,
>
> - Is this stock glusterfs-3.12.3? Or do you have any patches applied
> on top of it?
>
> - If it is stock, could you create a BZ and attach the core file and
> the /var/log/glusterfs/ logs from the 3 nodes at the time of the crash?
>
> Thanks,
> Ravi
>
> On 10/16/2018 08:45 AM, Zhou, Cynthia (NSB - CN/Hangzhou) wrote:
>
> Hi,
>
> This issue happened twice recently: when glustershd does a heal, it
> occasionally generates a coredump.
>
> I did some debugging and found that sometimes
> afr_selfheal_unlocked_discover_on does a lookup and the frame is saved
> in rpc_clnt_submit; when the reply comes back, rpc_clnt finds the saved
> frame, but the address is different from the frame address that was
> saved. I think this is wrong, but I cannot find a clue as to how it
> happened.
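>
> For context, the submit side works roughly like this (paraphrased from
> rpc-clnt.c in 3.12.x; the names __save_frame and rpcreq->xid are from
> my reading of the code and may not be exact):
>
>     /* rpc_clnt_submit(): once the request has been handed to the
>      * transport, remember the raw call frame pointer against the
>      * request's XID so the reply can later be unwound onto it. */
>     pthread_mutex_lock (&conn->lock);
>     {
>             ret = rpc_transport_submit_request (conn->trans, &req);
>             if ((ret == 0) && frame)
>                     __save_frame (rpc, frame, rpcreq);  /* stored in
>                                                            conn->saved_frames,
>                                                            keyed by rpcreq->xid */
>     }
>     pthread_mutex_unlock (&conn->lock);
>
> Since only the raw frame address is remembered here, any mix-up between
> the frame that was saved and the frame that is eventually unwound has
> to come from the frame being freed or reused in between.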
>
> [root at mn-0:/home/robot]
>
> [Thread debugging using libthread_db enabled]
>
> Using host libthread_db library "/lib64/libthread_db.so.1".
>
> Core was generated by `/usr/sbin/glusterfs -s sn-0.local
> --volfile-id gluster/glustershd -p /var/run/g'.
>
> Program terminated with signal SIGSEGV, Segmentation fault.
>
> #0 0x00007fb1a6fd9d24 in client3_3_lookup_cbk
> (req=0x7fb188010fb0, iov=0x7fb188010ff0, count=1,
> myframe=*0x7fb188215740*) at client-rpc-fops.c:2802
>
> 2802 client-rpc-fops.c: No such file or directory.
>
> [Current thread is 1 (Thread 0x7fb1a7a0e700 (LWP 8151))]
>
> Missing separate debuginfos, use: dnf debuginfo-install
> rcp-pack-glusterfs-1.2.0-RCP2.wf29.x86_64
>
> (gdb) bt
>
> #0 0x00007fb1a6fd9d24 in client3_3_lookup_cbk
> (req=0x7fb188010fb0, iov=0x7fb188010ff0, count=1,
> myframe=0x7fb188215740) at client-rpc-fops.c:2802
>
> #1 0x00007fb1acf55d47 in rpc_clnt_handle_reply
> (clnt=0x7fb1a008fff0, pollin=0x7fb1a0843910) at rpc-clnt.c:778
>
> #2 0x00007fb1acf562e5 in rpc_clnt_notify (trans=0x7fb1a00901c0,
> mydata=0x7fb1a0090020, event=RPC_TRANSPORT_MSG_RECEIVED,
> data=0x7fb1a0843910) at rpc-clnt.c:971
>
> #3 0x00007fb1acf52319 in rpc_transport_notify
> (this=0x7fb1a00901c0, event=RPC_TRANSPORT_MSG_RECEIVED,
> data=0x7fb1a0843910) at rpc-transport.c:538
>
> #4 0x00007fb1a7e9934d in socket_event_poll_in
> (this=0x7fb1a00901c0, notify_handled=_gf_true) at socket.c:2315
>
> #5 0x00007fb1a7e99992 in socket_event_handler (fd=20, idx=14,
> gen=103, data=0x7fb1a00901c0, poll_in=1, poll_out=0, poll_err=0)
> at socket.c:2471
>
> #6 0x00007fb1ad2005ac in event_dispatch_epoll_handler
> (event_pool=0x175fb00, event=0x7fb1a7a0de84) at event-epoll.c:583
>
> #7 0x00007fb1ad200883 in event_dispatch_epoll_worker
> (data=0x17a73d0) at event-epoll.c:659
>
> #8 0x00007fb1abf4c5da in start_thread () from /lib64/libpthread.so.0
>
> #9 0x00007fb1ab822cbf in clone () from /lib64/libc.so.6
>
> (gdb) info thread
>
> Id Target Id Frame
>
> * 1 Thread 0x7fb1a7a0e700 (LWP 8151) 0x00007fb1a6fd9d24 in
> client3_3_lookup_cbk (req=0x7fb188010fb0, iov=0x7fb188010ff0,
> count=1, myframe=0x7fb188215740) at client-rpc-fops.c:2802
>
> 2 Thread 0x7fb1aa0af700 (LWP 8147) 0x00007fb1ab761cbc in
> sigtimedwait () from /lib64/libc.so.6
>
> 3 Thread 0x7fb1a98ae700 (LWP 8148) 0x00007fb1ab7f04b0 in
> nanosleep () from /lib64/libc.so.6
>
> 4 Thread 0x7fb1957fa700 (LWP 8266) 0x00007fb1abf528ca in
> pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
>
> 5 Thread 0x7fb1a88ac700 (LWP 8150) 0x00007fb1abf528ca in
> pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
>
> 6 Thread 0x7fb17f7fe700 (LWP 8269) 0x00007fb1abf5250c in
> pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
>
> 7 Thread 0x7fb1aa8b0700 (LWP 8146) 0x00007fb1abf56300 in
> nanosleep () from /lib64/libpthread.so.0
>
> 8 Thread 0x7fb1ad685780 (LWP 8145) 0x00007fb1abf4da3d in
> __pthread_timedjoin_ex () from /lib64/libpthread.so.0
>
> 9 Thread 0x7fb1a542d700 (LWP 8251) 0x00007fb1ab7f04b0 in
> nanosleep () from /lib64/libc.so.6
>
> 10 Thread 0x7fb1a4c2c700 (LWP 8260) 0x00007fb1abf528ca in
> pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
>
> 11 Thread 0x7fb196ffd700 (LWP 8263) 0x00007fb1abf528ca in
> pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
>
> 12 Thread 0x7fb1a60d7700 (LWP 8247) 0x00007fb1ab822fe7 in
> epoll_wait () from /lib64/libc.so.6
>
> 13 Thread 0x7fb1a90ad700 (LWP 8149) 0x00007fb1abf528ca in
> pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
>
> (gdb) print (call_frame_t*)myframe
>
> $1 = (call_frame_t *) 0x7fb188215740
>
> (gdb) print *(call_frame_t*)myframe
>
> $2 = {root = 0x7fb1a0085090, parent = 0xcd4642c4a3efd678, frames =
> {next = 0x151e2a92a5ae1bb, prev = 0x0}, *local = 0x0, this = 0x0,
> ret = 0x0*, ref_count = 0, lock = {spinlock = 0, mutex = {__data = {
>
> __lock = 0, __count = 0, __owner = 0, __nusers = 4, __kind
> = 0, __spins = 0, __elision = 0, __list = {__prev =
> 0x7fb188215798, __next = 0x7fb188215798}},
>
> __size = '\000' <repeats 12 times>, "\004", '\000' <repeats
> 11 times>, "\230W!\210\261\177\000\000\230W!\210\261\177\000",
> __align = 0}}, cookie = 0x7fb1882157a8, complete = (unknown:
> 2283886504),
>
> op = 32689, begin = {tv_sec = 140400469825464, tv_usec =
> 140400469825464}, end = {tv_sec = 140400878737576, tv_usec =
> 140400132101048}, wind_from = 0x7fb18801cdc0 "", wind_to = 0x0,
> unwind_from = 0x0,
>
> unwind_to = 0x0}
>
> (gdb) thread 6
>
> [Switching to thread 6 (Thread 0x7fb17f7fe700 (LWP 8269))]
>
> #0 0x00007fb1abf5250c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
>
> (gdb) bt
>
> #0 0x00007fb1abf5250c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
>
> #1 0x00007fb1ad1dc993 in __syncbarrier_wait
> (barrier=0x7fb188014790, waitfor=3) at syncop.c:1138
>
> #2 0x00007fb1ad1dc9e4 in syncbarrier_wait
> (barrier=0x7fb188014790, waitfor=3) at syncop.c:1155
>
> #3 0x00007fb1a6d59cde in afr_selfheal_unlocked_discover_on
> (*frame=0x7fb1882162d0*, inode=0x7fb188215740, gfid=0x7fb17f7fdb00
> "x\326\357\243\304BFͻ\341Z*\251\342Q\001\060\333\177\177\261\177",
>
> replies=0x7fb17f7fcf40, discover_on=0x7fb1a0084cb0
> "\001\001\001", <incomplete sequence \360\255\272>) at
> afr-self-heal-common.c:1809
>
> #4 0x00007fb1a6d59d80 in afr_selfheal_unlocked_discover
> (*frame=0x7fb1882162d0*, inode=0x7fb188215740, gfid=0x7fb17f7fdb00
> "x\326\357\243\304BFͻ\341Z*\251\342Q\001\060\333\177\177\261\177",
>
> replies=0x7fb17f7fcf40) at afr-self-heal-common.c:1828
>
> #5 0x00007fb1a6d5e51f in afr_selfheal_unlocked_inspect
> (frame=0x7fb1882162d0, this=0x7fb1a001db40, gfid=0x7fb17f7fdb00
> "x\326\357\243\304BFͻ\341Z*\251\342Q\001\060\333\177\177\261\177",
>
> link_inode=0x7fb17f7fd9c8, data_selfheal=0x7fb17f7fd9c4,
> metadata_selfheal=0x7fb17f7fd9c0, entry_selfheal=0x7fb17f7fd9bc)
> at afr-self-heal-common.c:2241
>
> #6 0x00007fb1a6d5f19b in afr_selfheal_do (frame=0x7fb1882162d0,
> this=0x7fb1a001db40, gfid=0x7fb17f7fdb00
> "x\326\357\243\304BFͻ\341Z*\251\342Q\001\060\333\177\177\261\177")
> at afr-self-heal-common.c:2483
>
> #7 0x00007fb1a6d5f346 in afr_selfheal (this=0x7fb1a001db40,
> gfid=0x7fb17f7fdb00
> "x\326\357\243\304BFͻ\341Z*\251\342Q\001\060\333\177\177\261\177")
> at afr-self-heal-common.c:2543
>
> #8 0x00007fb1a6d6ac5c in afr_shd_selfheal (healer=0x7fb1a0085640,
> child=0, gfid=0x7fb17f7fdb00
> "x\326\357\243\304BFͻ\341Z*\251\342Q\001\060\333\177\177\261\177")
> at afr-self-heald.c:343
>
> #9 0x00007fb1a6d6b00b in afr_shd_index_heal
> (subvol=0x7fb1a00171e0, entry=0x7fb1a0714180,
> parent=0x7fb17f7fddc0, data=0x7fb1a0085640) at afr-self-heald.c:440
>
> #10 0x00007fb1ad201ed3 in syncop_mt_dir_scan
> (frame=0x7fb1a07a0e90, subvol=0x7fb1a00171e0, loc=0x7fb17f7fddc0,
> pid=-6, data=0x7fb1a0085640, fn=0x7fb1a6d6aebc
> <afr_shd_index_heal>, xdata=0x7fb1a07b4ed0,
>
> max_jobs=1, max_qlen=1024) at syncop-utils.c:407
>
> #11 0x00007fb1a6d6b2b5 in afr_shd_index_sweep
> (healer=0x7fb1a0085640, vgfid=0x7fb1a6d93610
> "glusterfs.xattrop_index_gfid") at afr-self-heald.c:494
>
> #12 0x00007fb1a6d6b394 in afr_shd_index_sweep_all
> (healer=0x7fb1a0085640) at afr-self-heald.c:517
>
> #13 0x00007fb1a6d6b697 in afr_shd_index_healer
> (data=0x7fb1a0085640) at afr-self-heald.c:597
>
> #14 0x00007fb1abf4c5da in start_thread () from /lib64/libpthread.so.0
>
> #15 0x00007fb1ab822cbf in clone () from /lib64/libc.so.6
>
> From: Zhou, Cynthia (NSB - CN/Hangzhou)
> Sent: Thursday, October 11, 2018 3:36 PM
> To: Ravishankar N <ravishankar at redhat.com>
> Cc: gluster-users <gluster-users at gluster.org>
> Subject: glustershd coredump generated while reboot all 3 sn nodes
>
> Hi,
>
> I find that sometimes, when the sn nodes are restarted, glustershd
> exits and generates a coredump. It has happened twice in my
> environment; I would like to know your opinion on this issue, thanks!
>
> The glusterfs version I use is glusterfs 3.12.3.
>
> [root at sn-1:/root]
>
> # gluster v info log
>
> Volume Name: log
>
> Type: Replicate
>
> Volume ID: 87bcbaf8-5fa4-4060-9149-23f832befe92
>
> Status: Started
>
> Snapshot Count: 0
>
> Number of Bricks: 1 x 3 = 3
>
> Transport-type: tcp
>
> Bricks:
>
> Brick1: sn-0.local:/mnt/bricks/log/brick
>
> Brick2: sn-1.local:/mnt/bricks/log/brick
>
> Brick3: sn-2.local:/mnt/bricks/log/brick
>
> Options Reconfigured:
>
> server.allow-insecure: on
>
> cluster.quorum-type: auto
>
> network.ping-timeout: 42
>
> cluster.consistent-metadata: on
>
> cluster.favorite-child-policy: mtime
>
> cluster.quorum-reads: no
>
> cluster.server-quorum-type: none
>
> transport.address-family: inet
>
> nfs.disable: on
>
> performance.client-io-threads: off
>
> cluster.server-quorum-ratio: 51%
>
> [root at sn-1:/root]
>
> ///////////////////////////////////////////////glustershd
> coredump////////////////////////////////////////////////////////////////
>
> # lz4 -d
> core.glusterfs.0.c5f0c5547fbd4e5aa8f350b748e5675e.1812.1537967075000000.lz4
>
> Decoding file
> core.glusterfs.0.c5f0c5547fbd4e5aa8f350b748e5675e.1812.1537967075000000
>
>
> core.glusterfs.0.c5f : decoded 263188480 bytes
>
> [root at sn-0:/mnt/export]
>
> # gdb /usr/sbin/glusterfs
> core.glusterfs.0.c5f0c5547fbd4e5aa8f350b748e5675e.1812.1537967075000000
>
> GNU gdb (GDB) Fedora 8.1-14.wf29
>
> Copyright (C) 2018 Free Software Foundation, Inc.
>
> License GPLv3+: GNU GPL version 3 or later
> <http://gnu.org/licenses/gpl.html>
>
> This is free software: you are free to change and redistribute it.
>
> There is NO WARRANTY, to the extent permitted by law. Type "show
> copying"
>
> and "show warranty" for details.
>
> This GDB was configured as "x86_64-redhat-linux-gnu".
>
> Type "show configuration" for configuration details.
>
> For bug reporting instructions, please see:
>
> <http://www.gnu.org/software/gdb/bugs/>.
>
> Find the GDB manual and other documentation resources online at:
>
> <http://www.gnu.org/software/gdb/documentation/>.
>
> For help, type "help".
>
> Type "apropos word" to search for commands related to "word"...
>
> Reading symbols from /usr/sbin/glusterfs...(no debugging symbols
> found)...done.
>
> warning: core file may not match specified executable file.
>
> [New LWP 1818]
>
> [New LWP 1812]
>
> [New LWP 1813]
>
> [New LWP 1817]
>
> [New LWP 1966]
>
> [New LWP 1968]
>
> [New LWP 1970]
>
> [New LWP 1974]
>
> [New LWP 1976]
>
> [New LWP 1814]
>
> [New LWP 1815]
>
> [New LWP 1816]
>
> [New LWP 1828]
>
> [Thread debugging using libthread_db enabled]
>
> Using host libthread_db library "/lib64/libthread_db.so.1".
>
> Core was generated by `/usr/sbin/glusterfs -s sn-0.local
> --volfile-id gluster/glustershd -p /var/run/g'.
>
> Program terminated with signal SIGSEGV, Segmentation fault.
>
> #0 0x00007f1b5e5d7d24 in client3_3_lookup_cbk
> (req=0x7f1b44002300, iov=0x7f1b44002340, count=1,
> myframe=0x7f1b4401c850) at client-rpc-fops.c:2802
>
> 2802 client-rpc-fops.c: No such file or directory.
>
> [Current thread is 1 (Thread 0x7f1b5f00c700 (LWP 1818))]
>
> Missing separate debuginfos, use: dnf debuginfo-install
> rcp-pack-glusterfs-1.2.0_1_g54e6196-RCP2.wf29.x86_64
>
> (gdb) bt
>
> #0 0x00007f1b5e5d7d24 in client3_3_lookup_cbk
> (req=0x7f1b44002300, iov=0x7f1b44002340, count=1,
> myframe=0x7f1b4401c850) at client-rpc-fops.c:2802
>
> #1 0x00007f1b64553d47 in rpc_clnt_handle_reply
> (clnt=0x7f1b5808bbb0, pollin=0x7f1b580c6620) at rpc-clnt.c:778
>
> #2 0x00007f1b645542e5 in rpc_clnt_notify (trans=0x7f1b5808bde0,
> mydata=0x7f1b5808bbe0, event=RPC_TRANSPORT_MSG_RECEIVED,
> data=0x7f1b580c6620) at rpc-clnt.c:971
>
> #3 0x00007f1b64550319 in rpc_transport_notify
> (this=0x7f1b5808bde0, event=RPC_TRANSPORT_MSG_RECEIVED,
> data=0x7f1b580c6620) at rpc-transport.c:538
>
> #4 0x00007f1b5f49734d in socket_event_poll_in
> (this=0x7f1b5808bde0, notify_handled=_gf_true) at socket.c:2315
>
> #5 0x00007f1b5f497992 in socket_event_handler (fd=25, idx=15,
> gen=7, data=0x7f1b5808bde0, poll_in=1, poll_out=0, poll_err=0) at
> socket.c:2471
>
> #6 0x00007f1b647fe5ac in event_dispatch_epoll_handler
> (event_pool=0x230cb00, event=0x7f1b5f00be84) at event-epoll.c:583
>
> #7 0x00007f1b647fe883 in event_dispatch_epoll_worker
> (data=0x23543d0) at event-epoll.c:659
>
> #8 0x00007f1b6354a5da in start_thread () from /lib64/libpthread.so.0
>
> #9 0x00007f1b62e20cbf in clone () from /lib64/libc.so.6
>
> (gdb) print *(call_frame_t*)myframe
>
> $1 = {root = 0x100000000, parent = 0x100000005, frames = {next =
> 0x7f1b4401c8a8, prev = 0x7f1b44010190}, *local = 0x0*, this =
> 0x0, ret = 0x0, ref_count = 0, lock = {spinlock = 0, mutex =
> {__data = {
>
> __lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0,
> __spins = 0, __elision = 0, __list = {__prev = 0x7f1b44010190,
> __next = 0x0}},
>
> __size = '\000' <repeats 24 times>,
> "\220\001\001D\033\177\000\000\000\000\000\000\000\000\000",
> __align = 0}}, cookie = 0x7f1b4401ccf0, complete = _gf_false, op =
> GF_FOP_NULL, begin = {
>
> tv_sec = 139755081730912, tv_usec = 139755081785872}, end =
> {tv_sec = 448811404, tv_usec = 21474836481}, wind_from = 0x0,
> wind_to = 0x0, unwind_from = 0x0, unwind_to = 0x0}
>
> (gdb) info thread
>
> Id Target Id Frame
>
> * 1 Thread 0x7f1b5f00c700 (LWP 1818) 0x00007f1b5e5d7d24 in
> client3_3_lookup_cbk (req=0x7f1b44002300, iov=0x7f1b44002340,
> count=1, myframe=0x7f1b4401c850) at client-rpc-fops.c:2802
>
> 2 Thread 0x7f1b64c83780 (LWP 1812) 0x00007f1b6354ba3d in
> __pthread_timedjoin_ex () from /lib64/libpthread.so.0
>
> 3 Thread 0x7f1b61eae700 (LWP 1813) 0x00007f1b63554300 in
> nanosleep () from /lib64/libpthread.so.0
>
> 4 Thread 0x7f1b5feaa700 (LWP 1817) 0x00007f1b635508ca in
> pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
>
> 5 Thread 0x7f1b5ca2b700 (LWP 1966) 0x00007f1b62dee4b0 in
> nanosleep () from /lib64/libc.so.6
>
> 6 Thread 0x7f1b4f7fe700 (LWP 1968) 0x00007f1b6355050c in
> pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
>
> 7 Thread 0x7f1b4e7fc700 (LWP 1970) 0x00007f1b6355050c in
> pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
>
> 8 Thread 0x7f1b4d7fa700 (LWP 1974) 0x00007f1b62dee4b0 in
> nanosleep () from /lib64/libc.so.6
>
> 9 Thread 0x7f1b33fff700 (LWP 1976) 0x00007f1b62dee4b0 in
> nanosleep () from /lib64/libc.so.6
>
> 10 Thread 0x7f1b616ad700 (LWP 1814) 0x00007f1b62d5fcbc in
> sigtimedwait () from /lib64/libc.so.6
>
> 11 Thread 0x7f1b60eac700 (LWP 1815) 0x00007f1b62dee4b0 in
> nanosleep () from /lib64/libc.so.6
>
> 12 Thread 0x7f1b606ab700 (LWP 1816) 0x00007f1b635508ca in
> pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
>
> 13 Thread 0x7f1b5d6d5700 (LWP 1828) 0x00007f1b62e20fe7 in
> epoll_wait () from /lib64/libc.so.6
>
> (gdb) quit
>
> The relevant part of client3_3_lookup_cbk looks roughly like the sketch
> below, so from gdb the coredump happens because frame->local is *NULL*!!
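>
> (Condensed and paraphrased from client-rpc-fops.c in 3.12.x; the exact
> lines differ, but the failure mode is the same:)
>
>     int
>     client3_3_lookup_cbk (struct rpc_req *req, struct iovec *iov,
>                           int count, void *myframe)
>     {
>             call_frame_t    *frame = myframe;
>             clnt_local_t    *local = frame->local;  /* NULL in this core */
>             gfs3_lookup_rsp  rsp   = {0,};
>
>             /* ... xdr_to_generic () decodes the reply into rsp ... */
>
>             /* the callback then dereferences local (for example
>              * local->loc.inode), so with local == NULL this is where
>              * the SIGSEGV at client-rpc-fops.c:2802 hits */
>
>             /* ... CLIENT_STACK_UNWIND (lookup, frame, ...) ... */
>             return 0;
>     }
>
> A frame handed to this callback is supposed to still own its
> clnt_local_t, so a frame that has already been destroyed (with its
> local freed and reset) cannot be unwound safely here.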
>
> From sn-0 journal log,
>
> Sep 26 16:04:40.034577 sn-0 systemd-coredump[2612]: Process 1812
> (glusterfs) of user 0 dumped core.
>
> Stack trace of
> thread 1818:
>
> #0 0x00007f1b5e5d7d24 client3_3_lookup_cbk (client.so)
>
> #1 0x00007f1b64553d47 rpc_clnt_handle_reply (libgfrpc.so.0)
>
> #2 0x00007f1b645542e5 rpc_clnt_notify (libgfrpc.so.0)
>
> #3 0x00007f1b64550319 rpc_transport_notify (libgfrpc.so.0)
>
> #4 0x00007f1b5f49734d socket_event_poll_in (socket.so)
>
> #5 0x00007f1b5f497992
> socket_event_handler (socket.so)
>
> #6 0x00007f1b647fe5ac event_dispatch_epoll_handler
> (libglusterfs.so.0)
>
> #7 0x00007f1b647fe883 event_dispatch_epoll_worker
> (libglusterfs.so.0)
>
> #8 0x00007f1b6354a5da start_thread (libpthread.so.0)
>
> #9 0x00007f1b62e20cbf __clone (libc.so.6)
>
> Stack trace of
> thread 1812:
>
> #0 0x00007f1b6354ba3d __GI___pthread_timedjoin_ex (libpthread.so.0)
>
> #1 0x00007f1b647feae1 event_dispatch_epoll (libglusterfs.so.0)
>
> #2 0x00007f1b647c2703
> event_dispatch (libglusterfs.so.0)
>
> #3 0x000000000040ab95 main (glusterfsd)
>
> #4 0x00007f1b62d4baf7 __libc_start_main (libc.so.6)
>
> #5 0x000000000040543a _start (glusterfsd)
>
> Stack trace of
> thread 1813:
>
> #0 0x00007f1b63554300 __nanosleep (libpthread.so.0)
>
> #1 0x00007f1b647a04e5 gf_timer_proc (libglusterfs.so.0)
>
> #2 0x00007f1b6354a5da start_thread (libpthread.so.0)
>
> #3 0x00007f1b62e20cbf __clone (libc.so.6)
>
> Stack trace of
> thread 1817:
>
> #0 0x00007f1b635508ca pthread_cond_timedwait@@GLIBC_2.3.2 (libpthread.so.0)
>
> #1 0x00007f1b647d98e3 syncenv_task (libglusterfs.so.0)
>
> #2 0x00007f1b647d9b7e syncenv_processor (libglusterfs.so.0)
>
> #3 0x00007f1b6354a5da start_thread (libpthread.so.0)
>
> #4 0x00007f1b62e20cbf __clone
> (libc.so.6)
>
> Stack trace of
> thread 1966:
>
> #0 0x00007f1b62dee4b0 __nanosleep (libc.so.6)
>
> #1 0x00007f1b62dee38a sleep (libc.so.6)
>
> #2 0x00007f1b5e36970c afr_shd_index_healer (replicate.so)
>
> #3 0x00007f1b6354a5da start_thread (libpthread.so.0)
>
> #4 0x00007f1b62e20cbf __clone (libc.so.6)
>
> Stack trace of
> thread 1968:
>
> #0 0x00007f1b6355050c pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
>
> #1 0x00007f1b647da993 __syncbarrier_wait (libglusterfs.so.0)
>
> #2 0x00007f1b647da9e4 syncbarrier_wait (libglusterfs.so.0)
>
> #3 0x00007f1b5e357cde
> afr_selfheal_unlocked_discover_on (replicate.so)
>
> #4 0x00007f1b5e357d80 afr_selfheal_unlocked_discover (replicate.so)
>
> #5 0x00007f1b5e363bf8 __afr_selfheal_entry_prepare
> (replicate.so)
>
> #6 0x00007f1b5e3641c0 afr_selfheal_entry_dirent (replicate.so)
>
> #7 0x00007f1b5e36488a afr_selfheal_entry_do_subvol (replicate.so)
>
> #8 0x00007f1b5e365077 afr_selfheal_entry_do (replicate.so)
>
> #9 0x00007f1b5e3656b6 __afr_selfheal_entry (replicate.so)
>
> #10 0x00007f1b5e365bba afr_selfheal_entry (replicate.so)
>
> #11 0x00007f1b5e35d250 afr_selfheal_do (replicate.so)
>
> #12 0x00007f1b5e35d346
> afr_selfheal (replicate.so)
>
> #13 0x00007f1b5e368c5c afr_shd_selfheal (replicate.so)
>
> #14 0x00007f1b5e36900b afr_shd_index_heal (replicate.so)
>
> #15 0x00007f1b647ffed3 syncop_mt_dir_scan (libglusterfs.so.0)
>
> #16 0x00007f1b5e3692b5 afr_shd_index_sweep (replicate.so)
>
> #17 0x00007f1b5e369394 afr_shd_index_sweep_all (replicate.so)
>
> #18 0x00007f1b5e369697 afr_shd_index_healer (replicate.so)
>
> #19 0x00007f1b6354a5da
> start_thread (libpthread.so.0)
>
> #20 0x00007f1b62e20cbf __clone (libc.so.6)
>
> Stack trace of
> thread 1970:
>
> #0 0x00007f1b6355050c pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
>
> #1 0x00007f1b647da993 __syncbarrier_wait (libglusterfs.so.0)
>
> #2 0x00007f1b647da9e4 syncbarrier_wait (libglusterfs.so.0)
>
> #3 0x00007f1b5e357742 afr_selfheal_unlocked_lookup_on (replicate.so)
>
> #4 0x00007f1b5e364204 afr_selfheal_entry_dirent (replicate.so)
>
> #5 0x00007f1b5e36488a afr_selfheal_entry_do_subvol (replicate.so)
>
> #6 0x00007f1b5e365077 afr_selfheal_entry_do (replicate.so)
>
> #7 0x00007f1b5e3656b6 __afr_selfheal_entry (replicate.so)
>
> #8 0x00007f1b5e365bba afr_selfheal_entry
> (replicate.so)
>
> #9 0x00007f1b5e35d250 afr_selfheal_do (replicate.so)
>
> #10 0x00007f1b5e35d346 afr_selfheal (replicate.so)
>
> #11 0x00007f1b5e368c5c afr_shd_selfheal (replicate.so)
>
> #12 0x00007f1b5e36900b afr_shd_index_heal (replicate.so)
>
> #13 0x00007f1b647ffed3
> syncop_mt_dir_scan (libglusterfs.so.0)
>
> #14 0x00007f1b5e3692b5 afr_shd_index_sweep (replicate.so)
>
> #15 0x00007f1b5e369394 afr_shd_index_sweep_all (replicate.so)
>
> #16 0x00007f1b5e369697 afr_shd_index_healer (replicate.so)
>
> #17 0x00007f1b6354a5da start_thread (libpthread.so.0)
>
> #18 0x00007f1b62e20cbf __clone (libc.so.6)
>
> Stack trace of
> thread 1974:
>
> #0 0x00007f1b62dee4b0 __nanosleep (libc.so.6)
>
> #1 0x00007f1b62dee38a sleep (libc.so.6)
>
> #2 0x00007f1b5e36970c afr_shd_index_healer (replicate.so)
>
> #3 0x00007f1b6354a5da start_thread (libpthread.so.0)
>
> #4 0x00007f1b62e20cbf __clone (libc.so.6)
>
> Stack trace of
> thread 1976:
>
> #0 0x00007f1b62dee4b0 __nanosleep (libc.so.6)
>
> #1 0x00007f1b62dee38a sleep (libc.so.6)
>
> #2 0x00007f1b5e36970c afr_shd_index_healer (replicate.so)
>
> #3 0x00007f1b6354a5da start_thread (libpthread.so.0)
>
> #4 0x00007f1b62e20cbf __clone (libc.so.6)
>
> Stack trace of
> thread 1814:
>
> #0 0x00007f1b62d5fcbc __sigtimedwait (libc.so.6)
>
> #1 0x00007f1b63554afc sigwait (libpthread.so.0)
>
> #2 0x0000000000409ed7 glusterfs_sigwaiter (glusterfsd)
>
> #3 0x00007f1b6354a5da start_thread (libpthread.so.0)
>
> #4 0x00007f1b62e20cbf __clone (libc.so.6)
>
> Stack trace of
> thread 1815:
>
> #0 0x00007f1b62dee4b0 __nanosleep (libc.so.6)
>
> #1 0x00007f1b62dee38a sleep (libc.so.6)
>
> #2 0x00007f1b647c3f5c pool_sweeper (libglusterfs.so.0)
>
> #3 0x00007f1b6354a5da start_thread (libpthread.so.0)
>
> #4 0x00007f1b62e20cbf __clone (libc.so.6)
>
> Stack trace of
> thread 1816:
>
> #0 0x00007f1b635508ca pthread_cond_timedwait@@GLIBC_2.3.2 (libpthread.so.0)
>
> #1 0x00007f1b647d98e3 syncenv_task (libglusterfs.so.0)
>
> #2 0x00007f1b647d9b7e
> syncenv_processor (libglusterfs.so.0)
>
> #3 0x00007f1b6354a5da start_thread (libpthread.so.0)
>
> #4 0x00007f1b62e20cbf __clone (libc.so.6)
>
> Stack trace of
> thread 1828:
>
> #0 0x00007f1b62e20fe7 epoll_wait (libc.so.6)
>
> #1 0x00007f1b647fe855 event_dispatch_epoll_worker (libglusterfs.so.0)
>
> #2 0x00007f1b6354a5da start_thread (libpthread.so.0)
>
> #3 0x00007f1b62e20cbf
> __clone (libc.so.6)
>