[Gluster-devel] glusterd stuck for glusterfs with version 3.12.15

Sanju Rakonde srakonde at redhat.com
Tue Apr 9 07:08:16 UTC 2019


Hello,

I'm unable to figure out the issue from the backtrace alone. You
might be hitting https://bugzilla.redhat.com/show_bug.cgi?id=1650115. If
you come across the same problem in the future, please capture pstack output
and share it with us.
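For example, the capture could be scripted along these lines (a minimal sketch; the `capture_stacks` helper name and output paths are assumptions, not an existing tool):

```shell
#!/bin/sh
# Illustrative helper: sample the stacks of a possibly-hung glusterd a
# few times, so a real deadlock (identical samples every time) can be
# told apart from a thread that is merely making slow progress.
capture_stacks() {
    pid=$1
    out=${2:-/tmp/glusterd-pstack}   # output file prefix (assumed path)
    i=1
    while [ "$i" -le 3 ]; do
        # one stack sample per iteration
        pstack "$pid" > "$out.$i" 2>&1
        sleep 1
        i=$((i + 1))
    done
}

# Usage (run as root on the affected node):
#   capture_stacks "$(pidof glusterd)"
```

Taking several samples a second or two apart makes a genuine hang (identical stacks in every sample) easy to distinguish from slow progress.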

On Mon, Apr 8, 2019 at 2:31 PM Zhou, Cynthia (NSB - CN/Hangzhou) <
cynthia.zhou at nokia-sbell.com> wrote:

> Hi,
>
> The environment is not available anymore, but I had collected the thread
> stack trace of glusterd with the GDB command “thread apply all bt”.
>
> Using host libthread_db library "/lib64/libthread_db.so.1".
>
> 0x00007f9ee9fcfa3d in __pthread_timedjoin_ex () from /lib64/libpthread.so.0
>
> Missing separate debuginfos, use: dnf debuginfo-install
> rcp-pack-glusterfs-1.12.0_1_gc999db1-RCP2.wf29.x86_64
>
> (gdb) thread apply all bt
>
>
>
> Thread 9 (Thread 0x7f9edf7fe700 (LWP 1933)):
>
> #0  0x00007f9ee9fd785c in __lll_lock_wait () from /lib64/libpthread.so.0
>
> #1  0x00007f9ee9fda657 in __lll_lock_elision () from /lib64/libpthread.so.0
>
> #2  0x00007f9eeb24cae6 in iobref_unref (iobref=0x7f9ed00063b0) at
> iobuf.c:944
>
> #3  0x00007f9eeafd2f29 in rpc_transport_pollin_destroy
> (pollin=0x7f9ed00452d0) at rpc-transport.c:123
>
> #4  0x00007f9ee4fbf319 in socket_event_poll_in (this=0x7f9ed0049cc0,
> notify_handled=_gf_true) at socket.c:2322
>
> #5  0x00007f9ee4fbf932 in socket_event_handler (fd=36, idx=27, gen=4,
> data=0x7f9ed0049cc0, poll_in=1, poll_out=0, poll_err=0) at socket.c:2471
>
> #6  0x00007f9eeb2825d4 in event_dispatch_epoll_handler
> (event_pool=0x17feb00, event=0x7f9edf7fde84) at event-epoll.c:583
>
> #7  0x00007f9eeb2828ab in event_dispatch_epoll_worker (data=0x180d0c0) at
> event-epoll.c:659
>
> #8  0x00007f9ee9fce5da in start_thread () from /lib64/libpthread.so.0
>
> #9  0x00007f9ee98a4eaf in clone () from /lib64/libc.so.6
>
>
>
> Thread 8 (Thread 0x7f9edffff700 (LWP 1932)):
>
> #0  0x00007f9ee9fd785c in __lll_lock_wait () from /lib64/libpthread.so.0
>
> #1  0x00007f9ee9fd2b42 in __pthread_mutex_cond_lock () from
> /lib64/libpthread.so.0
>
> #2  0x00007f9ee9fd44a8 in pthread_cond_wait@@GLIBC_2.3.2 () from
> /lib64/libpthread.so.0
>
> #3  0x00007f9ee4fbadab in socket_event_poll_err (this=0x7f9ed0049cc0,
> gen=4, idx=27) at socket.c:1201
>
> #4  0x00007f9ee4fbf99c in socket_event_handler (fd=36, idx=27, gen=4,
> data=0x7f9ed0049cc0, poll_in=1, poll_out=0, poll_err=0) at socket.c:2480
>
> #5  0x00007f9eeb2825d4 in event_dispatch_epoll_handler
> (event_pool=0x17feb00, event=0x7f9edfffee84) at event-epoll.c:583
>
> #6  0x00007f9eeb2828ab in event_dispatch_epoll_worker (data=0x180cf20) at
> event-epoll.c:659
>
> #7  0x00007f9ee9fce5da in start_thread () from /lib64/libpthread.so.0
>
> #8  0x00007f9ee98a4eaf in clone () from /lib64/libc.so.6
>
>
>
> Thread 7 (Thread 0x7f9ee49b3700 (LWP 1931)):
>
> #0  0x00007f9ee9fd45bc in pthread_cond_wait@@GLIBC_2.3.2 () from
> /lib64/libpthread.so.0
>
> #1  0x00007f9ee5e651b9 in hooks_worker (args=0x1813000) at
> glusterd-hooks.c:529
>
> #2  0x00007f9ee9fce5da in start_thread () from /lib64/libpthread.so.0
>
> #3  0x00007f9ee98a4eaf in clone () from /lib64/libc.so.6
>
>
>
> Thread 6 (Thread 0x7f9ee692e700 (LWP 1762)):
>
> #0  0x00007f9ee9fd497a in pthread_cond_timedwait@@GLIBC_2.3.2 () from
> /lib64/libpthread.so.0
>
> #1  0x00007f9eeb25d904 in syncenv_task (proc=0x1808e00) at syncop.c:603
>
> #2  0x00007f9eeb25db9f in syncenv_processor (thdata=0x1808e00) at
> syncop.c:695
>
> #3  0x00007f9ee9fce5da in start_thread () from /lib64/libpthread.so.0
>
> #4  0x00007f9ee98a4eaf in clone () from /lib64/libc.so.6
>
>
>
> Thread 5 (Thread 0x7f9ee712f700 (LWP 1761)):
>
>
> #0  0x00007f9ee9fd497a in pthread_cond_timedwait@@GLIBC_2.3.2 () from
> /lib64/libpthread.so.0
>
> #1  0x00007f9eeb25d904 in syncenv_task (proc=0x1808a40) at syncop.c:603
>
> #2  0x00007f9eeb25db9f in syncenv_processor (thdata=0x1808a40) at
> syncop.c:695
>
> #3  0x00007f9ee9fce5da in start_thread () from /lib64/libpthread.so.0
>
> #4  0x00007f9ee98a4eaf in clone () from /lib64/libc.so.6
>
>
>
> Thread 4 (Thread 0x7f9ee7930700 (LWP 1760)):
>
> #0  0x00007f9ee98725d0 in nanosleep () from /lib64/libc.so.6
>
> #1  0x00007f9ee98724aa in sleep () from /lib64/libc.so.6
>
> #2  0x00007f9eeb247fdf in pool_sweeper (arg=0x0) at mem-pool.c:481
>
> #3  0x00007f9ee9fce5da in start_thread () from /lib64/libpthread.so.0
>
> #4  0x00007f9ee98a4eaf in clone () from /lib64/libc.so.6
>
>
>
> Thread 3 (Thread 0x7f9ee8131700 (LWP 1759)):
>
> #0  0x00007f9ee97e3d7c in sigtimedwait () from /lib64/libc.so.6
>
> #1  0x00007f9ee9fd8bac in sigwait () from /lib64/libpthread.so.0
>
> #2  0x0000000000409ed7 in glusterfs_sigwaiter ()
>
> #3  0x00007f9ee9fce5da in start_thread () from /lib64/libpthread.so.0
>
> #4  0x00007f9ee98a4eaf in clone () from /lib64/libc.so.6
>
>
>
> Thread 2 (Thread 0x7f9ee8932700 (LWP 1758)):
>
> #0  0x00007f9ee9fd83b0 in nanosleep () from /lib64/libpthread.so.0
>
> #1  0x00007f9eeb224545 in gf_timer_proc (data=0x1808580) at timer.c:164
>
> #2  0x00007f9ee9fce5da in start_thread () from /lib64/libpthread.so.0
>
> #3  0x00007f9ee98a4eaf in clone () from /lib64/libc.so.6
>
>
>
> Thread 1 (Thread 0x7f9eeb707780 (LWP 1757)):
>
> #0  0x00007f9ee9fcfa3d in __pthread_timedjoin_ex () from
> /lib64/libpthread.so.0
>
> #1  0x00007f9eeb282b09 in event_dispatch_epoll (event_pool=0x17feb00) at
> event-epoll.c:746
>
> #2  0x00007f9eeb246786 in event_dispatch (event_pool=0x17feb00) at
> event.c:124
>
> #3  0x000000000040ab95 in main ()
>
> (gdb)
>
> (gdb)
>
> (gdb) q!
>
> A syntax error in expression, near `'.
>
> (gdb) quit
>
>
>
> *From:* Sanju Rakonde <srakonde at redhat.com>
> *Sent:* Monday, April 08, 2019 4:58 PM
> *To:* Zhou, Cynthia (NSB - CN/Hangzhou) <cynthia.zhou at nokia-sbell.com>
> *Cc:* Raghavendra Gowdappa <rgowdapp at redhat.com>;
> gluster-devel at gluster.org
> *Subject:* Re: [Gluster-devel] glusterd stuck for glusterfs with version
> 3.12.15
>
>
>
> Can you please capture the output of "pstack $(pidof glusterd)" and send it
> to us? We need to capture this information while glusterd is stuck.
>
>
>
> On Mon, Apr 8, 2019 at 8:05 AM Zhou, Cynthia (NSB - CN/Hangzhou) <
> cynthia.zhou at nokia-sbell.com> wrote:
>
> Hi glusterfs experts,
>
> Good day!
>
> In my test environment, glusterd sometimes gets stuck and stops
> responding to any gluster commands. When I checked this issue, I found
> that glusterd threads 9 and 8 were handling the same socket. I thought
> the following patch should solve the issue, but after I merged it the
> problem still exists. Looking into the code, socket_event_poll_in calls
> event_handled before rpc_transport_pollin_destroy; I think this opens a
> window for another poll on exactly the same socket, which causes the
> glusterd hang. I also noticed that iobref_destroy does not call
> LOCK_DESTROY(&iobref->lock); I think it would be better to add that
> lock destroy.
>
> Following is the gdb info captured when this issue happened; I would like
> to know your opinion on it, thanks!
>
>
>
> SHA-1: f747d55a7fd364e2b9a74fe40360ab3cb7b11537
>
>
>
> *socket: fix issue on concurrent handle of a socket*
>
>
>
>
>
>
>
> *GDB INFO:*
>
> Thread 8 is blocked in pthread_cond_wait, and thread 9 is blocked in
> iobref_unref, waiting on the iobref lock:
>
> Thread 9 (Thread 0x7f9edf7fe700 (LWP 1933)):
>
> #0  0x00007f9ee9fd785c in __lll_lock_wait () from /lib64/libpthread.so.0
>
> #1  0x00007f9ee9fda657 in __lll_lock_elision () from /lib64/libpthread.so.0
>
> #2  0x00007f9eeb24cae6 in iobref_unref (iobref=0x7f9ed00063b0) at
> iobuf.c:944
>
> #3  0x00007f9eeafd2f29 in rpc_transport_pollin_destroy
> (pollin=0x7f9ed00452d0) at rpc-transport.c:123
>
> #4  0x00007f9ee4fbf319 in socket_event_poll_in (this=0x7f9ed0049cc0,
> notify_handled=_gf_true) at socket.c:2322
>
> #5  0x00007f9ee4fbf932 in socket_event_handler (*fd=36, idx=27, gen=4,
> data=0x7f9ed0049cc0, poll_in=1, poll_out=0, poll_err=0*) at socket.c:2471
>
> #6  0x00007f9eeb2825d4 in event_dispatch_epoll_handler
> (event_pool=0x17feb00, event=0x7f9edf7fde84) at event-epoll.c:583
>
> #7  0x00007f9eeb2828ab in event_dispatch_epoll_worker (data=0x180d0c0) at
> event-epoll.c:659
>
> #8  0x00007f9ee9fce5da in start_thread () from /lib64/libpthread.so.0
>
> #9  0x00007f9ee98a4eaf in clone () from /lib64/libc.so.6
>
>
>
> Thread 8 (Thread 0x7f9edffff700 (LWP 1932)):
>
> #0  0x00007f9ee9fd785c in __lll_lock_wait () from /lib64/libpthread.so.0
>
> #1  0x00007f9ee9fd2b42 in __pthread_mutex_cond_lock () from
> /lib64/libpthread.so.0
>
> #2  0x00007f9ee9fd44a8 in pthread_cond_wait@@GLIBC_2.3.2 () from
> /lib64/libpthread.so.0
>
> #3  0x00007f9ee4fbadab in socket_event_poll_err (this=0x7f9ed0049cc0,
> gen=4, idx=27) at socket.c:1201
>
> #4  0x00007f9ee4fbf99c in socket_event_handler (*fd=36, idx=27, gen=4,
> data=0x7f9ed0049cc0, poll_in=1, poll_out=0, poll_err=0*) at socket.c:2480
>
> #5  0x00007f9eeb2825d4 in event_dispatch_epoll_handler
> (event_pool=0x17feb00, event=0x7f9edfffee84) at event-epoll.c:583
>
> #6  0x00007f9eeb2828ab in event_dispatch_epoll_worker (data=0x180cf20) at
> event-epoll.c:659
>
> #7  0x00007f9ee9fce5da in start_thread () from /lib64/libpthread.so.0
>
> #8  0x00007f9ee98a4eaf in clone () from /lib64/libc.so.6
>
>
>
> (gdb) thread 9
>
> [Switching to thread 9 (Thread 0x7f9edf7fe700 (LWP 1933))]
>
> #0  0x00007f9ee9fd785c in __lll_lock_wait () from /lib64/libpthread.so.0
>
> (gdb) bt
>
> #0  0x00007f9ee9fd785c in __lll_lock_wait () from /lib64/libpthread.so.0
>
> #1  0x00007f9ee9fda657 in __lll_lock_elision () from /lib64/libpthread.so.0
>
> #2  0x00007f9eeb24cae6 in iobref_unref (iobref=0x7f9ed00063b0) at
> iobuf.c:944
>
> #3  0x00007f9eeafd2f29 in rpc_transport_pollin_destroy
> (pollin=0x7f9ed00452d0) at rpc-transport.c:123
>
> #4  0x00007f9ee4fbf319 in socket_event_poll_in (this=0x7f9ed0049cc0,
> notify_handled=_gf_true) at socket.c:2322
>
> #5  0x00007f9ee4fbf932 in socket_event_handler (fd=36, idx=27, gen=4,
> data=0x7f9ed0049cc0, poll_in=1, poll_out=0, poll_err=0) at socket.c:2471
>
> #6  0x00007f9eeb2825d4 in event_dispatch_epoll_handler
> (event_pool=0x17feb00, event=0x7f9edf7fde84) at event-epoll.c:583
>
> #7  0x00007f9eeb2828ab in event_dispatch_epoll_worker (data=0x180d0c0) at
> event-epoll.c:659
>
> #8  0x00007f9ee9fce5da in start_thread () from /lib64/libpthread.so.0
>
> #9  0x00007f9ee98a4eaf in clone () from /lib64/libc.so.6
>
> (gdb) frame 2
>
> #2  0x00007f9eeb24cae6 in iobref_unref (iobref=0x7f9ed00063b0) at
> iobuf.c:944
>
> 944 iobuf.c: No such file or directory.
>
> (gdb) print *iobref
>
> $1 = {lock = {spinlock = 2, mutex = {__data = {__lock = 2, __count = 222,
> __owner = -2120437760, __nusers = 1, __kind = 8960, __spins = 512,
>
>         __elision = 0, __list = {__prev = 0x4000, __next =
> 0x7f9ed00063b000}},
>
>       __size =
> "\002\000\000\000\336\000\000\000\000\260\234\201\001\000\000\000\000#\000\000\000\002\000\000\000@
> \000\000\000\000\000\000\000\260c\000О\177", __align = 953482739714}},
> ref = -256, iobrefs = 0xffffffffffffffff, alloced = -1, used = -1}
>
> (gdb) quit
>
> A debugging session is active.
>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-devel
>
>
>
>
> --
>
> Thanks,
>
> Sanju
>


-- 
Thanks,
Sanju