<div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Apr 8, 2019 at 7:42 AM Zhou, Cynthia (NSB - CN/Hangzhou) <<a href="mailto:cynthia.zhou@nokia-sbell.com">cynthia.zhou@nokia-sbell.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div lang="ZH-CN">
<div class="gmail-m_4854284690638810352WordSection1">
<p class="MsoNormal"><span lang="EN-US">Hi glusterfs experts,<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">Good day!<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">In my test env, sometimes glusterd stuck issue happened, and it is not responding to any gluster commands, when I checked this issue I find that glusterd thread 9 and thread 8 is dealing with the same socket, I thought
following patch should be able to solve this issue, however after I merged this patch this issue still exist. When I looked into this code, it seems socket_event_poll_in called event_handled before rpc_transport_pollin_destroy, I think this gives the chance
for another poll for the exactly the same socket. And caused this glusterd stuck issue, also, I find there is no LOCK_DESTROY(&iobref->lock)<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">In iobref_destroy, I think it is better to add destroy lock.<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">Following is the gdb info when this issue happened, I would like to know your opinion on this issue, thanks!<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"><u></u> <u></u></span></p>
<p class="MsoNormal" style="text-align:left" align="left"><span style="font-size:8pt;font-family:"Courier New"" lang="EN-US">SHA-1: f747d55a7fd364e2b9a74fe40360ab3cb7b11537<u></u><u></u></span></p>
<p class="MsoNormal" style="text-align:left" align="left"><span style="font-size:8pt;font-family:"Courier New"" lang="EN-US"><u></u> <u></u></span></p>
<p class="MsoNormal"><b><span style="font-size:8pt;font-family:"Courier New"" lang="EN-US">* socket: fix issue on concurrent handle of a socket<u></u><u></u></span></b></p>
<p class="MsoNormal"><b><span style="font-size:8pt;font-family:"Courier New"" lang="EN-US"><u></u> <u></u></span></b></p>
<p class="MsoNormal"><b><span style="font-size:8pt;font-family:"Courier New"" lang="EN-US"><u></u> <u></u></span></b></p>
<p class="MsoNormal"><b><span style="font-size:8pt;font-family:"Courier New"" lang="EN-US"><u></u> <u></u></span></b></p>
<p class="MsoNormal"><b><span style="font-size:8pt;font-family:"Courier New"" lang="EN-US">GDB INFO:</span></b><span lang="EN-US"><u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">Thread 8 is blocked on pthread_cond_wait, and thread 9 is blocked in iobref_unref, I think
<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">Thread 9 (Thread 0x7f9edf7fe700 (LWP 1933)):<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">#0 0x00007f9ee9fd785c in __lll_lock_wait () from /lib64/libpthread.so.0<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">#1 0x00007f9ee9fda657 in __lll_lock_elision () from /lib64/libpthread.so.0<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">#2 0x00007f9eeb24cae6 in iobref_unref (iobref=0x7f9ed00063b0) at iobuf.c:944<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">#3 0x00007f9eeafd2f29 in rpc_transport_pollin_destroy (pollin=0x7f9ed00452d0) at rpc-transport.c:123<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">#4 0x00007f9ee4fbf319 in socket_event_poll_in (this=0x7f9ed0049cc0, notify_handled=_gf_true) at socket.c:2322<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">#5 0x00007f9ee4fbf932 in socket_event_handler (<b><span style="color:red">fd=36</span>,<span style="color:red"> idx=27, gen=4, data=0x7f9ed0049cc0, poll_in=1, poll_out=0, poll_err=0</span></b>) at socket.c:2471<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">#6 0x00007f9eeb2825d4 in event_dispatch_epoll_handler (event_pool=0x17feb00, event=0x7f9edf7fde84) at event-epoll.c:583<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">#7 0x00007f9eeb2828ab in event_dispatch_epoll_worker (data=0x180d0c0) at event-epoll.c:659<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">#8 0x00007f9ee9fce5da in start_thread () from /lib64/libpthread.so.0<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">#9 0x00007f9ee98a4eaf in clone () from /lib64/libc.so.6<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"><u></u> <u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">Thread 8 (Thread 0x7f9edffff700 (LWP 1932)):<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">#0 0x00007f9ee9fd785c in __lll_lock_wait () from /lib64/libpthread.so.0<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">#1 0x00007f9ee9fd2b42 in __pthread_mutex_cond_lock () from /lib64/libpthread.so.0<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">#2 0x00007f9ee9fd44a8 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">#3 0x00007f9ee4fbadab in socket_event_poll_err (this=0x7f9ed0049cc0, gen=4, idx=27) at socket.c:1201<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">#4 0x00007f9ee4fbf99c in socket_event_handler (<b><span style="color:red">fd=36, idx=27, gen=4, data=0x7f9ed0049cc0, poll_in=1, poll_out=0, poll_err=0</span></b>) at socket.c:2480<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">#5 0x00007f9eeb2825d4 in event_dispatch_epoll_handler (event_pool=0x17feb00, event=0x7f9edfffee84) at event-epoll.c:583<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">#6 0x00007f9eeb2828ab in event_dispatch_epoll_worker (data=0x180cf20) at event-epoll.c:659<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">#7 0x00007f9ee9fce5da in start_thread () from /lib64/libpthread.so.0<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">#8 0x00007f9ee98a4eaf in clone () from /lib64/libc.so.6<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"><u></u> <u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">(gdb) thread 9<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">[Switching to thread 9 (Thread 0x7f9edf7fe700 (LWP 1933))]<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">#0 0x00007f9ee9fd785c in __lll_lock_wait () from /lib64/libpthread.so.0<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">(gdb) bt<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">#0 0x00007f9ee9fd785c in __lll_lock_wait () from /lib64/libpthread.so.0<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">#1 0x00007f9ee9fda657 in __lll_lock_elision () from /lib64/libpthread.so.0<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">#2 0x00007f9eeb24cae6 in iobref_unref (iobref=0x7f9ed00063b0) at iobuf.c:944<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">#3 0x00007f9eeafd2f29 in rpc_transport_pollin_destroy (pollin=0x7f9ed00452d0) at rpc-transport.c:123<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">#4 0x00007f9ee4fbf319 in socket_event_poll_in (this=0x7f9ed0049cc0, notify_handled=_gf_true) at socket.c:2322<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">#5 0x00007f9ee4fbf932 in socket_event_handler (fd=36, idx=27, gen=4, data=0x7f9ed0049cc0, poll_in=1, poll_out=0, poll_err=0) at socket.c:2471<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">#6 0x00007f9eeb2825d4 in event_dispatch_epoll_handler (event_pool=0x17feb00, event=0x7f9edf7fde84) at event-epoll.c:583<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">#7 0x00007f9eeb2828ab in event_dispatch_epoll_worker (data=0x180d0c0) at event-epoll.c:659<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">#8 0x00007f9ee9fce5da in start_thread () from /lib64/libpthread.so.0<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">#9 0x00007f9ee98a4eaf in clone () from /lib64/libc.so.6<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">(gdb) frame 2<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">#2 0x00007f9eeb24cae6 in iobref_unref (iobref=0x7f9ed00063b0) at iobuf.c:944<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">944 iobuf.c: No such file or directory.<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">(gdb) print *iobref<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">$1 = {lock = {spinlock = 2, mutex = {__data = {__lock = 2, __count = 222, __owner = -2120437760, __nusers = 1, __kind = 8960, __spins = 512,
<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"> __elision = 0, __list = {__prev = 0x4000, __next = 0x7f9ed00063b000}},
<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"> __size = "\002\000\000\000\336\000\000\000\000\260\234\201\001\000\000\000\000#\000\000\000\002\000\000\000@\000\000\000\000\000\000\000\260c\000О\177", __align = 953482739714}}, ref = -256, iobrefs = 0xffffffffffffffff,
alloced = -1, used = -1}</span></p></div></div></blockquote><div><br></div><div>looks like the iobref is corrupted here. It seems to be a use-after-free issue. We need to dig into why a freed iobref is being accessed here.</div><div> <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div lang="ZH-CN"><div class="gmail-m_4854284690638810352WordSection1"><p class="MsoNormal"><span lang="EN-US"><u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">(gdb) quit<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">A debugging session is active.<u></u><u></u></span></p>
</div>
</div>
</blockquote></div></div>