[Gluster-devel] Change in glusterfs[master]: epoll: Handle client and server FDs in a separate event pool

Vijaikumar M vmallika at redhat.com
Tue Jun 24 11:45:25 UTC 2014


Hi Jeff,

This is regarding the patch http://review.gluster.org/#/c/3842/ (epoll: 
edge triggered and multi-threaded epoll).
The test case './tests/bugs/bug-873367.t' hangs with this patch applied
(please find the stack trace below).

In the code snippet below we found that 'SSL_pending' was returning 0.
I have added a condition to return from the function when there is no data
available.
Please let us know whether this approach is acceptable, or whether the
function needs to be restructured for multi-threaded epoll.

<code: socket.c>
  178 static int
  179 ssl_do (rpc_transport_t *this, void *buf, size_t len, SSL_trinary_func *func)
  180 {
  ....

  211                 switch (SSL_get_error(priv->ssl_ssl,r)) {
  212                 case SSL_ERROR_NONE:
  213                         return r;
  214                 case SSL_ERROR_WANT_READ:
  215                         if (SSL_pending(priv->ssl_ssl) == 0)
  216                                 return r;
  217                         pfd.fd = priv->sock;
  221                         if (poll(&pfd,1,-1) < 0) {
</code>
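
For reference, here is a minimal, self-contained sketch of the idea (assumptions:
this is not the glusterfs code itself; the helper try_ssl_read() and its
arguments are hypothetical, and the real ssl_do() loops and retries after
poll() returns):

<code>
#include <openssl/ssl.h>
#include <poll.h>

/* Hypothetical helper mirroring the check above, not the real ssl_do(). */
static int
try_ssl_read (SSL *ssl, int sock, void *buf, int len)
{
        int           r   = SSL_read (ssl, buf, len);
        struct pollfd pfd = { .fd = sock, .events = POLLIN };

        switch (SSL_get_error (ssl, r)) {
        case SSL_ERROR_NONE:
                return r;
        case SSL_ERROR_WANT_READ:
                if (SSL_pending (ssl) == 0)
                        return r;   /* no data available: return to the
                                     * event loop instead of blocking in
                                     * poll() with the transport lock held */
                if (poll (&pfd, 1, -1) < 0)
                        return -1;
                return r;           /* the real function would retry the
                                     * SSL call at this point */
        default:
                return -1;
        }
}
</code>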



Thanks,
Vijay

On Tuesday 24 June 2014 03:55 PM, Vijaikumar M wrote:
> From the stack trace we found that the function 'socket_submit_request' is
> waiting on a mutex lock.
> The lock is held by 'ssl_do', and that function is blocked in the poll()
> syscall.
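
To illustrate, the hang reduces to the pattern below (a hypothetical,
self-contained sketch, not glusterfs code; reader_thread() stands in for
'ssl_do' and submit_thread() for 'socket_submit_request'):

<code>
#include <pthread.h>
#include <poll.h>
#include <stddef.h>

static pthread_mutex_t priv_lock = PTHREAD_MUTEX_INITIALIZER;

/* Plays the role of 'ssl_do': takes the transport lock and then blocks
 * in poll() with an infinite timeout, waiting for data on the socket. */
static void *
reader_thread (void *arg)
{
        struct pollfd pfd = { .fd = *(int *) arg, .events = POLLIN };

        pthread_mutex_lock (&priv_lock);
        poll (&pfd, 1, -1);              /* blocks; the lock stays held */
        pthread_mutex_unlock (&priv_lock);
        return NULL;
}

/* Plays the role of 'socket_submit_request': needs the same lock to
 * queue and write the request, so it blocks in pthread_mutex_lock()
 * for as long as the reader is parked inside poll(). */
static void *
submit_thread (void *arg)
{
        (void) arg;
        pthread_mutex_lock (&priv_lock);
        /* ... the request would be written to the socket here ... */
        pthread_mutex_unlock (&priv_lock);
        return NULL;
}
</code>

If the data the reader is waiting for can only arrive in response to the
request the submitter is trying to send, neither thread ever makes progress.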
>
>
> (gdb) bt
> #0  0x0000003daa80822d in pthread_join () from /lib64/libpthread.so.0
> #1  0x00007f3b94eea9d0 in event_dispatch_epoll (event_pool=<value 
> optimized out>) at event-epoll.c:632
> #2  0x0000000000407ecd in main (argc=4, argv=0x7fff160a4528) at 
> glusterfsd.c:2023
>
>
> (gdb) info threads
>   10 Thread 0x7f3b8d483700 (LWP 26225) 0x0000003daa80e264 in 
> __lll_lock_wait () from /lib64/libpthread.so.0
>   9 Thread 0x7f3b8ca82700 (LWP 26226)  0x0000003daa80f4b5 in sigwait 
> () from /lib64/libpthread.so.0
>   8 Thread 0x7f3b8c081700 (LWP 26227)  0x0000003daa80b98e in 
> pthread_cond_timedwait@@GLIBC_2.3.2 ()
>    from /lib64/libpthread.so.0
>   7 Thread 0x7f3b8b680700 (LWP 26228)  0x0000003daa80b98e in 
> pthread_cond_timedwait@@GLIBC_2.3.2 ()
>    from /lib64/libpthread.so.0
>   6 Thread 0x7f3b8a854700 (LWP 26232)  0x0000003daa4e9163 in 
> epoll_wait () from /lib64/libc.so.6
>   5 Thread 0x7f3b89e53700 (LWP 26233)  0x0000003daa4e9163 in 
> epoll_wait () from /lib64/libc.so.6
>   4 Thread 0x7f3b833eb700 (LWP 26241)  0x0000003daa4df343 in poll () 
> from /lib64/libc.so.6
>   3 Thread 0x7f3b82130700 (LWP 26245)  0x0000003daa80e264 in 
> __lll_lock_wait () from /lib64/libpthread.so.0
>   2 Thread 0x7f3b8172f700 (LWP 26247)  0x0000003daa80e75d in read () 
> from /lib64/libpthread.so.0
> * 1 Thread 0x7f3b94a38700 (LWP 26224)  0x0000003daa80822d in 
> pthread_join () from /lib64/libpthread.so.0
>
>
> (gdb) thread 3
> [Switching to thread 3 (Thread 0x7f3b82130700 (LWP 26245))]#0  
> 0x0000003daa80e264 in __lll_lock_wait ()
>    from /lib64/libpthread.so.0
> (gdb) bt
> #0  0x0000003daa80e264 in __lll_lock_wait () from /lib64/libpthread.so.0
> #1  0x0000003daa809508 in _L_lock_854 () from /lib64/libpthread.so.0
> #2  0x0000003daa8093d7 in pthread_mutex_lock () from 
> /lib64/libpthread.so.0
> #3  0x00007f3b8aa74524 in socket_submit_request (this=0x7f3b7c0505c0, 
> req=0x7f3b8212f0b0) at socket.c:3134
> #4  0x00007f3b94c6b7d5 in rpc_clnt_submit (rpc=0x7f3b7c029ce0, 
> prog=<value optimized out>,
>     procnum=<value optimized out>, cbkfn=0x7f3b892364b0 
> <client3_3_lookup_cbk>, proghdr=0x7f3b8212f410,
>     proghdrcount=1, progpayload=0x0, progpayloadcount=0, iobref=<value 
> optimized out>, frame=0x7f3b93d2a454,
>     rsphdr=0x7f3b8212f4c0, rsphdr_count=1, rsp_payload=0x0, 
> rsp_payload_count=0, rsp_iobref=0x7f3b700010d0)
>     at rpc-clnt.c:1556
> #5  0x00007f3b892243b0 in client_submit_request (this=0x7f3b7c005ef0, 
> req=<value optimized out>,
>     frame=0x7f3b93d2a454, prog=0x7f3b894525a0, procnum=27, 
> cbkfn=0x7f3b892364b0 <client3_3_lookup_cbk>, iobref=0x0,
>     rsphdr=0x7f3b8212f4c0, rsphdr_count=1, rsp_payload=0x0, 
> rsp_payload_count=0, rsp_iobref=0x7f3b700010d0,
>     xdrproc=0x7f3b94a4ede0 <xdr_gfs3_lookup_req>) at client.c:243
> #6  0x00007f3b8922fa42 in client3_3_lookup (frame=0x7f3b93d2a454, 
> this=0x7f3b7c005ef0, data=0x7f3b8212f660)
>     at client-rpc-fops.c:3119
>
>
> (gdb) p priv->lock
> $1 = {__data = {__lock = 2, __count = 0, __owner = 26241, __nusers = 
> 1, __kind = 0, __spins = 0, __list = {
>       __prev = 0x0, __next = 0x0}},
>   __size = "\002\000\000\000\000\000\000\000\201f\000\000\001", '\000' 
> <repeats 26 times>, __align = 2}
>
>
> (gdb) thread 4
> [Switching to thread 4 (Thread 0x7f3b833eb700 (LWP 26241))]#0  
> 0x0000003daa4df343 in poll () from /lib64/libc.so.6
> (gdb) bt
> #0  0x0000003daa4df343 in poll () from /lib64/libc.so.6
> #1  0x00007f3b8aa71fff in ssl_do (this=0x7f3b7c0505c0, 
> buf=0x7f3b7c051264, len=4, func=0x3db2441570 <SSL_read>)
>     at socket.c:216
> #2  0x00007f3b8aa7277b in __socket_ssl_readv (this=<value optimized 
> out>, opvector=<value optimized out>,
>     opcount=<value optimized out>) at socket.c:335
> #3  0x00007f3b8aa72c26 in __socket_cached_read (this=<value optimized 
> out>, vector=<value optimized out>,
>     count=<value optimized out>, pending_vector=0x7f3b7c051258, 
> pending_count=0x7f3b7c051260, bytes=0x0, write=0)
>     at socket.c:422
> #4  __socket_rwv (this=<value optimized out>, vector=<value optimized 
> out>, count=<value optimized out>,
>     pending_vector=0x7f3b7c051258, pending_count=0x7f3b7c051260, 
> bytes=0x0, write=0) at socket.c:496
> #5  0x00007f3b8aa76040 in __socket_readv (this=0x7f3b7c0505c0) at 
> socket.c:589
> #6  __socket_proto_state_machine (this=0x7f3b7c0505c0) at socket.c:1966
> #7  socket_proto_state_machine (this=0x7f3b7c0505c0) at socket.c:2106
> #8  socket_event_poll_in (this=0x7f3b7c0505c0) at socket.c:2127
> #9  0x00007f3b8aa77820 in socket_poller (ctx=0x7f3b7c0505c0) at 
> socket.c:2338
> #10 0x0000003daa8079d1 in start_thread () from /lib64/libpthread.so.0
> #11 0x0000003daa4e8b6d in clone () from /lib64/libc.so.6
>
> Thanks,
> Vijay
>
>
> On Tuesday 24 June 2014 08:59 AM, Raghavendra Gowdappa wrote:
>> OK, sorry, I didn't look at the change #. I'll sync up with Vijay.
>>
>> ----- Original Message -----
>>> From: "Anand Avati"<avati at redhat.com>
>>> To: "Raghavendra Gowdappa"<rgowdapp at redhat.com>
>>> Cc:vmallika at redhat.com
>>> Sent: Tuesday, June 24, 2014 8:55:34 AM
>>> Subject: Re: Change in glusterfs[master]: epoll: Handle client and server FDs in a separate event pool
>>>
>>> On 6/23/14, 8:00 PM, Raghavendra Gowdappa wrote:
>>>> ----- Original Message -----
>>>>> From: "Raghavendra Gowdappa"<rgowdapp at redhat.com>
>>>>> To: "Anand Avati"<avati at redhat.com>
>>>>> Cc:vmallika at redhat.com
>>>>> Sent: Tuesday, June 24, 2014 8:28:41 AM
>>>>> Subject: Re: Change in glusterfs[master]: epoll: Handle client and server
>>>>> FDs in a separate event pool
>>>>>
>>>>>
>>>>>
>>>>> ----- Original Message -----
>>>>>> From: "Anand Avati"<avati at redhat.com>
>>>>>> To:vmallika at redhat.com
>>>>>> Cc: "Raghavendra G"<rgowdapp at redhat.com>
>>>>>> Sent: Monday, June 23, 2014 10:07:19 PM
>>>>>> Subject: Re: Change in glusterfs[master]: epoll: Handle client and server
>>>>>> FDs in a separate event pool
>>>>>>
>>>>>> On 6/22/14, 8:47 PM, Vijaikumar Mallikarjuna (Code Review) wrote:
>>>>>>> Vijaikumar Mallikarjuna has posted comments on this change.
>>>>>>>
>>>>>>> Change subject: epoll: Handle client and server FDs in a separate event
>>>>>>> pool
>>>>>>> ......................................................................
>>>>>>>
>>>>>>>
>>>>>>> Patch Set 9:
>>>>>>>
>>>>>>> Hi Avati,
>>>>>>>
>>>>>>> Actually, we started working on the fix for Bug# 1096729, which was a
>>>>>>> blocker issue.
>>>>>>> We tried several approaches that would avoid changing the current epoll
>>>>>>> model for now; however, we had to make some changes in the epoll code
>>>>>>> and ended up with this patch.
>>>>>>>
>>>>>>>
>>>>>>> The MT patch #3842 looks good to me. It would be great if you could
>>>>>>> help us get the patch in quickly.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Vijay
>>>>>>>
>>>>>> Copying Raghavendra as he's the RPC guy. Du - #3842 has been blocked in
>>>>>> review for a long time because of some incompatibility with RPC SSL mode,
>>>>>> very likely an issue in our SSL multi-threading code. Can you help Vijai
>>>>>> debug this and move #3842 forward? Also, there are new SSL patches from
>>>>>> Jeff upstream. Can you check whether the new patches fix this problem?
>>>>> Sure, I'll try to sync up with Vijay.
>>>> However, I have a doubt about which approach we should take. Doesn't your
>>>> patch for multi-threaded epoll also fix this issue? Given that yours is a
>>>> generic solution, shouldn't it be favoured over this one?
>>>>
>>>>>>
>>> That's precisely what I meant: #3842 (the more generic MT epoll) is
>>> having some issues with the SSL MT code (otherwise it is working fine).
>>>
>
