[Gluster-devel] Change in glusterfs[master]: epoll: Handle client and server FDs in a separate event pool

Vijaikumar M vmallika at redhat.com
Tue Jun 24 12:04:09 UTC 2014


Hi Jeff,

I missed adding this earlier:
SSL_pending was 0 before calling SSL_read, and hence SSL_get_error returned
'SSL_ERROR_WANT_READ'.
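For reference, the condition boils down to the check below (a minimal sketch
with our own names, not the actual socket.c code; 'ssl' stands in for
priv->ssl_ssl):

<code: sketch.c>
#include <openssl/ssl.h>

/* Illustrative helper only: true when SSL_read() came back short with
 * SSL_ERROR_WANT_READ while nothing is buffered inside the SSL object,
 * i.e. the condition we hit just before the hang described below. */
static int
ssl_read_would_block (SSL *ssl, int ret)
{
        return (ret <= 0 &&
                SSL_get_error (ssl, ret) == SSL_ERROR_WANT_READ &&
                SSL_pending (ssl) == 0);
}
</code>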

Thanks,
Vijay


On Tuesday 24 June 2014 05:15 PM, Vijaikumar M wrote:
> Hi Jeff,
>
> This is regarding the patch http://review.gluster.org/#/c/3842/ 
> (epoll: edge triggered and multi-threaded epoll).
> The test case './tests/bugs/bug-873367.t' hangs with this fix (please
> find the stack trace below).
>
> In the code snippet below we found that 'SSL_pending' was returning 0.
> I have added a condition here to return from the function when there
> is no data available.
> Please suggest whether it is OK to do it this way, or whether we need
> to restructure this function for multi-threaded epoll.
>
> <code: socket.c>
>  178 static int
>  179 ssl_do (rpc_transport_t *this, void *buf, size_t len, 
> SSL_trinary_func *func)
>  180 {
>  ....
>
>  211                 switch (SSL_get_error(priv->ssl_ssl,r)) {
>  212                 case SSL_ERROR_NONE:
>  213                         return r;
>  214                 case SSL_ERROR_WANT_READ:
>  215                         if (SSL_pending(priv->ssl_ssl) == 0)
>  216                                 return r;
>  217                         pfd.fd = priv->sock;
>  221                         if (poll(&pfd,1,-1) < 0) {
> </code>
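>
> To make the intent clearer, here is a rough standalone sketch of that
> error-handling loop (illustrative only; it simplifies the real ssl_do()
> in socket.c, and the helper name and arguments are ours):
>
> <code: sketch.c>
> #include <poll.h>
> #include <openssl/ssl.h>
>
> /* Rough sketch of an ssl_do()-style read helper.  On
>  * SSL_ERROR_WANT_READ it now returns to the caller when
>  * SSL_pending() is 0, instead of blocking in poll() while the
>  * transport lock is held. */
> static int
> ssl_read_one (SSL *ssl, int sock, void *buf, int len)
> {
>         struct pollfd pfd = {0, };
>
>         for (;;) {
>                 int r = SSL_read (ssl, buf, len);
>
>                 switch (SSL_get_error (ssl, r)) {
>                 case SSL_ERROR_NONE:
>                         return r;
>                 case SSL_ERROR_WANT_READ:
>                         if (SSL_pending (ssl) == 0)
>                                 /* no buffered data; let the event
>                                  * loop drive us again later */
>                                 return r;
>                         pfd.fd = sock;
>                         pfd.events = POLLIN;
>                         if (poll (&pfd, 1, -1) < 0)
>                                 return -1;
>                         break;
>                 default:
>                         return -1;
>                 }
>         }
> }
> </code>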
>
>
>
> Thanks,
> Vijay
>
> On Tuesday 24 June 2014 03:55 PM, Vijaikumar M wrote:
>> From the stack trace we found that the function 'socket_submit_request'
>> is waiting on a mutex lock.
>> The lock is held by the function 'ssl_do', which is blocked in the
>> poll() syscall.
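>>
>> In other words, it is the classic pattern below (a generic sketch with
>> made-up names, not our actual code): one thread takes the transport lock
>> and then waits in poll() with no timeout, while another thread blocks
>> trying to take the same lock.  The gdb output below shows the two
>> threads involved.
>>
>> <code: sketch.c>
>> #include <poll.h>
>> #include <pthread.h>
>>
>> static pthread_mutex_t transport_lock = PTHREAD_MUTEX_INITIALIZER;
>>
>> /* Thread A: the socket_poller/ssl_do path.  It holds the lock and
>>  * then waits in poll() indefinitely, so the lock is never released. */
>> static void *
>> reader_thread (void *arg)
>> {
>>         struct pollfd pfd = { .fd = *(int *) arg, .events = POLLIN };
>>
>>         pthread_mutex_lock (&transport_lock);
>>         poll (&pfd, 1, -1);              /* may block forever */
>>         pthread_mutex_unlock (&transport_lock);
>>         return NULL;
>> }
>>
>> /* Thread B: the socket_submit_request path.  It sits in
>>  * pthread_mutex_lock() until thread A returns from poll(). */
>> static void *
>> writer_thread (void *arg)
>> {
>>         (void) arg;
>>         pthread_mutex_lock (&transport_lock);
>>         /* ... submit the request ... */
>>         pthread_mutex_unlock (&transport_lock);
>>         return NULL;
>> }
>> </code>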
>>
>>
>> (gdb) bt
>> #0  0x0000003daa80822d in pthread_join () from /lib64/libpthread.so.0
>> #1  0x00007f3b94eea9d0 in event_dispatch_epoll (event_pool=<value 
>> optimized out>) at event-epoll.c:632
>> #2  0x0000000000407ecd in main (argc=4, argv=0x7fff160a4528) at 
>> glusterfsd.c:2023
>>
>>
>> (gdb) info threads
>>   10 Thread 0x7f3b8d483700 (LWP 26225) 0x0000003daa80e264 in 
>> __lll_lock_wait () from /lib64/libpthread.so.0
>>   9 Thread 0x7f3b8ca82700 (LWP 26226) 0x0000003daa80f4b5 in sigwait 
>> () from /lib64/libpthread.so.0
>>   8 Thread 0x7f3b8c081700 (LWP 26227) 0x0000003daa80b98e in 
>> pthread_cond_timedwait@@GLIBC_2.3.2 ()
>>    from /lib64/libpthread.so.0
>>   7 Thread 0x7f3b8b680700 (LWP 26228) 0x0000003daa80b98e in 
>> pthread_cond_timedwait@@GLIBC_2.3.2 ()
>>    from /lib64/libpthread.so.0
>>   6 Thread 0x7f3b8a854700 (LWP 26232) 0x0000003daa4e9163 in 
>> epoll_wait () from /lib64/libc.so.6
>>   5 Thread 0x7f3b89e53700 (LWP 26233) 0x0000003daa4e9163 in 
>> epoll_wait () from /lib64/libc.so.6
>>   4 Thread 0x7f3b833eb700 (LWP 26241) 0x0000003daa4df343 in poll () 
>> from /lib64/libc.so.6
>>   3 Thread 0x7f3b82130700 (LWP 26245) 0x0000003daa80e264 in 
>> __lll_lock_wait () from /lib64/libpthread.so.0
>>   2 Thread 0x7f3b8172f700 (LWP 26247) 0x0000003daa80e75d in read () 
>> from /lib64/libpthread.so.0
>> * 1 Thread 0x7f3b94a38700 (LWP 26224) 0x0000003daa80822d in 
>> pthread_join () from /lib64/libpthread.so.0
>>
>>
>> (gdb) thread 3
>> [Switching to thread 3 (Thread 0x7f3b82130700 (LWP 26245))]#0  
>> 0x0000003daa80e264 in __lll_lock_wait ()
>>    from /lib64/libpthread.so.0
>> (gdb) bt
>> #0  0x0000003daa80e264 in __lll_lock_wait () from /lib64/libpthread.so.0
>> #1  0x0000003daa809508 in _L_lock_854 () from /lib64/libpthread.so.0
>> #2  0x0000003daa8093d7 in pthread_mutex_lock () from 
>> /lib64/libpthread.so.0
>> #3  0x00007f3b8aa74524 in socket_submit_request (this=0x7f3b7c0505c0, 
>> req=0x7f3b8212f0b0) at socket.c:3134
>> #4  0x00007f3b94c6b7d5 in rpc_clnt_submit (rpc=0x7f3b7c029ce0, 
>> prog=<value optimized out>,
>>     procnum=<value optimized out>, cbkfn=0x7f3b892364b0 
>> <client3_3_lookup_cbk>, proghdr=0x7f3b8212f410,
>>     proghdrcount=1, progpayload=0x0, progpayloadcount=0, 
>> iobref=<value optimized out>, frame=0x7f3b93d2a454,
>>     rsphdr=0x7f3b8212f4c0, rsphdr_count=1, rsp_payload=0x0, 
>> rsp_payload_count=0, rsp_iobref=0x7f3b700010d0)
>>     at rpc-clnt.c:1556
>> #5  0x00007f3b892243b0 in client_submit_request (this=0x7f3b7c005ef0, 
>> req=<value optimized out>,
>>     frame=0x7f3b93d2a454, prog=0x7f3b894525a0, procnum=27, 
>> cbkfn=0x7f3b892364b0 <client3_3_lookup_cbk>, iobref=0x0,
>>     rsphdr=0x7f3b8212f4c0, rsphdr_count=1, rsp_payload=0x0, 
>> rsp_payload_count=0, rsp_iobref=0x7f3b700010d0,
>>     xdrproc=0x7f3b94a4ede0 <xdr_gfs3_lookup_req>) at client.c:243
>> #6  0x00007f3b8922fa42 in client3_3_lookup (frame=0x7f3b93d2a454, 
>> this=0x7f3b7c005ef0, data=0x7f3b8212f660)
>>     at client-rpc-fops.c:3119
>>
>>
>> (gdb) p priv->lock
>> $1 = {__data = {__lock = 2, __count = 0, __owner = 26241, __nusers = 
>> 1, __kind = 0, __spins = 0, __list = {
>>       __prev = 0x0, __next = 0x0}},
>>   __size = "\002\000\000\000\000\000\000\000\201f\000\000\001", 
>> '\000' <repeats 26 times>, __align = 2}
>>
>>
>> (gdb) thread 4
>> [Switching to thread 4 (Thread 0x7f3b833eb700 (LWP 26241))]#0  
>> 0x0000003daa4df343 in poll () from /lib64/libc.so.6
>> (gdb) bt
>> #0  0x0000003daa4df343 in poll () from /lib64/libc.so.6
>> #1  0x00007f3b8aa71fff in ssl_do (this=0x7f3b7c0505c0, 
>> buf=0x7f3b7c051264, len=4, func=0x3db2441570 <SSL_read>)
>>     at socket.c:216
>> #2  0x00007f3b8aa7277b in __socket_ssl_readv (this=<value optimized 
>> out>, opvector=<value optimized out>,
>>     opcount=<value optimized out>) at socket.c:335
>> #3  0x00007f3b8aa72c26 in __socket_cached_read (this=<value optimized 
>> out>, vector=<value optimized out>,
>>     count=<value optimized out>, pending_vector=0x7f3b7c051258, 
>> pending_count=0x7f3b7c051260, bytes=0x0, write=0)
>>     at socket.c:422
>> #4  __socket_rwv (this=<value optimized out>, vector=<value optimized 
>> out>, count=<value optimized out>,
>>     pending_vector=0x7f3b7c051258, pending_count=0x7f3b7c051260, 
>> bytes=0x0, write=0) at socket.c:496
>> #5  0x00007f3b8aa76040 in __socket_readv (this=0x7f3b7c0505c0) at 
>> socket.c:589
>> #6  __socket_proto_state_machine (this=0x7f3b7c0505c0) at socket.c:1966
>> #7  socket_proto_state_machine (this=0x7f3b7c0505c0) at socket.c:2106
>> #8  socket_event_poll_in (this=0x7f3b7c0505c0) at socket.c:2127
>> #9  0x00007f3b8aa77820 in socket_poller (ctx=0x7f3b7c0505c0) at 
>> socket.c:2338
>> #10 0x0000003daa8079d1 in start_thread () from /lib64/libpthread.so.0
>> #11 0x0000003daa4e8b6d in clone () from /lib64/libc.so.6
>>
>> Thanks,
>> Vijay
>>
>>
>> On Tuesday 24 June 2014 08:59 AM, Raghavendra Gowdappa wrote:
>>> ok. Sorry, I didn't look into change #. I'll sync up with Vijay.
>>>
>>> ----- Original Message -----
>>>> From: "Anand Avati"<avati at redhat.com>
>>>> To: "Raghavendra Gowdappa"<rgowdapp at redhat.com>
>>>> Cc:vmallika at redhat.com
>>>> Sent: Tuesday, June 24, 2014 8:55:34 AM
>>>> Subject: Re: Change in glusterfs[master]: epoll: Handle client and server FDs in a separate event pool
>>>>
>>>> On 6/23/14, 8:00 PM, Raghavendra Gowdappa wrote:
>>>>> ----- Original Message -----
>>>>>> From: "Raghavendra Gowdappa" <rgowdapp at redhat.com>
>>>>>> To: "Anand Avati" <avati at redhat.com>
>>>>>> Cc: vmallika at redhat.com
>>>>>> Sent: Tuesday, June 24, 2014 8:28:41 AM
>>>>>> Subject: Re: Change in glusterfs[master]: epoll: Handle client and server
>>>>>> FDs in a separate event pool
>>>>>>
>>>>>>
>>>>>>
>>>>>> ----- Original Message -----
>>>>>>> From: "Anand Avati" <avati at redhat.com>
>>>>>>> To: vmallika at redhat.com
>>>>>>> Cc: "Raghavendra G" <rgowdapp at redhat.com>
>>>>>>> Sent: Monday, June 23, 2014 10:07:19 PM
>>>>>>> Subject: Re: Change in glusterfs[master]: epoll: Handle client and server
>>>>>>> FDs in a separate event pool
>>>>>>>
>>>>>>> On 6/22/14, 8:47 PM, Vijaikumar Mallikarjuna (Code Review) wrote:
>>>>>>>> Vijaikumar Mallikarjuna has posted comments on this change.
>>>>>>>>
>>>>>>>> Change subject: epoll: Handle client and server FDs in a separate event
>>>>>>>> pool
>>>>>>>> ......................................................................
>>>>>>>>
>>>>>>>>
>>>>>>>> Patch Set 9:
>>>>>>>>
>>>>>>>> Hi Avati,
>>>>>>>>
>>>>>>>> Actually, we started working on the fix for Bug# 1096729, which was a
>>>>>>>> blocker issue.
>>>>>>>> We tried several approaches that would avoid changing the current epoll
>>>>>>>> model for now; however, we had to make some changes in the epoll code
>>>>>>>> and ended up with this patch.
>>>>>>>>
>>>>>>>>
>>>>>>>> The MT patch #3842 looks good to me. It would be great if you could
>>>>>>>> help us get the patch in quickly.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Vijay
>>>>>>>>
>>>>>>> Copying Raghavendra as he's the RPC guy. Du - #3842 has been blocked in
>>>>>>> review for a long time because of some incompatibility with RPC SSL
>>>>>>> mode, very likely an issue in our SSL multi-threading code. Can you help
>>>>>>> Vijai debug this and move #3842 forward? Also, there are new SSL patches
>>>>>>> from Jeff upstream; can you check whether the new patches fix this
>>>>>>> problem?
>>>>>> Sure, I'll try to sync up with Vijay.
>>>>> However, I have a doubt about the approach we should take. Doesn't your
>>>>> patch for multi-threaded epoll also fix this issue? Given that yours is a
>>>>> generic solution, shouldn't it be favoured over this one?
>>>>>
>>>> That's precisely what I meant: #3842 (the more generic MT epoll) has some
>>>> issues with the SSL MT code (otherwise it is working fine).
>>>>
>>
>


