[Gluster-devel] Change in glusterfs[master]: epoll: Handle client and server FDs in a separate event pool
Vijaikumar M
vmallika at redhat.com
Tue Jun 24 12:04:09 UTC 2014
Hi Jeff,
I missed adding this:
SSL_pending was 0 before calling SSL_read, and hence SSL_get_error returned
'SSL_ERROR_WANT_READ'.
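
For reference, here is a minimal, self-contained sketch of the pattern
involved (this is not the glusterfs code; apart from the OpenSSL calls,
the names are illustrative). SSL_pending() only reports bytes that are
already decrypted and buffered inside the SSL object, so when SSL_read()
fails with SSL_ERROR_WANT_READ it is normally 0 and the caller has to
wait for the socket to become readable instead of retrying immediately:

<code>
#include <openssl/ssl.h>

/* 'ssl' is an established connection on a non-blocking socket. */
static int
try_ssl_read (SSL *ssl, void *buf, int len)
{
        int r = SSL_read(ssl, buf, len);

        if (r > 0)
                return r;                /* got decrypted data */

        switch (SSL_get_error(ssl, r)) {
        case SSL_ERROR_WANT_READ:
                /* The record layer needs more bytes from the socket;
                 * SSL_pending() is 0 here because nothing is buffered.
                 * The caller should wait for readability (e.g. via
                 * poll/epoll) rather than loop on SSL_read(). */
                return 0;
        default:
                return -1;               /* treat everything else as an error */
        }
}
</code>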
Thanks,
Vijay
On Tuesday 24 June 2014 05:15 PM, Vijaikumar M wrote:
> Hi Jeff,
>
> This is regarding the patch http://review.gluster.org/#/c/3842/
> (epoll: edge triggered and multi-threaded epoll).
> The test case './tests/bugs/bug-873367.t' hangs with this patch (please
> find the stack trace below).
>
> In the code snippet below, we found that 'SSL_pending' was returning 0.
> I have added a condition to return from the function when there is no
> data available.
> Please suggest whether it is OK to do it this way, or whether we need to
> restructure this function for multi-threaded epoll.
>
> <code: socket.c>
> 178 static int
> 179 ssl_do (rpc_transport_t *this, void *buf, size_t len, SSL_trinary_func *func)
> 180 {
> ....
>
> 211         switch (SSL_get_error(priv->ssl_ssl,r)) {
> 212         case SSL_ERROR_NONE:
> 213                 return r;
> 214         case SSL_ERROR_WANT_READ:
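>                         /* the two lines below (215-216) are the newly added check */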
> 215                 if (SSL_pending(priv->ssl_ssl) == 0)
> 216                         return r;
> 217                 pfd.fd = priv->sock;
> 221                 if (poll(&pfd,1,-1) < 0) {
> </code>
>
>
>
> Thanks,
> Vijay
>
> On Tuesday 24 June 2014 03:55 PM, Vijaikumar M wrote:
>> From the stack trace we found that the function 'socket_submit_request'
>> is waiting on a mutex lock.
>> The lock is held by the function 'ssl_do', which is blocked in the poll
>> syscall.
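>>
>> For illustration, here is a minimal stand-alone sketch of that deadlock
>> pattern (the names are made up; this is not the glusterfs code): one
>> thread takes a lock and then blocks in poll() with no timeout, and a
>> second thread then blocks forever trying to take the same lock, just
>> like socket_submit_request() vs. ssl_do() in the traces below.
>>
>> <code>
>> #include <pthread.h>
>> #include <poll.h>
>> #include <unistd.h>
>>
>> static pthread_mutex_t priv_lock = PTHREAD_MUTEX_INITIALIZER;
>>
>> static void *
>> poller (void *arg)
>> {
>>         /* fd = -1 is ignored by poll(), so with timeout -1 this call
>>          * blocks forever -- standing in for ssl_do() stuck in poll(). */
>>         struct pollfd pfd = { .fd = -1, .events = POLLIN };
>>
>>         pthread_mutex_lock(&priv_lock);   /* like ssl_do() holding priv->lock */
>>         poll(&pfd, 1, -1);                /* ... and never coming back        */
>>         pthread_mutex_unlock(&priv_lock);
>>         return NULL;
>> }
>>
>> static void *
>> submitter (void *arg)
>> {
>>         pthread_mutex_lock(&priv_lock);   /* like socket_submit_request(): waits forever */
>>         pthread_mutex_unlock(&priv_lock);
>>         return NULL;
>> }
>>
>> int
>> main (void)
>> {
>>         pthread_t t1, t2;
>>
>>         pthread_create(&t1, NULL, poller, NULL);
>>         sleep(1);                         /* let poller() take the lock first */
>>         pthread_create(&t2, NULL, submitter, NULL);
>>         pthread_join(t2, NULL);           /* hangs: t2 is stuck on the mutex */
>>         pthread_join(t1, NULL);
>>         return 0;
>> }
>> </code>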
>>
>>
>> (gdb) bt
>> #0 0x0000003daa80822d in pthread_join () from /lib64/libpthread.so.0
>> #1 0x00007f3b94eea9d0 in event_dispatch_epoll (event_pool=<value
>> optimized out>) at event-epoll.c:632
>> #2 0x0000000000407ecd in main (argc=4, argv=0x7fff160a4528) at
>> glusterfsd.c:2023
>>
>>
>> (gdb) info threads
>> 10 Thread 0x7f3b8d483700 (LWP 26225) 0x0000003daa80e264 in
>> __lll_lock_wait () from /lib64/libpthread.so.0
>> 9 Thread 0x7f3b8ca82700 (LWP 26226) 0x0000003daa80f4b5 in sigwait
>> () from /lib64/libpthread.so.0
>> 8 Thread 0x7f3b8c081700 (LWP 26227) 0x0000003daa80b98e in
>> pthread_cond_timedwait@@GLIBC_2.3.2 ()
>> from /lib64/libpthread.so.0
>> 7 Thread 0x7f3b8b680700 (LWP 26228) 0x0000003daa80b98e in
>> pthread_cond_timedwait@@GLIBC_2.3.2 ()
>> from /lib64/libpthread.so.0
>> 6 Thread 0x7f3b8a854700 (LWP 26232) 0x0000003daa4e9163 in
>> epoll_wait () from /lib64/libc.so.6
>> 5 Thread 0x7f3b89e53700 (LWP 26233) 0x0000003daa4e9163 in
>> epoll_wait () from /lib64/libc.so.6
>> 4 Thread 0x7f3b833eb700 (LWP 26241) 0x0000003daa4df343 in poll ()
>> from /lib64/libc.so.6
>> 3 Thread 0x7f3b82130700 (LWP 26245) 0x0000003daa80e264 in
>> __lll_lock_wait () from /lib64/libpthread.so.0
>> 2 Thread 0x7f3b8172f700 (LWP 26247) 0x0000003daa80e75d in read ()
>> from /lib64/libpthread.so.0
>> * 1 Thread 0x7f3b94a38700 (LWP 26224) 0x0000003daa80822d in
>> pthread_join () from /lib64/libpthread.so.0
>>
>>
>> (gdb) thread 3
>> [Switching to thread 3 (Thread 0x7f3b82130700 (LWP 26245))]
>> #0  0x0000003daa80e264 in __lll_lock_wait () from /lib64/libpthread.so.0
>> (gdb) bt
>> #0 0x0000003daa80e264 in __lll_lock_wait () from /lib64/libpthread.so.0
>> #1 0x0000003daa809508 in _L_lock_854 () from /lib64/libpthread.so.0
>> #2 0x0000003daa8093d7 in pthread_mutex_lock () from
>> /lib64/libpthread.so.0
>> #3 0x00007f3b8aa74524 in socket_submit_request (this=0x7f3b7c0505c0,
>> req=0x7f3b8212f0b0) at socket.c:3134
>> #4 0x00007f3b94c6b7d5 in rpc_clnt_submit (rpc=0x7f3b7c029ce0,
>> prog=<value optimized out>,
>> procnum=<value optimized out>, cbkfn=0x7f3b892364b0
>> <client3_3_lookup_cbk>, proghdr=0x7f3b8212f410,
>> proghdrcount=1, progpayload=0x0, progpayloadcount=0,
>> iobref=<value optimized out>, frame=0x7f3b93d2a454,
>> rsphdr=0x7f3b8212f4c0, rsphdr_count=1, rsp_payload=0x0,
>> rsp_payload_count=0, rsp_iobref=0x7f3b700010d0)
>> at rpc-clnt.c:1556
>> #5 0x00007f3b892243b0 in client_submit_request (this=0x7f3b7c005ef0,
>> req=<value optimized out>,
>> frame=0x7f3b93d2a454, prog=0x7f3b894525a0, procnum=27,
>> cbkfn=0x7f3b892364b0 <client3_3_lookup_cbk>, iobref=0x0,
>> rsphdr=0x7f3b8212f4c0, rsphdr_count=1, rsp_payload=0x0,
>> rsp_payload_count=0, rsp_iobref=0x7f3b700010d0,
>> xdrproc=0x7f3b94a4ede0 <xdr_gfs3_lookup_req>) at client.c:243
>> #6 0x00007f3b8922fa42 in client3_3_lookup (frame=0x7f3b93d2a454,
>> this=0x7f3b7c005ef0, data=0x7f3b8212f660)
>> at client-rpc-fops.c:3119
>>
>>
>> (gdb) p priv->lock
>> $1 = {__data = {__lock = 2, __count = 0, __owner = 26241, __nusers = 1,
>>       __kind = 0, __spins = 0, __list = {__prev = 0x0, __next = 0x0}},
>>     __size = "\002\000\000\000\000\000\000\000\201f\000\000\001",
>>     '\000' <repeats 26 times>, __align = 2}
>> (__owner = 26241 is LWP 26241, i.e. thread 4 below, which is blocked in
>> poll() inside ssl_do.)
>>
>>
>> (gdb) thread 4
>> [Switching to thread 4 (Thread 0x7f3b833eb700 (LWP 26241))]#0
>> 0x0000003daa4df343 in poll () from /lib64/libc.so.6
>> (gdb) bt
>> #0 0x0000003daa4df343 in poll () from /lib64/libc.so.6
>> #1 0x00007f3b8aa71fff in ssl_do (this=0x7f3b7c0505c0,
>> buf=0x7f3b7c051264, len=4, func=0x3db2441570 <SSL_read>)
>> at socket.c:216
>> #2 0x00007f3b8aa7277b in __socket_ssl_readv (this=<value optimized
>> out>, opvector=<value optimized out>,
>> opcount=<value optimized out>) at socket.c:335
>> #3 0x00007f3b8aa72c26 in __socket_cached_read (this=<value optimized
>> out>, vector=<value optimized out>,
>> count=<value optimized out>, pending_vector=0x7f3b7c051258,
>> pending_count=0x7f3b7c051260, bytes=0x0, write=0)
>> at socket.c:422
>> #4 __socket_rwv (this=<value optimized out>, vector=<value optimized
>> out>, count=<value optimized out>,
>> pending_vector=0x7f3b7c051258, pending_count=0x7f3b7c051260,
>> bytes=0x0, write=0) at socket.c:496
>> #5 0x00007f3b8aa76040 in __socket_readv (this=0x7f3b7c0505c0) at
>> socket.c:589
>> #6 __socket_proto_state_machine (this=0x7f3b7c0505c0) at socket.c:1966
>> #7 socket_proto_state_machine (this=0x7f3b7c0505c0) at socket.c:2106
>> #8 socket_event_poll_in (this=0x7f3b7c0505c0) at socket.c:2127
>> #9 0x00007f3b8aa77820 in socket_poller (ctx=0x7f3b7c0505c0) at
>> socket.c:2338
>> #10 0x0000003daa8079d1 in start_thread () from /lib64/libpthread.so.0
>> #11 0x0000003daa4e8b6d in clone () from /lib64/libc.so.6
>>
>>
>> Thanks,
>> Vijay
>>
>>
>> On Tuesday 24 June 2014 08:59 AM, Raghavendra Gowdappa wrote:
>>> OK. Sorry, I didn't look at the change #. I'll sync up with Vijay.
>>>
>>> ----- Original Message -----
>>>> From: "Anand Avati"<avati at redhat.com>
>>>> To: "Raghavendra Gowdappa"<rgowdapp at redhat.com>
>>>> Cc:vmallika at redhat.com
>>>> Sent: Tuesday, June 24, 2014 8:55:34 AM
>>>> Subject: Re: Change in glusterfs[master]: epoll: Handle client and server FDs in a separate event pool
>>>>
>>>> On 6/23/14, 8:00 PM, Raghavendra Gowdappa wrote:
>>>>> ----- Original Message -----
>>>>>> From: "Raghavendra Gowdappa"<rgowdapp at redhat.com>
>>>>>> To: "Anand Avati"<avati at redhat.com>
>>>>>> Cc:vmallika at redhat.com
>>>>>> Sent: Tuesday, June 24, 2014 8:28:41 AM
>>>>>> Subject: Re: Change in glusterfs[master]: epoll: Handle client and server
>>>>>> FDs in a separate event pool
>>>>>>
>>>>>>
>>>>>>
>>>>>> ----- Original Message -----
>>>>>>> From: "Anand Avati"<avati at redhat.com>
>>>>>>> To:vmallika at redhat.com
>>>>>>> Cc: "Raghavendra G"<rgowdapp at redhat.com>
>>>>>>> Sent: Monday, June 23, 2014 10:07:19 PM
>>>>>>> Subject: Re: Change in glusterfs[master]: epoll: Handle client and server
>>>>>>> FDs in a separate event pool
>>>>>>>
>>>>>>> On 6/22/14, 8:47 PM, Vijaikumar Mallikarjuna (Code Review) wrote:
>>>>>>>> Vijaikumar Mallikarjuna has posted comments on this change.
>>>>>>>>
>>>>>>>> Change subject: epoll: Handle client and server FDs in a separate event
>>>>>>>> pool
>>>>>>>> ......................................................................
>>>>>>>>
>>>>>>>>
>>>>>>>> Patch Set 9:
>>>>>>>>
>>>>>>>> Hi Avati,
>>>>>>>>
>>>>>>>> Actually, we started working on the fix for Bug# 1096729, which was a
>>>>>>>> blocker issue.
>>>>>>>> We tried multiple ways to avoid changing the current epoll model for
>>>>>>>> now; however, we had to make some changes in the epoll code and ended
>>>>>>>> up with this patch.
>>>>>>>>
>>>>>>>>
>>>>>>>> The MT patch #3842 looks good to me. It would be great if you could
>>>>>>>> help us get the patch in quickly.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Vijay
>>>>>>>>
>>>>>>> Copying Raghavendra as he's the RPC guy. Du - #3842 has been blocked in
>>>>>>> review for a long time because of some incompatibility with RPC SSL mode,
>>>>>>> very likely some issue in our SSL multi-threading code. Can you help Vijai
>>>>>>> debug this and move #3842 forward? Also, there are new SSL patches from
>>>>>>> Jeff upstream. Can you guys check if the new patches fix this problem?
>>>>>> Sure, I'll try to sync up with Vijay.
>>>>> However, I have a doubt about the approach we should take. Doesn't your
>>>>> patch for multi-threaded epoll also fix this issue? Given that yours is a
>>>>> generic solution, shouldn't it be favoured over this one?
>>>>>
>>>> That's precisely what I meant. #3842 (the more generic MT epoll) is
>>>> having some issues with the SSL MT code (otherwise it is working fine).
>>>>
>>
>