[Gluster-devel] glusterfsd memory leak issue found after enable ssl

Yaniv Kaul ykaul at redhat.com
Mon Jun 10 08:31:18 UTC 2019


On Mon, Jun 10, 2019 at 10:43 AM Zhou, Cynthia (NSB - CN/Hangzhou) <
cynthia.zhou at nokia-sbell.com> wrote:

> Hi,
>
> How about this patch? I see there is a failed test; is that related to my
> change?
>

Quite likely. Have you looked at the failure? It produces a stack which
looks close to where your patch is:
01:02:58.118 Thread 1 (Thread 0x7efe40930700 (LWP 17150)):
01:02:58.118 #0  0x00007efe4dfd359c in free () from /lib64/libc.so.6
01:02:58.118 No symbol table info available.
01:02:58.118 #1  0x00007efe4e38970d in CRYPTO_free () from /lib64/libcrypto.so.10
01:02:58.118 No symbol table info available.
01:02:58.118 #2  0x00007efe4e4400e7 in sk_free () from /lib64/libcrypto.so.10
01:02:58.118 No symbol table info available.
01:02:58.118 #3  0x00007efe4e4863de in x509_verify_param_zero () from /lib64/libcrypto.so.10
01:02:58.118 No symbol table info available.
01:02:58.118 #4  0x00007efe4e48644e in X509_VERIFY_PARAM_free () from /lib64/libcrypto.so.10
01:02:58.118 No symbol table info available.
01:02:58.118 #5  0x00007efe42a107d9 in SSL_CTX_free () from /lib64/libssl.so.10
01:02:58.120 No symbol table info available.
01:02:58.120 #6  0x00007efe42a12cc0 in SSL_free () from /lib64/libssl.so.10
01:02:58.122 No symbol table info available.
01:02:58.122 #7  0x00007efe42c463eb in __socket_reset (this=0x7efe34001240) at /home/jenkins/root/workspace/centos7-regression/rpc/rpc-transport/socket/src/socket.c:1170
01:02:58.123         priv = 0x7efe340017a0
01:02:58.123         __FUNCTION__ = "__socket_reset"
01:02:58.123 #8  0x00007efe42c46e43 in socket_event_poll_err (this=0x7efe34001240, gen=4, idx=2) at /home/jenkins/root/workspace/centos7-regression/rpc/rpc-transport/socket/src/socket.c:1383
01:02:58.123         priv = 0x7efe340017a0
01:02:58.123         socket_closed = false
01:02:58.123         __FUNCTION__ = "socket_event_poll_err"
01:02:58.123 #9  0x00007efe42c4d056 in socket_event_handler (fd=6, idx=2, gen=4, data=0x7efe34001240, poll_in=1, poll_out=0, poll_err=16, event_thread_died=0 '\000') at /home/jenkins/root/workspace/centos7-regression/rpc/rpc-transport/socket/src/socket.c:3037
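
One possible reading of that stack (not confirmed here): the patch quoted further down frees priv->ssl_ctx in __socket_reset() and then calls SSL_free(), which drops a second reference on the same context, so if that context is shared with other connections it can be destroyed while still in use. A minimal sketch of the ordering, reusing the field names from the patch:

    /* hypothetical fragment mirroring the patch; priv->ssl_ctx may be shared */
    if (priv->ssl_ctx) {
        SSL_CTX_free(priv->ssl_ctx);  /* drops one reference on the (possibly shared) context */
        priv->ssl_ctx = NULL;
    }
    SSL_shutdown(priv->ssl_ssl);
    SSL_free(priv->ssl_ssl);          /* drops the reference held by the SSL object; if it is the
                                       * last one, the context is destroyed here, i.e. the
                                       * SSL_free -> SSL_CTX_free -> X509_VERIFY_PARAM_free path
                                       * in the stack above */
    priv->ssl_ssl = NULL;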


>
> cynthia
>
>
>
> *From:* Raghavendra Gowdappa <rgowdapp at redhat.com>
> *Sent:* Thursday, May 09, 2019 12:13 PM
> *To:* Zhou, Cynthia (NSB - CN/Hangzhou) <cynthia.zhou at nokia-sbell.com>
> *Cc:* Amar Tumballi Suryanarayan <atumball at redhat.com>;
> gluster-devel at gluster.org
> *Subject:* Re: [Gluster-devel] glusterfsd memory leak issue found after
> enable ssl
>
>
>
> Thanks!!
>
>
>
> On Thu, May 9, 2019 at 8:34 AM Zhou, Cynthia (NSB - CN/Hangzhou) <
> cynthia.zhou at nokia-sbell.com> wrote:
>
> Hi,
>
> Ok, it is posted to https://review.gluster.org/#/c/glusterfs/+/22687/
>
>
>
>
>
>
>
> *From:* Raghavendra Gowdappa <rgowdapp at redhat.com>
> *Sent:* Wednesday, May 08, 2019 7:35 PM
> *To:* Zhou, Cynthia (NSB - CN/Hangzhou) <cynthia.zhou at nokia-sbell.com>
> *Cc:* Amar Tumballi Suryanarayan <atumball at redhat.com>;
> gluster-devel at gluster.org
> *Subject:* Re: [Gluster-devel] glusterfsd memory leak issue found after
> enable ssl
>
>
>
>
>
>
>
> On Wed, May 8, 2019 at 1:29 PM Zhou, Cynthia (NSB - CN/Hangzhou) <
> cynthia.zhou at nokia-sbell.com> wrote:
>
> Hi 'Milind Changire' ,
>
> The leak is getting clearer to me now. The remaining leak exists because, in
> glusterfs 3.12.15 (in my environment), the SSL context is a shared one: when
> we do SSL_accept, OpenSSL allocates read/write buffers for the SSL object,
> but when SSL_free is called in socket_reset or in the fini function of
> socket.c, those buffers are returned to the SSL context's free list instead
> of being completely freed.
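
A minimal illustration of that behaviour (a fragment, not socket.c code; it assumes an OpenSSL 1.0.x build with the buffer free lists compiled in, which is the default, and shared_ctx/sockfd are illustrative names):

    SSL_CTX *shared_ctx = SSL_CTX_new(SSLv23_server_method()); /* one context for many connections */

    /* per accepted connection */
    SSL *ssl = SSL_new(shared_ctx);
    SSL_set_fd(ssl, sockfd);
    SSL_accept(ssl);   /* ssl3_setup_buffers allocates a read and a write buffer,
                        * roughly 16.5 KB each (matching the valgrind records quoted below) */

    /* per-connection teardown */
    SSL_shutdown(ssl);
    SSL_free(ssl);     /* the buffers are parked on shared_ctx's free list,
                        * not handed back to the allocator */

    /* only destroying the context itself releases the cached buffers, which is
     * why the patch below frees priv->ssl_ctx as well as priv->ssl_ssl */
    SSL_CTX_free(shared_ctx);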
>
>
>
> Thanks Cynthia for your efforts in identifying and fixing the leak. If you
> post a patch to gerrit, I'll be happy to merge it and get the fix into the
> codebase.
>
>
>
>
>
> So the following patch (created for the gluster master branch) fixes the
> memory leak completely.
>
>
>
> --- a/rpc/rpc-transport/socket/src/socket.c
> +++ b/rpc/rpc-transport/socket/src/socket.c
> @@ -446,6 +446,7 @@ ssl_setup_connection_postfix(rpc_transport_t *this)
>      gf_log(this->name, GF_LOG_DEBUG,
>             "SSL verification succeeded (client: %s) (server: %s)",
>             this->peerinfo.identifier, this->myinfo.identifier);
> +    X509_free(peer);
>      return gf_strdup(peer_CN);
>
>      /* Error paths. */
> @@ -1157,7 +1158,21 @@ __socket_reset(rpc_transport_t *this)
>      memset(&priv->incoming, 0, sizeof(priv->incoming));
>
>      event_unregister_close(this->ctx->event_pool, priv->sock, priv->idx);
> -
> +    if(priv->use_ssl&& priv->ssl_ssl)
> +    {
> +      gf_log(this->name, GF_LOG_TRACE,
> +             "clear and reset for socket(%d), free ssl ",
> +             priv->sock);
> +               if(priv->ssl_ctx)
> +                 {
> +                       SSL_CTX_free(priv->ssl_ctx);
> +                       priv->ssl_ctx = NULL;
> +                 }
> +      SSL_shutdown(priv->ssl_ssl);
> +      SSL_clear(priv->ssl_ssl);
> +      SSL_free(priv->ssl_ssl);
> +      priv->ssl_ssl = NULL;
> +    }
>      priv->sock = -1;
>      priv->idx = -1;
>      priv->connected = -1;
> @@ -4675,6 +4690,21 @@ fini(rpc_transport_t *this)
>          pthread_mutex_destroy(&priv->out_lock);
>          pthread_mutex_destroy(&priv->cond_lock);
>          pthread_cond_destroy(&priv->cond);
> +               if(priv->use_ssl&& priv->ssl_ssl)
> +               {
> +                 gf_log(this->name, GF_LOG_TRACE,
> +                                "clear and reset for socket(%d), free ssl ",
> +                                priv->sock);
> +                 if(priv->ssl_ctx)
> +                 {
> +                       SSL_CTX_free(priv->ssl_ctx);
> +                       priv->ssl_ctx = NULL;
> +                 }
> +                 SSL_shutdown(priv->ssl_ssl);
> +                 SSL_clear(priv->ssl_ssl);
> +                 SSL_free(priv->ssl_ssl);
>
> *From:* Zhou, Cynthia (NSB - CN/Hangzhou)
> *Sent:* Monday, May 06, 2019 2:12 PM
> *To:* 'Amar Tumballi Suryanarayan' <atumball at redhat.com>
> *Cc:* 'Milind Changire' <mchangir at redhat.com>; 'gluster-devel at gluster.org'
> <gluster-devel at gluster.org>
> *Subject:* RE: [Gluster-devel] glusterfsd memory leak issue found after
> enable ssl
>
>
>
> Hi,
>
> From our tests, both valgrind and libleak blame ssl3_accept.
>
> /////////////////////////// from valgrind attached to glusterfsd ///////////////////////////
>
> ==16673== 198,720 bytes in 12 blocks are definitely lost in loss record 1,114 of 1,123
> ==16673==    at 0x4C2EB7B: malloc (vg_replace_malloc.c:299)
> ==16673==    by 0x63E1977: CRYPTO_malloc (in /usr/lib64/libcrypto.so.1.0.2p)
> ==16673==    by 0xA855E0C: ssl3_setup_write_buffer (in /usr/lib64/libssl.so.1.0.2p)
> ==16673==    by 0xA855E77: ssl3_setup_buffers (in /usr/lib64/libssl.so.1.0.2p)
> ==16673==    by 0xA8485D9: ssl3_accept (in /usr/lib64/libssl.so.1.0.2p)
> ==16673==    by 0xA610DDF: ssl_complete_connection (socket.c:400)
> ==16673==    by 0xA617F38: ssl_handle_server_connection_attempt (socket.c:2409)
> ==16673==    by 0xA618420: socket_complete_connection (socket.c:2554)
> ==16673==    by 0xA618788: socket_event_handler (socket.c:2613)
> ==16673==    by 0x4ED6983: event_dispatch_epoll_handler (event-epoll.c:587)
> ==16673==    by 0x4ED6C5A: event_dispatch_epoll_worker (event-epoll.c:663)
> ==16673==    by 0x615C5D9: start_thread (in /usr/lib64/libpthread-2.27.so)
> ==16673==
> ==16673== 200,544 bytes in 12 blocks are definitely lost in loss record 1,115 of 1,123
> ==16673==    at 0x4C2EB7B: malloc (vg_replace_malloc.c:299)
> ==16673==    by 0x63E1977: CRYPTO_malloc (in /usr/lib64/libcrypto.so.1.0.2p)
> ==16673==    by 0xA855D12: ssl3_setup_read_buffer (in /usr/lib64/libssl.so.1.0.2p)
> ==16673==    by 0xA855E68: ssl3_setup_buffers (in /usr/lib64/libssl.so.1.0.2p)
> ==16673==    by 0xA8485D9: ssl3_accept (in /usr/lib64/libssl.so.1.0.2p)
> ==16673==    by 0xA610DDF: ssl_complete_connection (socket.c:400)
> ==16673==    by 0xA617F38: ssl_handle_server_connection_attempt (socket.c:2409)
> ==16673==    by 0xA618420: socket_complete_connection (socket.c:2554)
> ==16673==    by 0xA618788: socket_event_handler (socket.c:2613)
> ==16673==    by 0x4ED6983: event_dispatch_epoll_handler (event-epoll.c:587)
> ==16673==    by 0x4ED6C5A: event_dispatch_epoll_worker (event-epoll.c:663)
> ==16673==    by 0x615C5D9: start_thread (in /usr/lib64/libpthread-2.27.so)
> ==16673==
> valgrind --leak-check=f
>
>
>
>
>
> //////////////////////////////////// with libleak attached to glusterfsd /////////////////////////////////////////
>
> callstack[2419] expires. count=1 size=224/224 alloc=362 free=350
>     /home/robot/libleak/libleak.so(malloc+0x25) [0x7f1460604065]
>     /lib64/libcrypto.so.10(CRYPTO_malloc+0x58) [0x7f145ecd9978]
>     /lib64/libcrypto.so.10(EVP_DigestInit_ex+0x2a9) [0x7f145ed95749]
>     /lib64/libssl.so.10(ssl3_digest_cached_records+0x11d) [0x7f145abb6ced]
>     /lib64/libssl.so.10(ssl3_accept+0xc8f) [0x7f145abadc4f]
>     /usr/lib64/glusterfs/3.12.15/rpc-transport/socket.so(ssl_complete_connection+0x5e) [0x7f145ae00f3a]
>     /usr/lib64/glusterfs/3.12.15/rpc-transport/socket.so(+0xc16d) [0x7f145ae0816d]
>     /usr/lib64/glusterfs/3.12.15/rpc-transport/socket.so(+0xc68a) [0x7f145ae0868a]
>     /usr/lib64/glusterfs/3.12.15/rpc-transport/socket.so(+0xc9f2) [0x7f145ae089f2]
>     /lib64/libglusterfs.so.0(+0x9b96f) [0x7f146038596f]
>     /lib64/libglusterfs.so.0(+0x9bc46) [0x7f1460385c46]
>     /lib64/libpthread.so.0(+0x75da) [0x7f145f0d15da]
>     /lib64/libc.so.6(clone+0x3f) [0x7f145e9a7eaf]
>
> callstack[2432] expires. count=1 size=104/104 alloc=362 free=0
>     /home/robot/libleak/libleak.so(malloc+0x25) [0x7f1460604065]
>     /lib64/libcrypto.so.10(CRYPTO_malloc+0x58) [0x7f145ecd9978]
>     /lib64/libcrypto.so.10(BN_MONT_CTX_new+0x17) [0x7f145ed48627]
>     /lib64/libcrypto.so.10(BN_MONT_CTX_set_locked+0x6d) [0x7f145ed489fd]
>     /lib64/libcrypto.so.10(+0xff4d9) [0x7f145ed6a4d9]
>     /lib64/libcrypto.so.10(int_rsa_verify+0x1cd) [0x7f145ed6d41d]
>     /lib64/libcrypto.so.10(RSA_verify+0x32) [0x7f145ed6d972]
>     /lib64/libcrypto.so.10(+0x107ff5) [0x7f145ed72ff5]
>     /lib64/libcrypto.so.10(EVP_VerifyFinal+0x211) [0x7f145ed9dd51]
>     /lib64/libssl.so.10(ssl3_get_cert_verify+0x5bb) [0x7f145abac06b]
>     /lib64/libssl.so.10(ssl3_accept+0x988) [0x7f145abad948]
>     /usr/lib64/glusterfs/3.12.15/rpc-transport/socket.so(ssl_complete_connection+0x5e) [0x7f145ae00f3a]
>     /usr/lib64/glusterfs/3.12.15/rpc-transport/socket.so(+0xc16d) [0x7f145ae0816d]
>     /usr/lib64/glusterfs/3.12.15/rpc-transport/socket.so(+0xc68a) [0x7f145ae0868a]
>     /usr/lib64/glusterfs/3.12.15/rpc-transport/socket.so(+0xc9f2) [0x7f145ae089f2]
>     /lib64/libglusterfs.so.0(+0x9b96f) [0x7f146038596f]
>     /lib64/libglusterfs.so.0(+0x9bc46) [0x7f1460385c46]
>     /lib64/libpthread.so.0(+0x75da) [0x7f145f0d15da]
>     /lib64/libc.so.6(clone+0x3f) [0x7f145e9a7eaf]
>
>
>
> One interesting thing is that the memory grows to about 300 MB and then
> stops increasing.
>
> I am wondering whether this is caused by the OpenSSL library, but I could
> not find any such issue reported in the OpenSSL community.
>
> Is glusterfs using SSL_accept correctly?
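
For reference, the handshake pattern the OpenSSL documentation expects on the accept side of a non-blocking socket is roughly the fragment below (a generic sketch, not the socket.c code); judging from this thread, the leak came from the teardown side, i.e. SSL objects and their buffers that were never released, rather than from the handshake logic itself:

    int ret = SSL_accept(ssl);              /* may be retried several times on a non-blocking fd */
    if (ret != 1) {
        switch (SSL_get_error(ssl, ret)) {
        case SSL_ERROR_WANT_READ:
        case SSL_ERROR_WANT_WRITE:
            /* handshake not finished yet: keep the SSL object and retry
             * when the fd becomes readable/writable again */
            break;
        default:
            /* real failure: the SSL object still has to be released, otherwise the
             * ~33 KB of buffers allocated by ssl3_setup_buffers are leaked */
            SSL_free(ssl);
            ssl = NULL;
        }
    }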
>
>
>
> cynthia
>
> *From:* Zhou, Cynthia (NSB - CN/Hangzhou)
> *Sent:* Monday, May 06, 2019 10:34 AM
> *To:* 'Amar Tumballi Suryanarayan' <atumball at redhat.com>
> *Cc:* Milind Changire <mchangir at redhat.com>; gluster-devel at gluster.org
> *Subject:* RE: [Gluster-devel] glusterfsd memory leak issue found after
> enable ssl
>
>
>
> Hi,
>
> Sorry, I have been busy with other issues these days; could you help me
> submit my patch for review? It is based on the glusterfs 3.12.15 code. But
> even with this patch the memory leak still exists; the leak tools point at
> SSL_accept, and I am not sure whether it is caused by the OpenSSL library or
> by improper use of the SSL interfaces.
>
> --- a/rpc/rpc-transport/socket/src/socket.c
> +++ b/rpc/rpc-transport/socket/src/socket.c
> @@ -1019,7 +1019,16 @@ static void __socket_reset(rpc_transport_t *this) {
>    memset(&priv->incoming, 0, sizeof(priv->incoming));
> 
>    event_unregister_close(this->ctx->event_pool, priv->sock, priv->idx);
> -
> +  if(priv->use_ssl&& priv->ssl_ssl)
> +  {
> +    gf_log(this->name, GF_LOG_INFO,
> +           "clear and reset for socket(%d), free ssl ",
> +           priv->sock);
> +    SSL_shutdown(priv->ssl_ssl);
> +    SSL_clear(priv->ssl_ssl);
> +    SSL_free(priv->ssl_ssl);
> +    priv->ssl_ssl = NULL;
> +  }
>    priv->sock = -1;
>    priv->idx = -1;
>    priv->connected = -1;
> @@ -4238,6 +4250,16 @@ void fini(rpc_transport_t *this) {
>      pthread_mutex_destroy(&priv->out_lock);
>      pthread_mutex_destroy(&priv->cond_lock);
>      pthread_cond_destroy(&priv->cond);
> +     if(priv->use_ssl&& priv->ssl_ssl)
> +  {
> +    gf_log(this->name, GF_LOG_INFO,
> +           "clear and reset for socket(%d), free ssl ",
> +           priv->sock);
> +    SSL_shutdown(priv->ssl_ssl);
> +    SSL_clear(priv->ssl_ssl);
> +    SSL_free(priv->ssl_ssl);
> +    priv->ssl_ssl = NULL;
> +  }
>      if (priv->ssl_private_key) {
>        GF_FREE(priv->ssl_private_key);
>      }
>
>
>
>
>
> *From:* Amar Tumballi Suryanarayan <atumball at redhat.com>
> *Sent:* Wednesday, May 01, 2019 8:43 PM
> *To:* Zhou, Cynthia (NSB - CN/Hangzhou) <cynthia.zhou at nokia-sbell.com>
> *Cc:* Milind Changire <mchangir at redhat.com>; gluster-devel at gluster.org
> *Subject:* Re: [Gluster-devel] glusterfsd memory leak issue found after
> enable ssl
>
>
>
> Hi Cynthia Zhou,
>
>
>
> Can you post the patch which fixes the missing free? We will continue to
> investigate the leak further, but we would really appreciate getting the
> patch you have already worked on landed into upstream master.
>
>
>
> -Amar
>
>
>
> On Mon, Apr 22, 2019 at 1:38 PM Zhou, Cynthia (NSB - CN/Hangzhou) <
> cynthia.zhou at nokia-sbell.com> wrote:
>
> Ok, I am clear now.
>
> I've added SSL_free in the socket reset and socket fini functions; the
> glusterfsd memory leak is much smaller now, but it is still leaking, and
> from the source code I cannot find anything else.
>
> Could you help check whether this issue exists in your environment? If not,
> I may try merging your patch.
>
> Steps:
>
> 1>   while true; do gluster v heal <vol-name> info; done
>
> 2>   Check the <vol-name> glusterfsd memory usage; it is obviously
> increasing.
>
> cynthia
>
>
>
> *From:* Milind Changire <mchangir at redhat.com>
> *Sent:* Monday, April 22, 2019 2:36 PM
> *To:* Zhou, Cynthia (NSB - CN/Hangzhou) <cynthia.zhou at nokia-sbell.com>
> *Cc:* Atin Mukherjee <amukherj at redhat.com>; gluster-devel at gluster.org
> *Subject:* Re: [Gluster-devel] glusterfsd memory leak issue found after
> enable ssl
>
>
>
> According to BIO_new_socket() man page ...
>
>
>
> *If the close flag is set then the socket is shut down and closed when the
> BIO is freed.*
>
>
>
> For Gluster to have more control over the socket shutdown, the BIO_NOCLOSE
> flag is set. Otherwise, SSL takes control of socket shutdown whenever BIO
> is freed.
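>
A rough sketch of the difference (generic OpenSSL calls, not the socket.c code; sockfd and ssl are illustrative names):

    /* With BIO_CLOSE, freeing the BIO (e.g. via SSL_free) would also shut down
     * and close sockfd; with BIO_NOCLOSE the application keeps that control. */
    BIO *bio = BIO_new_socket(sockfd, BIO_NOCLOSE);
    SSL_set_bio(ssl, bio, bio);    /* the SSL object takes ownership of the BIO */
    /* ... handshake and traffic ... */
    SSL_shutdown(ssl);
    SSL_free(ssl);                 /* frees the BIO too, but sockfd stays open */
    shutdown(sockfd, SHUT_RDWR);   /* the application decides when to do this */
    close(sockfd);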
>
>
>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-devel
>
>
>
>
> --
>
> Amar Tumballi (amarts)
>
> _______________________________________________
>
> Community Meeting Calendar:
>
> APAC Schedule -
> Every 2nd and 4th Tuesday at 11:30 AM IST
> Bridge: https://bluejeans.com/836554017
>
> NA/EMEA Schedule -
> Every 1st and 3rd Tuesday at 01:00 PM EDT
> Bridge: https://bluejeans.com/486278655
>
> Gluster-devel mailing list
> Gluster-devel at gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-devel
>