[Gluster-devel] glusterfsd memory leak issue found after enable ssl

Zhou, Cynthia (NSB - CN/Hangzhou) cynthia.zhou at nokia-sbell.com
Thu May 9 03:04:00 UTC 2019


Hi,
Ok, It is posted to https://review.gluster.org/#/c/glusterfs/+/22687/



From: Raghavendra Gowdappa <rgowdapp at redhat.com>
Sent: Wednesday, May 08, 2019 7:35 PM
To: Zhou, Cynthia (NSB - CN/Hangzhou) <cynthia.zhou at nokia-sbell.com>
Cc: Amar Tumballi Suryanarayan <atumball at redhat.com>; gluster-devel at gluster.org
Subject: Re: [Gluster-devel] glusterfsd memory leak issue found after enable ssl



On Wed, May 8, 2019 at 1:29 PM Zhou, Cynthia (NSB - CN/Hangzhou) <cynthia.zhou at nokia-sbell.com<mailto:cynthia.zhou at nokia-sbell.com>> wrote:
Hi 'Milind Changire' ,
The leak is getting more and more clear to me now. the unsolved memory leak is because of in gluterfs version 3.12.15 (in my env)the ssl context is a shared one, while we do ssl_acept, ssl will allocate some read/write buffer to ssl object, however, ssl_free in socket_reset or fini function of socket.c, the buffer is returened back to ssl context free list instead of completely freed.

Thanks Cynthia for your efforts in identifying and fixing the leak. If you post a patch to gerrit, I'll be happy to merge it and get the fix into the codebase.


So following patch is able to fix the memory leak issue completely.(created for gluster master branch)

--- a/rpc/rpc-transport/socket/src/socket.c
+++ b/rpc/rpc-transport/socket/src/socket.c
@@ -446,6 +446,7 @@ ssl_setup_connection_postfix(rpc_transport_t *this)
     gf_log(this->name, GF_LOG_DEBUG,
            "SSL verification succeeded (client: %s) (server: %s)",
            this->peerinfo.identifier, this->myinfo.identifier);
+    X509_free(peer);
     return gf_strdup(peer_CN);

     /* Error paths. */
@@ -1157,7 +1158,21 @@ __socket_reset(rpc_transport_t *this)
     memset(&priv->incoming, 0, sizeof(priv->incoming));

     event_unregister_close(this->ctx->event_pool, priv->sock, priv->idx);
-
+    if(priv->use_ssl&& priv->ssl_ssl)
+    {
+      gf_log(this->name, GF_LOG_TRACE,
+             "clear and reset for socket(%d), free ssl ",
+             priv->sock);
+               if(priv->ssl_ctx)
+                 {
+                       SSL_CTX_free(priv->ssl_ctx);
+                       priv->ssl_ctx = NULL;
+                 }
+      SSL_shutdown(priv->ssl_ssl);
+      SSL_clear(priv->ssl_ssl);
+      SSL_free(priv->ssl_ssl);
+      priv->ssl_ssl = NULL;
+    }
     priv->sock = -1;
     priv->idx = -1;
     priv->connected = -1;
@@ -4675,6 +4690,21 @@ fini(rpc_transport_t *this)
         pthread_mutex_destroy(&priv->out_lock);
         pthread_mutex_destroy(&priv->cond_lock);
         pthread_cond_destroy(&priv->cond);
+               if(priv->use_ssl&& priv->ssl_ssl)
+               {
+                 gf_log(this->name, GF_LOG_TRACE,
+                                "clear and reset for socket(%d), free ssl ",
+                                priv->sock);
+                 if(priv->ssl_ctx)
+                 {
+                       SSL_CTX_free(priv->ssl_ctx);
+                       priv->ssl_ctx = NULL;
+                 }
+                 SSL_shutdown(priv->ssl_ssl);
+                 SSL_clear(priv->ssl_ssl);
+                 SSL_free(priv->ssl_ssl);
From: Zhou, Cynthia (NSB - CN/Hangzhou)
Sent: Monday, May 06, 2019 2:12 PM
To: 'Amar Tumballi Suryanarayan' <atumball at redhat.com<mailto:atumball at redhat.com>>
Cc: 'Milind Changire' <mchangir at redhat.com<mailto:mchangir at redhat.com>>; 'gluster-devel at gluster.org<mailto:gluster-devel at gluster.org>' <gluster-devel at gluster.org<mailto:gluster-devel at gluster.org>>
Subject: RE: [Gluster-devel] glusterfsd memory leak issue found after enable ssl

Hi,
From our test valgrind and libleak all blame ssl3_accept
///////////////////////////from valgrind attached to glusterfds///////////////////////////////////////////
==16673== 198,720 bytes in 12 blocks are definitely lost in loss record 1,114 of 1,123
==16673==    at 0x4C2EB7B: malloc (vg_replace_malloc.c:299)
==16673==    by 0x63E1977: CRYPTO_malloc (in /usr/lib64/libcrypto.so.1.0.2p)
==16673==    by 0xA855E0C: ssl3_setup_write_buffer (in /usr/lib64/libssl.so.1.0.2p)
==16673==    by 0xA855E77: ssl3_setup_buffers (in /usr/lib64/libssl.so.1.0.2p)
==16673==    by 0xA8485D9: ssl3_accept (in /usr/lib64/libssl.so.1.0.2p)
==16673==    by 0xA610DDF: ssl_complete_connection (socket.c:400)
==16673==    by 0xA617F38: ssl_handle_server_connection_attempt (socket.c:2409)
==16673==    by 0xA618420: socket_complete_connection (socket.c:2554)
==16673==    by 0xA618788: socket_event_handler (socket.c:2613)
==16673==    by 0x4ED6983: event_dispatch_epoll_handler (event-epoll.c:587)
==16673==    by 0x4ED6C5A: event_dispatch_epoll_worker (event-epoll.c:663)
==16673==    by 0x615C5D9: start_thread (in /usr/lib64/libpthread-2.27.so<http://libpthread-2.27.so>)
==16673==
==16673== 200,544 bytes in 12 blocks are definitely lost in loss record 1,115 of 1,123
==16673==    at 0x4C2EB7B: malloc (vg_replace_malloc.c:299)
==16673==    by 0x63E1977: CRYPTO_malloc (in /usr/lib64/libcrypto.so.1.0.2p)
==16673==    by 0xA855D12: ssl3_setup_read_buffer (in /usr/lib64/libssl.so.1.0.2p)
==16673==    by 0xA855E68: ssl3_setup_buffers (in /usr/lib64/libssl.so.1.0.2p)
==16673==    by 0xA8485D9: ssl3_accept (in /usr/lib64/libssl.so.1.0.2p)
==16673==    by 0xA610DDF: ssl_complete_connection (socket.c:400)
==16673==    by 0xA617F38: ssl_handle_server_connection_attempt (socket.c:2409)
==16673==    by 0xA618420: socket_complete_connection (socket.c:2554)
==16673==    by 0xA618788: socket_event_handler (socket.c:2613)
==16673==    by 0x4ED6983: event_dispatch_epoll_handler (event-epoll.c:587)
==16673==    by 0x4ED6C5A: event_dispatch_epoll_worker (event-epoll.c:663)
==16673==    by 0x615C5D9: start_thread (in /usr/lib64/libpthread-2.27.so<http://libpthread-2.27.so>)
==16673==
valgrind --leak-check=f


////////////////////////////////////with libleak attached to glusterfsd/////////////////////////////////////////
callstack[2419] expires. count=1 size=224/224 alloc=362 free=350
    /home/robot/libleak/libleak.so(malloc+0x25) [0x7f1460604065]
    /lib64/libcrypto.so.10(CRYPTO_malloc+0x58) [0x7f145ecd9978]
    /lib64/libcrypto.so.10(EVP_DigestInit_ex+0x2a9) [0x7f145ed95749]
    /lib64/libssl.so.10(ssl3_digest_cached_records+0x11d) [0x7f145abb6ced]
    /lib64/libssl.so.10(ssl3_accept+0xc8f) [0x7f145abadc4f]
    /usr/lib64/glusterfs/3.12.15/rpc-transport/socket.so(ssl_complete_connection+0x5e) [0x7f145ae00f3a]
    /usr/lib64/glusterfs/3.12.15/rpc-transport/socket.so(+0xc16d) [0x7f145ae0816d]
    /usr/lib64/glusterfs/3.12.15/rpc-transport/socket.so(+0xc68a) [0x7f145ae0868a]
    /usr/lib64/glusterfs/3.12.15/rpc-transport/socket.so(+0xc9f2) [0x7f145ae089f2]
    /lib64/libglusterfs.so.0(+0x9b96f) [0x7f146038596f]
    /lib64/libglusterfs.so.0(+0x9bc46) [0x7f1460385c46]
    /lib64/libpthread.so.0(+0x75da) [0x7f145f0d15da]
    /lib64/libc.so.6(clone+0x3f) [0x7f145e9a7eaf]
callstack[2432] expires. count=1 size=104/104 alloc=362 free=0
    /home/robot/libleak/libleak.so(malloc+0x25) [0x7f1460604065]
    /lib64/libcrypto.so.10(CRYPTO_malloc+0x58) [0x7f145ecd9978]
    /lib64/libcrypto.so.10(BN_MONT_CTX_new+0x17) [0x7f145ed48627]
    /lib64/libcrypto.so.10(BN_MONT_CTX_set_locked+0x6d) [0x7f145ed489fd]
    /lib64/libcrypto.so.10(+0xff4d9) [0x7f145ed6a4d9]
    /lib64/libcrypto.so.10(int_rsa_verify+0x1cd) [0x7f145ed6d41d]
    /lib64/libcrypto.so.10(RSA_verify+0x32) [0x7f145ed6d972]
    /lib64/libcrypto.so.10(+0x107ff5) [0x7f145ed72ff5]
    /lib64/libcrypto.so.10(EVP_VerifyFinal+0x211) [0x7f145ed9dd51]
    /lib64/libssl.so.10(ssl3_get_cert_verify+0x5bb) [0x7f145abac06b]
    /lib64/libssl.so.10(ssl3_accept+0x988) [0x7f145abad948]
    /usr/lib64/glusterfs/3.12.15/rpc-transport/socket.so(ssl_complete_connection+0x5e) [0x7f145ae00f3a]
    /usr/lib64/glusterfs/3.12.15/rpc-transport/socket.so(+0xc16d) [0x7f145ae0816d]
    /usr/lib64/glusterfs/3.12.15/rpc-transport/socket.so(+0xc68a) [0x7f145ae0868a]
    /usr/lib64/glusterfs/3.12.15/rpc-transport/socket.so(+0xc9f2) [0x7f145ae089f2]
    /lib64/libglusterfs.so.0(+0x9b96f) [0x7f146038596f]
    /lib64/libglusterfs.so.0(+0x9bc46) [0x7f1460385c46]
    /lib64/libpthread.so.0(+0x75da) [0x7f145f0d15da]
    /lib64/libc.so.6(clone+0x3f) [0x7f145e9a7eaf]

one interesting thing is that the memory goes up to about 300m then it stopped  increasing !!!
I am wondering if this is caused by open-ssl library? But when I search from openssl community, there is no such issue reported before.
Is glusterfs using ssl_accept correctly?

cynthia
From: Zhou, Cynthia (NSB - CN/Hangzhou)
Sent: Monday, May 06, 2019 10:34 AM
To: 'Amar Tumballi Suryanarayan' <atumball at redhat.com<mailto:atumball at redhat.com>>
Cc: Milind Changire <mchangir at redhat.com<mailto:mchangir at redhat.com>>; gluster-devel at gluster.org<mailto:gluster-devel at gluster.org>
Subject: RE: [Gluster-devel] glusterfsd memory leak issue found after enable ssl

Hi,
Sorry, I am so busy with other issues these days, could you help me to submit my patch for review? It is based on glusterfs3.12.15 code. But even with this patch , memory leak still exists, from memory leak tool it should be related with ssl_accept, not sure if it is because of openssl library or because improper use of ssl interfaces.
--- a/rpc/rpc-transport/socket/src/socket.c
+++ b/rpc/rpc-transport/socket/src/socket.c
@@ -1019,7 +1019,16 @@ static void __socket_reset(rpc_transport_t *this) {
   memset(&priv->incoming, 0, sizeof(priv->incoming));

   event_unregister_close(this->ctx->event_pool, priv->sock, priv->idx);
-
+  if(priv->use_ssl&& priv->ssl_ssl)
+  {
+    gf_log(this->name, GF_LOG_INFO,
+           "clear and reset for socket(%d), free ssl ",
+           priv->sock);
+    SSL_shutdown(priv->ssl_ssl);
+    SSL_clear(priv->ssl_ssl);
+    SSL_free(priv->ssl_ssl);
+    priv->ssl_ssl = NULL;
+  }
   priv->sock = -1;
   priv->idx = -1;
   priv->connected = -1;
@@ -4238,6 +4250,16 @@ void fini(rpc_transport_t *this) {
     pthread_mutex_destroy(&priv->out_lock);
     pthread_mutex_destroy(&priv->cond_lock);
     pthread_cond_destroy(&priv->cond);
+     if(priv->use_ssl&& priv->ssl_ssl)
+  {
+    gf_log(this->name, GF_LOG_INFO,
+           "clear and reset for socket(%d), free ssl ",
+           priv->sock);
+    SSL_shutdown(priv->ssl_ssl);
+    SSL_clear(priv->ssl_ssl);
+    SSL_free(priv->ssl_ssl);
+    priv->ssl_ssl = NULL;
+  }
     if (priv->ssl_private_key) {
       GF_FREE(priv->ssl_private_key);
     }


From: Amar Tumballi Suryanarayan <atumball at redhat.com<mailto:atumball at redhat.com>>
Sent: Wednesday, May 01, 2019 8:43 PM
To: Zhou, Cynthia (NSB - CN/Hangzhou) <cynthia.zhou at nokia-sbell.com<mailto:cynthia.zhou at nokia-sbell.com>>
Cc: Milind Changire <mchangir at redhat.com<mailto:mchangir at redhat.com>>; gluster-devel at gluster.org<mailto:gluster-devel at gluster.org>
Subject: Re: [Gluster-devel] glusterfsd memory leak issue found after enable ssl

Hi Cynthia Zhou,

Can you post the patch which fixes the issue of missing free? We will continue to investigate the leak further, but would really appreciate getting the patch which is already worked on land into upstream master.

-Amar

On Mon, Apr 22, 2019 at 1:38 PM Zhou, Cynthia (NSB - CN/Hangzhou) <cynthia.zhou at nokia-sbell.com<mailto:cynthia.zhou at nokia-sbell.com>> wrote:
Ok, I am clear now.
I’ve added ssl_free in socket reset and socket finish function, though glusterfsd memory leak is not that much, still it is leaking, from source code I can not find anything else,
Could you help to check if this issue exists in your env? If not I may have a try to merge your patch .
Step

1>   while true;do gluster v heal <vol-name> info,

2>   check the vol-name glusterfsd memory usage, it is obviously increasing.
cynthia

From: Milind Changire <mchangir at redhat.com<mailto:mchangir at redhat.com>>
Sent: Monday, April 22, 2019 2:36 PM
To: Zhou, Cynthia (NSB - CN/Hangzhou) <cynthia.zhou at nokia-sbell.com<mailto:cynthia.zhou at nokia-sbell.com>>
Cc: Atin Mukherjee <amukherj at redhat.com<mailto:amukherj at redhat.com>>; gluster-devel at gluster.org<mailto:gluster-devel at gluster.org>
Subject: Re: [Gluster-devel] glusterfsd memory leak issue found after enable ssl

According to BIO_new_socket() man page ...

If the close flag is set then the socket is shut down and closed when the BIO is freed.

For Gluster to have more control over the socket shutdown, the BIO_NOCLOSE flag is set. Otherwise, SSL takes control of socket shutdown whenever BIO is freed.

_______________________________________________
Gluster-devel mailing list
Gluster-devel at gluster.org<mailto:Gluster-devel at gluster.org>
https://lists.gluster.org/mailman/listinfo/gluster-devel


--
Amar Tumballi (amarts)
_______________________________________________

Community Meeting Calendar:

APAC Schedule -
Every 2nd and 4th Tuesday at 11:30 AM IST
Bridge: https://bluejeans.com/836554017

NA/EMEA Schedule -
Every 1st and 3rd Tuesday at 01:00 PM EDT
Bridge: https://bluejeans.com/486278655

Gluster-devel mailing list
Gluster-devel at gluster.org<mailto:Gluster-devel at gluster.org>
https://lists.gluster.org/mailman/listinfo/gluster-devel
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.gluster.org/pipermail/gluster-devel/attachments/20190509/72b33ef0/attachment-0001.html>


More information about the Gluster-devel mailing list