[Gluster-devel] glusterfsd memory leak issue found after enable ssl

Zhou, Cynthia (NSB - CN/Hangzhou) cynthia.zhou at nokia-sbell.com
Mon Apr 22 06:30:45 UTC 2019


Ok ,another question, why priv->ssl_sbio = BIO_new_socket(priv->sock, BIO_NOCLOSE); use NOCLOSE mode instead of BIO_CLOSE?

cynthia

From: Milind Changire <mchangir at redhat.com>
Sent: Monday, April 22, 2019 2:21 PM
To: Zhou, Cynthia (NSB - CN/Hangzhou) <cynthia.zhou at nokia-sbell.com>
Cc: Atin Mukherjee <amukherj at redhat.com>; gluster-devel at gluster.org
Subject: Re: [Gluster-devel] glusterfsd memory leak issue found after enable ssl

Looks like using BIO_free() is not right. Here's what the SSL_set_bio() man page says ...

SSL_set_bio() is similar to SSL_set0_rbio() and SSL_set0_wbio() except that it connects both the rbio and the wbio at the same time, and transfers the ownership of rbio and wbio to ssl according to the following set of rules:

So, I think you were right about SSL_free() doing the job for the bio. However, SSL_free() has no reason to set the priv->ssl_sbio pointer to NULL. I think priv->ssl_sbio should be set to NULL immediately after the call to SSL_set_bio() is successful. And we need to add a comment while setting priv->ssl_sbio to NULL that the ownership of the bio has now been transferred to SSL and SSL will free the related memory appropriately.



On Mon, Apr 22, 2019 at 11:37 AM Zhou, Cynthia (NSB - CN/Hangzhou) <cynthia.zhou at nokia-sbell.com<mailto:cynthia.zhou at nokia-sbell.com>> wrote:
I tried to print priv->ssl_sbio after SSL_free() find this pointer is not null, so I add free ssl_sbio with BIO_free, however this cause glusterfd coredump
(gdb) bt
#0  0x00007f3047867f9b in raise () from /lib64/libc.so.6
#1  0x00007f3047869351 in abort () from /lib64/libc.so.6
#2  0x00007f30478aa8c7 in __libc_message () from /lib64/libc.so.6
#3  0x00007f30478b0e6a in malloc_printerr () from /lib64/libc.so.6
#4  0x00007f30478b2835 in _int_free () from /lib64/libc.so.6
#5  0x00007f3047c5bbbd in CRYPTO_free () from /lib64/libcrypto.so.10
#6  0x00007f3047d07582 in BIO_free () from /lib64/libcrypto.so.10
#7  0x00007f3043f9ba4b in __socket_reset (this=0x7f303c1ae710) at socket.c:1032
#8  0x00007f3043f9c4aa in socket_event_poll_err (this=0x7f303c1ae710, gen=1, idx=17) at socket.c:1232
#9  0x00007f3043fa1b7d in socket_event_handler (fd=26, idx=17, gen=1, data=0x7f303c1ae710, poll_in=1, poll_out=0, poll_err=0) at socket.c:2669
#10 0x00007f3049307984 in event_dispatch_epoll_handler (event_pool=0x1035610, event=0x7f3043b14e84) at event-epoll.c:587
#11 0x00007f3049307c5b in event_dispatch_epoll_worker (data=0x107e3e0) at event-epoll.c:663
#12 0x00007f30480535da in start_thread () from /lib64/libpthread.so.0
#13 0x00007f3047929eaf in clone () from /lib64/libc.so.6

@@ -1019,7 +1019,20 @@ static void __socket_reset(rpc_transport_t *this) {
   memset(&priv->incoming, 0, sizeof(priv->incoming));
   event_unregister_close(this->ctx->event_pool, priv->sock, priv->idx);
-
+  if(priv->use_ssl&& priv->ssl_ssl)
+  {
+    gf_log(this->name, GF_LOG_INFO,
+           "clear and reset for socket(%d), free ssl ",
+           priv->sock);
+    SSL_shutdown(priv->ssl_ssl);
+    SSL_clear(priv->ssl_ssl);
+    SSL_free(priv->ssl_ssl);
+    gf_log(this->name, GF_LOG_INFO,"priv->ssl_sbio of socket(%d)is %p ",priv->sock,priv->ssl_sbio);
+     if(priv->ssl_sbio != NULL)
+         BIO_free(priv->ssl_sbio);
+    priv->ssl_ssl = NULL;
+     priv->ssl_sbio = NULL;
+  }
   priv->sock = -1;
   priv->idx = -1;
   priv->connected = -1;
@@ -4238,6 +4251,20 @@ void fini(rpc_transport_t *this) {
     pthread_mutex_destroy(&priv->out_lock);
     pthread_mutex_destroy(&priv->cond_lock);
     pthread_cond_destroy(&priv->cond);
+     if(priv->use_ssl&& priv->ssl_ssl)
+  {
+    gf_log(this->name, GF_LOG_INFO,
+           "clear and reset for socket(%d), free ssl ",
+           priv->sock);
+    SSL_shutdown(priv->ssl_ssl);
+    SSL_clear(priv->ssl_ssl);
+    SSL_free(priv->ssl_ssl);
+     gf_log(this->name, GF_LOG_INFO,"priv->ssl_sbio of socket(%d)is %p ",priv->sock,priv->ssl_sbio);
+    if(priv->ssl_sbio != NULL)
+        BIO_free(priv->ssl_sbio);
+    priv->ssl_ssl = NULL;
+     priv->ssl_sbio = NULL;
+  }
     if (priv->ssl_private_key) {
       GF_FREE(priv->ssl_private_key);


From: Milind Changire <mchangir at redhat.com<mailto:mchangir at redhat.com>>
Sent: Monday, April 22, 2019 1:35 PM
To: Zhou, Cynthia (NSB - CN/Hangzhou) <cynthia.zhou at nokia-sbell.com<mailto:cynthia.zhou at nokia-sbell.com>>
Cc: Atin Mukherjee <amukherj at redhat.com<mailto:amukherj at redhat.com>>; gluster-devel at gluster.org<mailto:gluster-devel at gluster.org>
Subject: Re: [Gluster-devel] glusterfsd memory leak issue found after enable ssl

This probably went unnoticed until now.



On Mon, Apr 22, 2019 at 10:45 AM Zhou, Cynthia (NSB - CN/Hangzhou) <cynthia.zhou at nokia-sbell.com<mailto:cynthia.zhou at nokia-sbell.com>> wrote:
Why there is no bio_free called in ssl_teardown_connection then?

cynthia

From: Milind Changire <mchangir at redhat.com<mailto:mchangir at redhat.com>>
Sent: Monday, April 22, 2019 10:21 AM
To: Zhou, Cynthia (NSB - CN/Hangzhou) <cynthia.zhou at nokia-sbell.com<mailto:cynthia.zhou at nokia-sbell.com>>
Cc: Atin Mukherjee <amukherj at redhat.com<mailto:amukherj at redhat.com>>; gluster-devel at gluster.org<mailto:gluster-devel at gluster.org>
Subject: Re: [Gluster-devel] glusterfsd memory leak issue found after enable ssl

After patch 22334<https://review.gluster.org/c/glusterfs/+/22334>, the priv->ssl_ctx is now maintained per socket connection and is no longer shared.
So you might want to SSL_free(priv->ssl_ctx) as well and set priv->ssl_ctx to NULL.

There might be some strings that are duplicated (gf_strdup()) via the socket_init() code path. Please take a look at those as well.

Sorry about that. I missed it.


On Mon, Apr 22, 2019 at 7:25 AM Zhou, Cynthia (NSB - CN/Hangzhou) <cynthia.zhou at nokia-sbell.com<mailto:cynthia.zhou at nokia-sbell.com>> wrote:

Hi,
From my code study it seems priv->ssl_ssl is not properly released, I made a patch and the glusterfsd memory leak is alleviated with my patch, but some otherwhere is still leaking, I have no clue about the other leak points.

--- a/rpc/rpc-transport/socket/src/socket.c
+++ b/rpc/rpc-transport/socket/src/socket.c
@@ -1019,7 +1019,16 @@ static void __socket_reset(rpc_transport_t *this) {
   memset(&priv->incoming, 0, sizeof(priv->incoming));
   event_unregister_close(this->ctx->event_pool, priv->sock, priv->idx);
-
+  if(priv->use_ssl&& priv->ssl_ssl)
+  {
+    gf_log(this->name, GF_LOG_TRACE,
+           "clear and reset for socket(%d), free ssl ",
+           priv->sock);
+    SSL_shutdown(priv->ssl_ssl);
+    SSL_clear(priv->ssl_ssl);
+    SSL_free(priv->ssl_ssl);
+    priv->ssl_ssl = NULL;
+  }
   priv->sock = -1;
   priv->idx = -1;
   priv->connected = -1;
@@ -4238,6 +4250,16 @@ void fini(rpc_transport_t *this) {
     pthread_mutex_destroy(&priv->out_lock);
     pthread_mutex_destroy(&priv->cond_lock);
     pthread_cond_destroy(&priv->cond);
+     if(priv->use_ssl&& priv->ssl_ssl)
+  {
+    gf_log(this->name, GF_LOG_TRACE,
+           "clear and reset for socket(%d), free ssl ",
+           priv->sock);
+    SSL_shutdown(priv->ssl_ssl);
+    SSL_clear(priv->ssl_ssl);
+    SSL_free(priv->ssl_ssl);
+    priv->ssl_ssl = NULL;
+  }
     if (priv->ssl_private_key) {
       GF_FREE(priv->ssl_private_key);
     }


From: Zhou, Cynthia (NSB - CN/Hangzhou)
Sent: Thursday, April 18, 2019 5:31 PM
To: 'Atin Mukherjee' <amukherj at redhat.com<mailto:amukherj at redhat.com>>
Cc: 'Raghavendra Gowdappa' <rgowdapp at redhat.com<mailto:rgowdapp at redhat.com>>; 'gluster-devel at gluster.org<mailto:gluster-devel at gluster.org>' <gluster-devel at gluster.org<mailto:gluster-devel at gluster.org>>
Subject: RE: [Gluster-devel] glusterfsd memory leak issue found after enable ssl

We scan it use memory-leak tool, there are following prints. We doubt some open ssl lib malloc is is not properly freed by glusterfs code.
er+0x2af [libglusterfs.so.0.0.1]\n\t\tstart_thread+0xda [libpthread-2.27.so<http://libpthread-2.27.so>]'
        13580 bytes in 175 allocations from stack
                b'CRYPTO_malloc+0x58 [libcrypto.so.1.0.2p]'
        232904 bytes in 14 allocations from stack
                b'CRYPTO_malloc+0x58 [libcrypto.so.1.0.2p]\n\t\t[unknown]'
[15:41:56] Top 10 stacks with outstanding allocations:
        8792 bytes in 14 allocations from stack
                b'CRYPTO_malloc+0x58 [libcrypto.so.1.0.2p]\n\t\t[unknown]'
        9408 bytes in 42 allocations from stack
                b'CRYPTO_realloc+0x4d [libcrypto.so.1.0.2p]'
        9723 bytes in 14 allocations from stack
                b'CRYPTO_malloc+0x58 [libcrypto.so.1.0.2p]\n\t\t[unknown]'
        10696 bytes in 21 allocations from stack
                b'CRYPTO_malloc+0x58 [libcrypto.so.1.0.2p]\n\t\t[unknown]'
        11319 bytes in 602 allocations from stack
                b'CRYPTO_malloc+0x58 [libcrypto.so.1.0.2p]\n\t\t[unknown]'
        11431 bytes in 518 allocations from stack
                b'CRYPTO_malloc+0x58 [libcrypto.so.1.0.2p]\n\t\t[unknown]'
        11704 bytes in 371 allocations from stack
                b'CRYPTO_malloc+0x58 [libcrypto.so.1.0.2p]\n\t\t[unknown]'


cynthia
From: Zhou, Cynthia (NSB - CN/Hangzhou)
Sent: Thursday, April 18, 2019 5:25 PM
To: 'Atin Mukherjee' <amukherj at redhat.com<mailto:amukherj at redhat.com>>
Cc: Raghavendra Gowdappa <rgowdapp at redhat.com<mailto:rgowdapp at redhat.com>>; gluster-devel at gluster.org<mailto:gluster-devel at gluster.org>
Subject: RE: [Gluster-devel] glusterfsd memory leak issue found after enable ssl

I’ve test on glusterfs3.12.15 and glusterfs5.5 all have this issue, after enable tls ssl socket, when execute gluster v heal <vol-name> info, will trigger glfshel to connect glusterfsd process, and cause glusterfsd process memory leak. Could you please try in your env?

cynthia


From: Atin Mukherjee <amukherj at redhat.com<mailto:amukherj at redhat.com>>
Sent: Thursday, April 18, 2019 1:19 PM
To: Zhou, Cynthia (NSB - CN/Hangzhou) <cynthia.zhou at nokia-sbell.com<mailto:cynthia.zhou at nokia-sbell.com>>
Cc: Raghavendra Gowdappa <rgowdapp at redhat.com<mailto:rgowdapp at redhat.com>>; gluster-devel at gluster.org<mailto:gluster-devel at gluster.org>
Subject: Re: [Gluster-devel] glusterfsd memory leak issue found after enable ssl



On Wed, 17 Apr 2019 at 10:53, Zhou, Cynthia (NSB - CN/Hangzhou) <cynthia.zhou at nokia-sbell.com<mailto:cynthia.zhou at nokia-sbell.com>> wrote:
Hi,
In my recent test, I found that there are very severe glusterfsd memory leak when enable socket ssl option

What gluster version are you testing? Would you be able to continue your investigation and share the root cause?

--
- Atin (atinm)
_______________________________________________
Gluster-devel mailing list
Gluster-devel at gluster.org<mailto:Gluster-devel at gluster.org>
https://lists.gluster.org/mailman/listinfo/gluster-devel


--
Milind


--
Milind


--
Milind
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.gluster.org/pipermail/gluster-devel/attachments/20190422/82f86b98/attachment-0001.html>


More information about the Gluster-devel mailing list