[Gluster-devel] glusterfsd memory leak issue found after enable ssl
Milind Changire
mchangir at redhat.com
Mon Apr 22 03:46:07 UTC 2019
On Mon, Apr 22, 2019 at 8:32 AM Zhou, Cynthia (NSB - CN/Hangzhou) <
cynthia.zhou at nokia-sbell.com> wrote:
> Hi,
>
> Thanks for your quick response!
>
> My test environment is without patch 22334, so priv->ssl_ctx is the shared
> one and there is no need to free it in socket_reset. With patch 22334,
> however, it is absolutely necessary to free priv->ssl_ctx as well.
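>
> As a rough, standalone sketch of the ownership rules being discussed here
> (this is not glusterfs code, and the choice of SSLv23_method() is only an
> assumption matching the OpenSSL 1.0.2 seen in the traces):
>
>     /* Standalone illustration: the per-connection SSL object is always
>      * freed on teardown; a per-connection SSL_CTX (post-22334) is freed
>      * too, while a context shared with the listener must not be. */
>     #include <openssl/ssl.h>
>
>     int main(void)
>     {
>         SSL_library_init();
>         SSL_CTX *ctx = SSL_CTX_new(SSLv23_method());   /* like priv->ssl_ctx */
>         if (!ctx)
>             return 1;
>
>         SSL *ssl = SSL_new(ctx);                       /* like priv->ssl_ssl */
>         if (ssl) {
>             /* ... handshake and I/O would happen here ... */
>             SSL_shutdown(ssl);
>             SSL_free(ssl);            /* needed on every reset/fini */
>         }
>
>         SSL_CTX_free(ctx);            /* only when the context is per-connection;
>                                          a context shared with the listening
>                                          socket must not be freed per socket */
>         return 0;
>     }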
>
> Below is the log captured on the glusterfsd side during connection
> establishment. When accept() returns, a new new_trans is allocated and
> passed to socket_init(). Because this new_trans has no options yet,
> socket_init() does not call ssl_setup_connection_params(), so no allocation
> is done there. As the log shows, ssl_setup_connection_params() is instead
> called from socket_event_handler(), and at that point ssl_ctx is assigned
> the shared context.
>
> This issue is very easy to reproduce in my environment: just run "while
> true; do gluster v heal <vol-name> info; done" and watch the memory of the
> corresponding glusterfsd process; the increase is very obvious.
>
> For the pre-22334 code, freeing ssl_ctx is not needed.
>
> One more thing to confirm: there is no need to free ssl_sbio, right?
> SSL_free() will handle it, so calling SSL_free() alone is enough?
>
You need to call BIO_free().
Check the bio(3) man page.
>
>
>
> [2019-04-21 05:02:17.820346] T [socket.c:2776:socket_server_event_handler] 0-tcp.ccs-server: XXX server:192.168.1.13:53952, client:192.168.1.13:63683
> [2019-04-21 05:02:17.820370] T [socket.c:2820:socket_server_event_handler] 0-tcp.ccs-server: ### use non-blocking IO
> [2019-04-21 05:02:17.820410] T [socket.c:2603:socket_event_handler] 0-tcp.ccs-server: server (sock:126) in:1, out:0, err:0
> [2019-04-21 05:02:17.820431] T [socket.c:2610:socket_event_handler] 0-tcp.ccs-server: server (sock:126) socket is not connected, completing connection
> [2019-04-21 05:02:17.820442] T [socket.c:3829:ssl_setup_connection_params] 0-tcp.ccs-server: found old SSL context!   // the context is shared between the listening socket and the accepted socket
> [2019-04-21 05:02:17.820451] T [socket.c:310:ssl_setup_connection_prefix] 0-tcp.ccs-server: + ssl_setup_connection_params() done!
> [2019-04-21 05:02:17.822559] T [socket.c:2617:socket_event_handler] 0-tcp.ccs-server: (sock:126) socket_complete_connection() returned 1
> [2019-04-21 05:02:17.822585] T [socket.c:2621:socket_event_handler] 0-tcp.ccs-server: (sock:126) returning to wait on socket
> [2019-04-21 05:02:17.828455] T [socket.c:2603:socket_event_handler] 0-tcp.ccs-server: server (sock:126) in:1, out:0, err:0
> [2019-04-21 05:02:17.828483] T [socket.c:2610:socket_event_handler] 0-tcp.ccs-server: server (sock:126) socket is not connected, completing connection
> [2019-04-21 05:02:17.829130] D [socket.c:366:ssl_setup_connection_postfix] 0-tcp.ccs-server: peer CN = example ee certificate
> [2019-04-21 05:02:17.829157] D [socket.c:369:ssl_setup_connection_postfix] 0-tcp.ccs-server: SSL verification succeeded (client: 192.168.1.13:63683) (server: 192.168.1.13:53952)
> [2019-04-21 05:02:17.829171] T [socket.c:423:ssl_complete_connection] 0-tcp.ccs-server: ssl_accepted!
> [2019-04-21 05:02:17.829183] T [socket.c:2617:socket_event_handler] 0-tcp.ccs-server: (sock:126) socket_complete_connection() returned 1
> [2019-04-21 05:02:17.829192] T [socket.c:2621:socket_event_handler] 0-tcp.ccs-server: (sock:126) returning to wait on socket
> [2019-04-21 05:02:17.829261] T [socket.c:2603:socket_event_handler] 0-tcp.ccs-server: server (sock:126) in:1, out:0, err:0
> [2019-04-21 05:02:17.829282] T [socket.c:2628:socket_event_handler] 0-tcp.ccs-server: Server socket (126) is already connected
> [2019-04-21 05:02:17.829294] T [socket.c:493:__socket_ssl_readv] 0-tcp.ccs-server: ***** reading over SSL
> [2019-04-21 05:02:17.829311] T [socket.c:493:__socket_ssl_readv] 0-tcp.ccs-server: ***** reading over SSL
> [2019-04-21 05:02:17.829337] T [socket.c:493:__socket_ssl_readv] 0-tcp.ccs-server: ***** reading over SSL
>
>
>
>
>
> cynthia
>
> *From:* Milind Changire <mchangir at redhat.com>
> *Sent:* Monday, April 22, 2019 10:21 AM
> *To:* Zhou, Cynthia (NSB - CN/Hangzhou) <cynthia.zhou at nokia-sbell.com>
> *Cc:* Atin Mukherjee <amukherj at redhat.com>; gluster-devel at gluster.org
> *Subject:* Re: [Gluster-devel] glusterfsd memory leak issue found after
> enable ssl
>
>
>
> After patch 22334 <https://review.gluster.org/c/glusterfs/+/22334>,
> priv->ssl_ctx is maintained per socket connection and is no longer shared.
>
> So you might want to free priv->ssl_ctx as well (with SSL_CTX_free()) and
> set priv->ssl_ctx to NULL.
>
>
>
> There might be some strings that are duplicated (gf_strdup()) via the
> socket_init() code path. Please take a look at those as well.
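>
> For example (a sketch only; whether these particular fields are among the
> leaked strings is an assumption to be verified, and the names
> priv->ssl_own_cert / priv->ssl_ca_list are inferred from the corresponding
> ssl options), fini() could release them alongside the existing
> ssl_private_key cleanup:
>
>     /* hypothetical additions to fini(), mirroring the existing
>      * "if (priv->ssl_private_key) { GF_FREE(...); }" block */
>     if (priv->ssl_own_cert) {
>         GF_FREE(priv->ssl_own_cert);
>         priv->ssl_own_cert = NULL;
>     }
>     if (priv->ssl_ca_list) {
>         GF_FREE(priv->ssl_ca_list);
>         priv->ssl_ca_list = NULL;
>     }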
>
>
>
> Sorry about that. I missed it.
>
>
>
>
>
> On Mon, Apr 22, 2019 at 7:25 AM Zhou, Cynthia (NSB - CN/Hangzhou) <
> cynthia.zhou at nokia-sbell.com> wrote:
>
>
>
> Hi,
>
> From my code study it seems priv->ssl_ssl is not properly released. I made
> a patch, and the glusterfsd memory leak is alleviated with it, but something
> else is still leaking and I have no clue about the remaining leak points.
>
>
>
> --- a/rpc/rpc-transport/socket/src/socket.c
> +++ b/rpc/rpc-transport/socket/src/socket.c
> @@ -1019,7 +1019,16 @@ static void __socket_reset(rpc_transport_t *this) {
>      memset(&priv->incoming, 0, sizeof(priv->incoming));
>      event_unregister_close(this->ctx->event_pool, priv->sock, priv->idx);
> -
> +    if (priv->use_ssl && priv->ssl_ssl)
> +    {
> +        gf_log(this->name, GF_LOG_TRACE,
> +               "clear and reset for socket(%d), free ssl ",
> +               priv->sock);
> +        SSL_shutdown(priv->ssl_ssl);
> +        SSL_clear(priv->ssl_ssl);
> +        SSL_free(priv->ssl_ssl);
> +        priv->ssl_ssl = NULL;
> +    }
>      priv->sock = -1;
>      priv->idx = -1;
>      priv->connected = -1;
> @@ -4238,6 +4250,16 @@ void fini(rpc_transport_t *this) {
>      pthread_mutex_destroy(&priv->out_lock);
>      pthread_mutex_destroy(&priv->cond_lock);
>      pthread_cond_destroy(&priv->cond);
> +    if (priv->use_ssl && priv->ssl_ssl)
> +    {
> +        gf_log(this->name, GF_LOG_TRACE,
> +               "clear and reset for socket(%d), free ssl ",
> +               priv->sock);
> +        SSL_shutdown(priv->ssl_ssl);
> +        SSL_clear(priv->ssl_ssl);
> +        SSL_free(priv->ssl_ssl);
> +        priv->ssl_ssl = NULL;
> +    }
>      if (priv->ssl_private_key) {
>          GF_FREE(priv->ssl_private_key);
>      }
>
>
>
>
>
> *From:* Zhou, Cynthia (NSB - CN/Hangzhou)
> *Sent:* Thursday, April 18, 2019 5:31 PM
> *To:* 'Atin Mukherjee' <amukherj at redhat.com>
> *Cc:* 'Raghavendra Gowdappa' <rgowdapp at redhat.com>; '
> gluster-devel at gluster.org' <gluster-devel at gluster.org>
> *Subject:* RE: [Gluster-devel] glusterfsd memory leak issue found after
> enable ssl
>
>
>
> We scanned it with a memory-leak tool and got the following output. We
> suspect that some OpenSSL library allocations are not being freed properly
> by the glusterfs code.
>
> er+0x2af [libglusterfs.so.0.0.1]\n\t\tstart_thread+0xda [libpthread-2.27.so]'
> 13580 bytes in 175 allocations from stack
>     b'CRYPTO_malloc+0x58 [libcrypto.so.1.0.2p]'
> 232904 bytes in 14 allocations from stack
>     b'CRYPTO_malloc+0x58 [libcrypto.so.1.0.2p]\n\t\t[unknown]'
> [15:41:56] Top 10 stacks with outstanding allocations:
> 8792 bytes in 14 allocations from stack
>     b'CRYPTO_malloc+0x58 [libcrypto.so.1.0.2p]\n\t\t[unknown]'
> 9408 bytes in 42 allocations from stack
>     b'CRYPTO_realloc+0x4d [libcrypto.so.1.0.2p]'
> 9723 bytes in 14 allocations from stack
>     b'CRYPTO_malloc+0x58 [libcrypto.so.1.0.2p]\n\t\t[unknown]'
> 10696 bytes in 21 allocations from stack
>     b'CRYPTO_malloc+0x58 [libcrypto.so.1.0.2p]\n\t\t[unknown]'
> 11319 bytes in 602 allocations from stack
>     b'CRYPTO_malloc+0x58 [libcrypto.so.1.0.2p]\n\t\t[unknown]'
> 11431 bytes in 518 allocations from stack
>     b'CRYPTO_malloc+0x58 [libcrypto.so.1.0.2p]\n\t\t[unknown]'
> 11704 bytes in 371 allocations from stack
>     b'CRYPTO_malloc+0x58 [libcrypto.so.1.0.2p]\n\t\t[unknown]'
>
>
>
>
>
> cynthia
>
> *From:* Zhou, Cynthia (NSB - CN/Hangzhou)
> *Sent:* Thursday, April 18, 2019 5:25 PM
> *To:* 'Atin Mukherjee' <amukherj at redhat.com>
> *Cc:* Raghavendra Gowdappa <rgowdapp at redhat.com>;
> gluster-devel at gluster.org
> *Subject:* RE: [Gluster-devel] glusterfsd memory leak issue found after
> enable ssl
>
>
>
> I have tested glusterfs 3.12.15 and glusterfs 5.5; both have this issue.
> After enabling TLS/SSL on the socket, executing gluster v heal <vol-name>
> info triggers glfsheal to connect to the glusterfsd process, and the
> glusterfsd process leaks memory. Could you please try it in your
> environment?
>
>
>
> cynthia
>
>
>
>
>
> *From:* Atin Mukherjee <amukherj at redhat.com>
> *Sent:* Thursday, April 18, 2019 1:19 PM
> *To:* Zhou, Cynthia (NSB - CN/Hangzhou) <cynthia.zhou at nokia-sbell.com>
> *Cc:* Raghavendra Gowdappa <rgowdapp at redhat.com>;
> gluster-devel at gluster.org
> *Subject:* Re: [Gluster-devel] glusterfsd memory leak issue found after
> enable ssl
>
>
>
>
>
>
>
> On Wed, 17 Apr 2019 at 10:53, Zhou, Cynthia (NSB - CN/Hangzhou) <
> cynthia.zhou at nokia-sbell.com> wrote:
>
> Hi,
>
> In my recent tests, I found a very severe glusterfsd memory leak when the
> socket SSL option is enabled.
>
>
>
> What gluster version are you testing? Would you be able to continue your
> investigation and share the root cause?
>
>
>
> --
>
> - Atin (atinm)
>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-devel
>
>
>
> --
>
> Milind
>
--
Milind