[Gluster-devel] Any review is appreciated. Reasoning about gluster server_connection_cleanup cleaning up uncleanly: file flock leaks under frequent network disconnection
Jaden Liang
jaden1q84 at gmail.com
Sat Sep 20 10:03:54 UTC 2014
Hi all,
Finally, I figured out Gerrit and submitted a bug and a patch.
I hope I have followed the correct procedure. :-)
Bug description:
https://bugzilla.redhat.com/show_bug.cgi?id=1144672
Patch review request:
http://review.gluster.org/#/c/8787/
Please review this patch when you are free.
Thanks very much!
On Friday, September 19, 2014, Vijay Bellur <vbellur at redhat.com> wrote:
> Hi Jaden,
>
> Thanks for your effort on this one.
>
> A lot of GlusterFS developers including me are busy with various things
> over the next 2 weeks. Please do expect some delays in our responses and
> reviews till then.
>
> Cheers,
> Vijay
>
> On 09/19/2014 12:44 PM, Jaden Liang wrote:
>
>> Hi all,
>>
>> Here is a patch for the file flock leak on unclean disconnect in
>> gluster-3.4.5.
>> I am totally new to the gluster development workflow and am still trying
>> to understand how to submit this patch to Gerrit. So I want to paste the
>> patch here first to let the devel team know, and submit it after I
>> figure out Gerrit. :-)
>>
>> The major modification is adding an id that distinguishes the different
>> TCP connections between a given client/server pair, so that an old
>> connection whose socket has not yet closed cannot be confused with the
>> new one.
>>
>> diff --git a/rpc/rpc-lib/src/rpc-clnt.h b/rpc/rpc-lib/src/rpc-clnt.h
>> index 263d5f7..718308d 100644
>> --- a/rpc/rpc-lib/src/rpc-clnt.h
>> +++ b/rpc/rpc-lib/src/rpc-clnt.h
>> @@ -143,6 +143,7 @@ struct rpc_clnt_connection {
>>          struct timeval           last_sent;
>>          struct timeval           last_received;
>>          int32_t                  ping_started;
>> +        uint32_t                 clnt_conn_id;
>>  };
>>  typedef struct rpc_clnt_connection rpc_clnt_connection_t;
>>
>> diff --git a/xlators/protocol/client/src/client-handshake.c b/xlators/protocol/client/src/client-handshake.c
>> index d2083e6..1c2fc2f 100644
>> --- a/xlators/protocol/client/src/client-handshake.c
>> +++ b/xlators/protocol/client/src/client-handshake.c
>> @@ -471,9 +471,10 @@ client_set_lk_version (xlator_t *this)
>>          conf = (clnt_conf_t *) this->private;
>>
>>          req.lk_ver = client_get_lk_ver (conf);
>> -        ret = gf_asprintf (&req.uid, "%s-%s-%d",
>> +        ret = gf_asprintf (&req.uid, "%s-%s-%d-%u",
>>                             this->ctx->process_uuid, this->name,
>> -                           this->graph->id);
>> +                           this->graph->id,
>> +                           (conf->rpc) ? conf->rpc->conn.clnt_conn_id : 0);
>>          if (ret == -1)
>>                  goto err;
>> @@ -1549,13 +1550,22 @@ client_setvolume (xlator_t *this, struct rpc_clnt *rpc)
>>                  }
>>          }
>>
>> +        /* For the different connections between a pair of client and
>> +         * server, we use a different clnt_conn_id to identify them.
>> +         * Otherwise, there is a chance that file locks are not released
>> +         * after an unclean disconnection. */
>> +        if (conf->rpc) {
>> +                conf->rpc->conn.clnt_conn_id = conf->clnt_conn_id++;
>> +        }
>> +
>>          /* With multiple graphs possible in the same process, we need a
>>             field to bring the uniqueness. Graph-ID should be enough to get the
>>             job done
>>          */
>> -        ret = gf_asprintf (&process_uuid_xl, "%s-%s-%d",
>> +        ret = gf_asprintf (&process_uuid_xl, "%s-%s-%d-%u",
>>                             this->ctx->process_uuid, this->name,
>> -                           this->graph->id);
>> +                           this->graph->id,
>> +                           (conf->rpc) ? conf->rpc->conn.clnt_conn_id : 0);
>>          if (-1 == ret) {
>>                  gf_log (this->name, GF_LOG_ERROR,
>>                          "asprintf failed while setting process_uuid");
>> diff --git a/xlators/protocol/client/src/client.c b/xlators/protocol/client/src/client.c
>> index ad95574..35fef49 100644
>> --- a/xlators/protocol/client/src/client.c
>> +++ b/xlators/protocol/client/src/client.c
>> @@ -2437,6 +2437,7 @@ init (xlator_t *this)
>>          conf->lk_version         = 1;
>>          conf->grace_timer        = NULL;
>>          conf->grace_timer_needed = _gf_true;
>> +        conf->clnt_conn_id       = 0;
>>
>>          ret = client_init_grace_timer (this, this->options, conf);
>>          if (ret)
>> diff --git a/xlators/protocol/client/src/client.h b/xlators/protocol/client/src/client.h
>> index 0a27c09..dea90d1 100644
>> --- a/xlators/protocol/client/src/client.h
>> +++ b/xlators/protocol/client/src/client.h
>> @@ -116,6 +116,9 @@ typedef struct clnt_conf {
>>                                                  */
>>          gf_boolean_t           filter_o_direct; /* if set, filter O_DIRECT
>>                                                     from the flags list of
>>                                                     open() */
>> +        uint32_t               clnt_conn_id;    /* connection id for each
>> +                                                   connection in process_uuid,
>> +                                                   start with 0, increase once a new connection */
>>  } clnt_conf_t;
>>
>>  typedef struct _client_fd_ctx {
>>
>>
>> On Wednesday, September 17, 2014, Jaden Liang <jaden1q84 at gmail.com> wrote:
>>
>> Hi all,
>>
>> After several days of tracking, we finally pinpointed why glusterfs
>> uncleanly detaches file flocks under frequent network disconnection. We
>> are now working on a patch to submit. The details of the issue follow.
>> Any suggestions will be appreciated!
>>
>> First of all, as I mentioned in
>> http://supercolony.gluster.org/pipermail/gluster-devel/2014-September/042233.html,
>> this issue happens under frequent network disconnection.
>>
>> According to the sources, the server-side cleanup job is done in
>> server_connection_cleanup().
>> When RPCSVC_EVENT_DISCONNECT happens, the code comes here:
>>
>> int
>> server_rpc_notify ()
>> {
>>         ......
>>         case RPCSVC_EVENT_DISCONNECT:
>>                 ......
>>                 if (!conf->lk_heal) {
>>                         server_conn_ref (conn);
>>                         server_connection_put (this, conn, &detached);
>>                         if (detached)
>>                                 server_connection_cleanup (this, conn,
>>                                                            INTERNAL_LOCKS |
>>                                                            POSIX_LOCKS);
>>                         server_conn_unref (conn);
>>                 ......
>> }
>>
>> server_connection_cleanup() is called when the variable 'detached' is
>> true, and 'detached' is set by server_connection_put():
>> server_connection_t*
>> server_connection_put (xlator_t *this, server_connection_t *conn,
>>                        gf_boolean_t *detached)
>> {
>>         server_conf_t *conf  = NULL;
>>         gf_boolean_t   unref = _gf_false;
>>
>>         if (detached)
>>                 *detached = _gf_false;
>>         conf = this->private;
>>         pthread_mutex_lock (&conf->mutex);
>>         {
>>                 conn->bind_ref--;
>>                 if (!conn->bind_ref) {
>>                         list_del_init (&conn->list);
>>                         unref = _gf_true;
>>                 }
>>         }
>>         pthread_mutex_unlock (&conf->mutex);
>>         if (unref) {
>>                 gf_log (this->name, GF_LOG_INFO,
>>                         "Shutting down connection %s", conn->id);
>>                 if (detached)
>>                         *detached = _gf_true;
>>                 server_conn_unref (conn);
>>                 conn = NULL;
>>         }
>>         return conn;
>> }
>>
>> 'detached' is only set to _gf_true when 'conn->bind_ref' drops to 0.
>> 'conn->bind_ref' is managed in server_connection_get(): it is set to 1
>> for a new connection and incremented for an existing one.
>>
>> server_connection_t *
>> server_connection_get (xlator_t *this, const char *id)
>> {
>>         ......
>>         list_for_each_entry (trav, &conf->conns, list) {
>>                 if (!strcmp (trav->id, id)) {
>>                         conn = trav;
>>                         conn->bind_ref++;
>>                         goto unlock;
>>                 }
>>         }
>>         ......
>> }
>>
>> When the connection id is the same, 'conn->bind_ref' is incremented.
>> Therefore, the problem should be a mismatched reference
>> increment/decrement. We then added some logs to verify our guess.
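>>
>> (The debug log we added in server_connection_get() was roughly like the
>> sketch below; 'found' is a local flag of ours marking whether the id was
>> already in the connection table. The exact statement may differ, but its
>> output format matches the log lines that follow.)
>>
>>         gf_log (this->name, GF_LOG_DEBUG,
>>                 "server connection id: %s, conn->bind_ref:%d, found:%d",
>>                 id, conn->bind_ref, found);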
>>
>> // 1st connection comes in, and there is no id
>> // 'host-000c29e93d20-8661-2014/09/13-11:02:26:995090-vs_vol_rep2-client-2-0'
>> // in the connection table. 'conn->bind_ref' is set to 1.
>> [2014-09-17 04:42:28.950693] D [server-helpers.c:712:server_connection_get] 0-vs_vol_rep2-server: server connection id: host-000c29e93d20-8661-2014/09/13-11:02:26:995090-vs_vol_rep2-client-2-0, conn->bind_ref:1, found:0
>> [2014-09-17 04:42:28.950717] D [server-handshake.c:430:server_setvolume] 0-vs_vol_rep2-server: Connected to host-000c29e93d20-8661-2014/09/13-11:02:26:995090-vs_vol_rep2-client-2-0
>> [2014-09-17 04:42:28.950758] I [server-handshake.c:567:server_setvolume] 0-vs_vol_rep2-server: accepted client from host-000c29e93d20-8661-2014/09/13-11:02:26:995090-vs_vol_rep2-client-2-0 (version: 3.4.5) (peer: host-000c29e93d20:1015)
>> ......
>> // Keeps running for several minutes .......
>> ......
>> // Network disconnects here. The client-side TCP socket is closed by a
>> // timeout, but the server-side socket stays connected. AT THIS MOMENT
>> // the network recovers, and the client reconnects with a new TCP
>> // connection JUST BEFORE the old server-side socket is reset. Note
>> // that at this point there are 2 valid sockets on the server side.
>> // The new connection looks up the same conn id
>> // 'host-000c29e93d20-8661-2014/09/13-11:02:26:995090-vs_vol_rep2-client-2-0'
>> // in the connection table and increases 'conn->bind_ref' to 2.
>>
>> [2014-09-17 04:46:16.135066] D [server-helpers.c:712:server_connection_get] 0-vs_vol_rep2-server: server connection id: host-000c29e93d20-8661-2014/09/13-11:02:26:995090-vs_vol_rep2-client-2-0, conn->bind_ref:2, found:1 // HERE IT IS, ref increased to 2!!!
>> [2014-09-17 04:46:16.135113] D [server-handshake.c:430:server_setvolume] 0-vs_vol_rep2-server: Connected to host-000c29e93d20-8661-2014/09/13-11:02:26:995090-vs_vol_rep2-client-2-0
>> [2014-09-17 04:46:16.135157] I [server-handshake.c:567:server_setvolume] 0-vs_vol_rep2-server: accepted client from host-000c29e93d20-8661-2014/09/13-11:02:26:995090-vs_vol_rep2-client-2-0 (version: 3.4.5) (peer: host-000c29e93d20:1018)
>>
>> // After 13 seconds, the old connection is reset, decreasing
>> // 'conn->bind_ref' to 1.
>>
>> [2014-09-17 04:46:28.688780] W [socket.c:2121:__socket_proto_state_machine] 0-tcp.vs_vol_rep2-server: ret = -1, error: Connection reset by peer, peer (host-000c29e93d20:1015)
>> [2014-09-17 04:46:28.688790] I [socket.c:2274:socket_event_handler] 0-transport: socket_event_poll_in failed, ret=-1.
>> [2014-09-17 04:46:28.688797] D [socket.c:2281:socket_event_handler] 0-transport: disconnecting now
>> [2014-09-17 04:46:28.688831] I [server.c:762:server_rpc_notify] 0-vs_vol_rep2-server: disconnecting connectionfrom host-000c29e93d20-8661-2014/09/13-11:02:26:995090-vs_vol_rep2-client-2-0(host-000c29e93d20:1015)
>> [2014-09-17 04:46:28.688861] D [server-helpers.c:744:server_connection_put] 0-vs_vol_rep2-server: conn->bind_ref:1
>>
>> In our production environment, some flocks are held by the 1st
>> connection. According to the logs, there is no way for the flocks of
>> the 1st connection to be cleaned up, and on the 2nd, new connection the
>> client side cannot take the flocks again.
>>
>> Therefore, we think the major reason is that different connections use
>> the same conn id. The conn id is assembled in client_setvolume():
>>
>>         ret = gf_asprintf (&process_uuid_xl, "%s-%s-%d",
>>                            this->ctx->process_uuid, this->name,
>>                            this->graph->id);
>>
>> The conn id contains 3 parts:
>> this->ctx->process_uuid: hostname + pid + startup timestamp
>> this->name: translator name
>> this->graph->id: graph id
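>>
>> For illustration, here is a minimal standalone sketch (not gluster code;
>> the literal values are taken from the logs above) showing why a
>> reconnect from the same client process produces the identical id:
>>
>> #include <assert.h>
>> #include <stdio.h>
>> #include <string.h>
>>
>> int main (void)
>> {
>>         /* The 3 parts of the conn id: none of them changes across a
>>          * reconnect from the same client process. */
>>         const char *process_uuid =
>>                 "host-000c29e93d20-8661-2014/09/13-11:02:26:995090";
>>         const char *name     = "vs_vol_rep2-client-2";
>>         int         graph_id = 0;
>>
>>         char id1[128], id2[128];
>>         snprintf (id1, sizeof (id1), "%s-%s-%d",
>>                   process_uuid, name, graph_id);
>>         /* ... network drops, client reconnects: same process, same
>>          * translator, same graph ... */
>>         snprintf (id2, sizeof (id2), "%s-%s-%d",
>>                   process_uuid, name, graph_id);
>>
>>         /* Identical ids: server_connection_get() will find the old
>>          * conn and bump bind_ref instead of creating a new entry. */
>>         assert (strcmp (id1, id2) == 0);
>>         printf ("reconnect reuses id: %s\n", id1);
>>         return 0;
>> }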
>>
>> Apparently the conn id stays the same unless the client side restarts.
>> So when the network disconnects, there is a chance that the socket on
>> the client side times out while the one on the server side stays alive.
>> If the network recovers at that moment and the client reconnects before
>> the server's old socket is reset, the file flocks of the old connection
>> are left uncleaned.
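>>
>> To make the race concrete, here is a minimal standalone model (all
>> names are hypothetical, not gluster code) of a refcounted connection
>> table keyed only by a reusable id; the cleanup at bind_ref == 0 never
>> fires:
>>
>> #include <stdio.h>
>> #include <string.h>
>>
>> /* Single-slot model of the server connection table. */
>> struct conn {
>>         char id[64];
>>         int  bind_ref;
>>         int  in_use;
>> };
>>
>> static struct conn table;
>>
>> /* Like server_connection_get(): reuse the entry when the id matches,
>>  * otherwise create it with bind_ref = 1. */
>> static struct conn *get (const char *id)
>> {
>>         if (table.in_use && strcmp (table.id, id) == 0) {
>>                 table.bind_ref++;          /* 2nd connection, same id */
>>                 return &table;
>>         }
>>         snprintf (table.id, sizeof (table.id), "%s", id);
>>         table.bind_ref = 1;
>>         table.in_use   = 1;
>>         return &table;
>> }
>>
>> /* Like server_connection_put(): locks would only be cleaned up when
>>  * the refcount drops to 0. */
>> static void put (struct conn *c)
>> {
>>         if (--c->bind_ref == 0)
>>                 printf ("bind_ref 0: cleanup runs, locks released\n");
>>         else
>>                 printf ("bind_ref %d: no cleanup, locks leak\n",
>>                         c->bind_ref);
>> }
>>
>> int main (void)
>> {
>>         const char *id = "host-pid-timestamp-client-0"; /* reused id */
>>
>>         get (id);                  /* 1st connection: bind_ref = 1   */
>>         struct conn *c = get (id); /* reconnect before the old socket
>>                                     * is reset: bind_ref = 2         */
>>
>>         put (c); /* old socket finally reset: 2 -> 1, cleanup skipped */
>>         return 0;
>> }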
>>
>> That is our full analysis of this flock leak issue. We are now working
>> on the patch, and we hope someone can review it when it is finished.
>>
>> Any other comments are appreciated. Thank you!
>>
>>
>>
>> _______________________________________________
>> Gluster-devel mailing list
>> Gluster-devel at gluster.org
>> http://supercolony.gluster.org/mailman/listinfo/gluster-devel
>>
>>
>