[Gluster-devel] Question on rpc_transport_unref

Emmanuel Dreyfus manu at netbsd.org
Sun Apr 28 13:24:38 UTC 2013


Emmanuel Dreyfus <manu at netbsd.org> wrote:

> Is it an event-poll story? When the backtrace hits even-pool.c,
> glusterfs has already decided to destroy the transport because its
> refcount droped to zero. It looks more like a refcount tracking issue,
> which is always done outside of event-poll code.

Here is rpc_transport_(ref|unref) tracking prior the bug:

1367153818.290418 rpcsvc_request_init 0x7f7ff73b6000  1 -> 2
1367153818.290426 server_alloc_frame 0x7f7ff73b6000  2 -> 3
1367153818.290449 rpcsvc_request_init 0x7f7ff73b6000  3 -> 4
1367153818.290455 server_alloc_frame 0x7f7ff73b6000  4 -> 5
1367153818.291074 rpcsvc_request_init 0x7f7ff73b6000  5 -> 6
1367153818.291076 server_alloc_frame 0x7f7ff73b6000  6 -> 7
1367153818.291100 rpcsvc_request_init 0x7f7ff73b6000  7 -> 8
1367153818.291103 server_alloc_frame 0x7f7ff73b6000  8 -> 9
1367153818.291104 rpcsvc_request_destroy 0x7f7ff73b6000  9 -> 8
1367153818.291104 free_state 0x7f7ff73b6000  8 -> 7
1367153818.291112 rpcsvc_request_init 0x7f7ff73b6000  7 -> 8
1367153818.291114 rpcsvc_request_destroy 0x7f7ff73b6000  8 -> 7
1367153818.291115 server_alloc_frame 0x7f7ff73b6000  7 -> 6
1367153818.291127 rpcsvc_request_destroy 0x7f7ff73b6000  6 -> 5
1367153818.291127 free_state 0x7f7ff73b6000  5 -> 4
1367153818.291987 rpcsvc_request_destroy 0x7f7ff73b6000  4 -> 3
1367153818.291987 free_state 0x7f7ff73b6000  3 -> 2
1367153818.293152 rpcsvc_request_destroy 0x7f7ff73b6000  2 -> 1
1367153818.293152 free_state 0x7f7ff73b6000  1 -> 0

This explains the reference dropping to 0, but it is odd, I just don't get
it, it makes no sense:
1367153818.291115 server_alloc_frame 0x7f7ff73b6000  7 -> 6

I double-checked my debug code and it seems correct. server_alloc_frame()
contains that call where I added the __func__ argument for debugging:
        state->xprt  = rpc_transport_ref (req->trans, __func__);

Here is my modified rpc_transport_ref():

rpc_transport_t * 
rpc_transport_ref (rpc_transport_t *this, char *func)
{
        rpc_transport_t *return_this = NULL;

        GF_VALIDATE_OR_GOTO("rpc_transport", this, fail);

        pthread_mutex_lock (&this->lock);
        {
                this->debug[this->debugcount].before = this->refcount;
                this->refcount ++;
                this->debug[this->debugcount].after = this->refcount;
                gettimeofday(&this->debug[this->debugcount].tv, NULL);
                this->debug[this->debugcount].func = func;
                this->debug[this->debugcount].trans = this;
                this->debugcount = (this->debugcount + 1) % DEBUG_SIZE;
        }
        pthread_mutex_unlock (&this->lock);

        return_this = this;
fail:
        return return_this;
}


And rpc_transport_destroy():
int32_t
rpc_transport_destroy (rpc_transport_t *this)
{
        int32_t ret = -1;

        GF_VALIDATE_OR_GOTO("rpc_transport", this, fail);

        {
            int i, c;

            for (i = 0; i < DEBUG_SIZE; i++) {
                c = (i + this->debugcount + 1) % DEBUG_SIZE;
                gf_log("XXXref", GF_LOG_ERROR, "%d.%06d %s %p  %d -> %d",
                       this->debug[c].tv.tv_sec, this->debug[c].tv.tv_usec,
                       this->debug[c].func, this->debug[c].trans,
                       this->debug[c].before, this->debug[c].after);
                }
        }

         /* ... */


-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
manu at netbsd.org




More information about the Gluster-devel mailing list