[Bugs] [Bug 1288995] [tiering]: Tier daemon crashed on two of eight nodes and lot of "demotion failed" seen in the system

bugzilla at redhat.com bugzilla at redhat.com
Mon Dec 7 08:14:42 UTC 2015


https://bugzilla.redhat.com/show_bug.cgi?id=1288995



--- Comment #2 from Nithya Balachandran <nbalacha at redhat.com> ---
Analysis of dhcp37-121.core.5159:


From the logs:
[2015-12-03 00:01:25.433174] C
[rpc-clnt-ping.c:165:rpc_clnt_ping_timer_expired]
0-tiering-test-vol-01-client-9: server 10.70.37.121:49153 has not responded in
the last 42 seconds, disconnecting.
[2015-12-03 00:01:25.433755] E [rpc-clnt.c:362:saved_frames_unwind] (-->
/usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1eb)[0x7fdf009026eb] (-->
/usr/lib64/libgfrpc.so.0(saved_frames_unwind+0x1e7)[0x7fdf006cd227] (-->
/usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fdf006cd33e] (-->
/usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xab)[0x7fdf006cd40b] (-->
/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x1c2)[0x7fdf006cda42] )))))
0-tiering-test-vol-01-client-9: forced unwinding frame type(GlusterFS 3.3)
op(IPC(47)) called at 2015-12-03 00:00:01.130434 (xid=0xcc177)
[2015-12-03 00:01:25.433787] W [MSGID: 114031]
[client-rpc-fops.c:2265:client3_3_ipc_cbk] 0-tiering-test-vol-01-client-9:
remote operation failed [Transport endpoint is not connected]
[2015-12-03 00:01:25.433966] E [MSGID: 109107]
[tier.c:838:tier_process_ctr_query] 0-tiering-test-vol-01-tier-dht: Failed
query on /rhs/brick2/leg1/.glusterfs/leg1.db ret -107
pending frames:



This means that the syncop_ipc call in tier_process_ctr_query failed.



        ret = dict_set_bin (ctr_ipc_in_dict, GFDB_IPC_CTR_GET_QUERY_PARAMS,
                                ipc_ctr_params, sizeof (*ipc_ctr_params));
        if (ret) {
                gf_msg (this->name, GF_LOG_ERROR, 0, LG_MSG_SET_PARAM_FAILED,
                        "Failed setting %s to params dictionary",
                        GFDB_IPC_CTR_GET_QUERY_PARAMS);
                goto out;
        }

        ret = syncop_ipc (local_brick->xlator, GF_IPC_TARGET_CTR,
                                ctr_ipc_in_dict, &ctr_ipc_out_dict);
        if (ret) {
                gf_msg (this->name, GF_LOG_ERROR, 0,
                        DHT_MSG_LOG_IPC_TIER_ERROR,
                        "Failed query on %s ret %d",
                        local_brick->brick_db_path, ret);
                goto out;
        }
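
The ret value printed by this gf_msg() call, -107 in the log above, looks like
a negated errno: on Linux, 107 is ENOTCONN, whose message is the same
"Transport endpoint is not connected" text reported by client3_3_ipc_cbk.
A small sketch for decoding such values, assuming the negated-errno convention
holds here; describe_ret() is a hypothetical helper, not a GlusterFS function:

```c
#include <errno.h>
#include <string.h>

/*
 * Hypothetical helper (not part of GlusterFS): map a return value that
 * follows the "negated errno" convention to its human-readable message.
 * Positive values are treated as plain errno numbers.
 */
static const char *describe_ret(int ret)
{
    return strerror(ret < 0 ? -ret : ret);
}
```

With this helper, describe_ret(-ENOTCONN) yields the system's message for
ENOTCONN, which can be compared against the text in the client log.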

Since the call to syncop_ipc() failed, ctr_ipc_out_dict is NULL. On goto out:


out:

        if (ctr_ipc_in_dict) {
                dict_unref(ctr_ipc_in_dict); /* <-- this frees ipc_ctr_params */
                ctr_ipc_in_dict = NULL;
        }

        if (ctr_ipc_out_dict) {
                dict_unref(ctr_ipc_out_dict);
                ctr_ipc_out_dict = NULL;
                ipc_ctr_params = NULL; /* <-- not reached: ctr_ipc_out_dict is NULL */
        }

        GF_FREE (ipc_ctr_params);  /* <-- double free */

        return ret;
}


dict_unref(ctr_ipc_in_dict) calls GF_FREE on ipc_ctr_params as part of
dict_destroy()->data_unref(), because data->is_static is false.

Because ctr_ipc_out_dict is NULL, ipc_ctr_params is never reset to NULL, so the
second call to GF_FREE (ipc_ctr_params) operates on memory that has already
been freed and will crash.
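
One way to avoid this class of double free is to drop the local pointer as soon
as ownership has moved into the dictionary. The following is a minimal,
self-contained sketch of that idea, not the actual GlusterFS fix: mock_dict_t,
mock_dict_set_bin(), mock_dict_unref(), tracked_free() and query_with_cleanup()
are hypothetical stand-ins for dict_t, dict_set_bin(), dict_unref(), GF_FREE()
and tier_process_ctr_query().

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for a refcounted dict that owns one binary value. */
typedef struct {
    void *bin;      /* owned by the dict once set */
    int   refcount;
} mock_dict_t;

static int free_calls = 0;  /* counts frees of non-NULL buffers */

/* Stand-in for GF_FREE: freeing NULL is a no-op and is not counted. */
static void tracked_free(void *p)
{
    if (p) {
        free_calls++;
        free(p);
    }
}

static mock_dict_t *mock_dict_new(void)
{
    mock_dict_t *d = calloc(1, sizeof(*d));
    if (d)
        d->refcount = 1;
    return d;
}

/* On success (return 0) the dict takes ownership of bin. */
static int mock_dict_set_bin(mock_dict_t *d, void *bin)
{
    if (!d || !bin)
        return -1;
    d->bin = bin;
    return 0;
}

/* Dropping the last reference frees the stored value, then the dict. */
static void mock_dict_unref(mock_dict_t *d)
{
    if (!d)
        return;
    assert(d->refcount > 0);
    if (--d->refcount == 0) {
        tracked_free(d->bin);
        free(d);
    }
}

/* ipc_fails != 0 simulates the failed IPC call from the log. */
static int query_with_cleanup(int ipc_fails)
{
    int          ret     = -1;
    void        *params  = NULL;
    mock_dict_t *in_dict = NULL;

    free_calls = 0;

    params = calloc(1, 64);
    if (!params)
        goto out;

    in_dict = mock_dict_new();
    if (!in_dict)
        goto out;

    ret = mock_dict_set_bin(in_dict, params);
    if (ret)
        goto out;
    params = NULL;  /* ownership moved to in_dict: forget the local pointer */

    ret = ipc_fails ? -ENOTCONN : 0;

out:
    if (in_dict)
        mock_dict_unref(in_dict);  /* frees the buffer if the dict owns it */
    tracked_free(params);          /* only frees if ownership never moved */
    return ret;
}
```

In this sketch the buffer is released exactly once on both the success and the
failure path, because the cleanup code no longer depends on the out-dict branch
to clear the pointer.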
