[Gluster-devel] How does GD_SYNCOP work?
kparthas at redhat.com
Wed Sep 10 09:32:41 UTC 2014
I am not sure why the glustershd process is not replying to the 'brick op' RPC sent from glusterd.
That is something we still need to identify.
Let me try to explain how GD_SYNCOP works. Internally, GD_SYNCOP yields the thread that was
executing the (sync)task once the RPC request has been submitted (asynchronously) to the remote endpoint.
This is equivalent to pausing the task until the response is received. The callback function, which generally
executes in the epoll thread, wakes the corresponding task (i.e. resumes task execution).
If the remote endpoint doesn't reply within frame-timeout, which defaults to 10 minutes in glusterd,
the callback is invoked anyway (in the timer thread); it wakes the task, which then resumes and runs to completion,
albeit with failure.
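To make the submit/yield/wake sequence concrete, here is a minimal sketch of the same pattern. It is not the glusterd code: real synctasks switch contexts cooperatively, whereas this sketch swaps in a pthread condition variable, and every name in it (sketch_args, sketch_cbk, sketch_peer, sketch_syncop) is hypothetical. The timeout is a parameter here so that the failure path can be exercised in a second rather than 10 minutes.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical stand-in for the per-request context ("args"). */
struct sketch_args {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    int             done;          /* set once the callback has run */
    int             op_ret;        /* 0 on reply, -1 on timeout */
    int             peer_replies;  /* simulate a responsive or silent peer */
};

/* Callback: records the result once and wakes the waiting task. */
static void sketch_cbk(struct sketch_args *args, int op_ret)
{
    pthread_mutex_lock(&args->lock);
    if (!args->done) {
        args->done = 1;
        args->op_ret = op_ret;
        pthread_cond_signal(&args->cond);
    }
    pthread_mutex_unlock(&args->lock);
}

/* Simulated remote endpoint: replies after 10 ms, or never replies. */
static void *sketch_peer(void *data)
{
    struct sketch_args *args = data;
    struct timespec delay = { 0, 10 * 1000 * 1000 };

    if (args->peer_replies) {
        nanosleep(&delay, NULL);
        sketch_cbk(args, 0);   /* plays the role of the epoll thread */
    }
    return NULL;
}

/* Submit the request, then wait (the "yield") until woken or timed out. */
static int sketch_syncop(int peer_replies, int timeout_sec)
{
    struct sketch_args args;
    struct timespec deadline;
    pthread_t peer;
    int timed_out, ret;

    pthread_mutex_init(&args.lock, NULL);
    pthread_cond_init(&args.cond, NULL);
    args.done = 0;
    args.op_ret = -1;
    args.peer_replies = peer_replies;

    /* "Submit" the request asynchronously. */
    if (pthread_create(&peer, NULL, sketch_peer, &args) != 0) {
        pthread_cond_destroy(&args.cond);
        pthread_mutex_destroy(&args.lock);
        return -1;
    }

    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_sec;

    pthread_mutex_lock(&args.lock);
    while (!args.done) {
        if (pthread_cond_timedwait(&args.cond, &args.lock, &deadline) == ETIMEDOUT)
            break;
    }
    timed_out = !args.done;
    pthread_mutex_unlock(&args.lock);

    /* Timeout path: the callback still runs, but reports failure. */
    if (timed_out)
        sketch_cbk(&args, -1);
    assert(args.done);

    /* args lives on this stack frame, so it must outlive the peer thread. */
    pthread_join(peer, NULL);
    ret = args.op_ret;
    pthread_cond_destroy(&args.cond);
    pthread_mutex_destroy(&args.lock);
    return ret;
}
```

Built with -pthread, sketch_syncop(1, 2) returns 0 because the simulated peer replies, and sketch_syncop(0, 1) returns -1 after the one-second timeout. Note that the function does not return until the callback can no longer touch args, since args is allocated on its stack.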
Hope that helps.
----- Original Message -----
> I am tracking a bug that appears when running self_heald.t on NetBSD.
> The test will hang on:
> EXPECT "$HEAL_FILES" afr_get_pending_heal_count $V0
> The problem inside afr_get_pending_heal_count is when calling
> gluster volume heal $vol info
> The command will never return. By adding a lot of printf calls, I
> tracked down the problem to GD_SYNCOP() when called through
> In GD_SYNCOP(), once gd_syncop_submit_request() is called
> with success, we call synctask_yield() to wait for the
> reply. It will never come: _gd_syncop_brick_op_cbk() is not called.
> I suspect this is a synctask_wake() problem somewhere. If I
> add synctask_wake() before synctask_yield() in GD_SYNCOP(),
> the current task is scheduled immediately, gd_syncop_mgmt_brick_op()
> exits, then later _gd_syncop_brick_op_cbk() is invoked. Of course
> it will crash, because the context (args) was allocated on the
> stack in gd_syncop_mgmt_brick_op().
> Does anyone have an idea of what is going on?
> Emmanuel Dreyfus
> manu at netbsd.org