[Gluster-devel] How does GD_SYNCOP work?

Thu Sep 11 18:03:56 UTC 2014

Krishnan Parthasarathi <kparthas at redhat.com> wrote:

> The scheduling of a paused task happens when the epoll thread receives a
> POLLIN event along with the response from the remote endpoint. This is
> contingent on the fact that the call back must issue a synctask_wake,
> which will trigger the resumption of the task (in one of the threads from
> the syncenv). In summary, the call back code triggers the scheduling back
> of the paused task.

Right, this seems to work. I found the __wake() call at the end of
_gd_syncop_brick_op_cbk() and it is executed.  The problem is therefore
not there.
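
For readers following along, here is a minimal sketch of the pause/wake
pattern described above. It uses plain pthreads, not the real syncop
code. Every name in it (demo_barrier, demo_brick_op_cbk, demo_run) is
made up for illustration. One thread stands in for the paused task,
and one short-lived thread per brick stands in for the epoll thread
delivering a reply.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for a paused task waiting on brick replies. */
struct demo_barrier {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    int             pending;   /* replies still outstanding */
};

/* Plays the role of the RPC callback: account for one reply, then wake. */
static void *demo_brick_op_cbk(void *arg)
{
    struct demo_barrier *b = arg;

    pthread_mutex_lock(&b->lock);
    b->pending--;
    pthread_cond_signal(&b->cond);   /* the "wake" step */
    pthread_mutex_unlock(&b->lock);
    return NULL;
}

/* Plays the role of the paused task: sleeps until every reply is in. */
static void demo_barrier_wait(struct demo_barrier *b)
{
    pthread_mutex_lock(&b->lock);
    while (b->pending > 0)
        pthread_cond_wait(&b->cond, &b->lock);
    pthread_mutex_unlock(&b->lock);
}

/* Fake one request per brick, wait for all replies, and return the
 * number of replies still outstanding (0 when every callback ran). */
static int demo_run(int nbricks)
{
    struct demo_barrier b;
    pthread_t tids[16];
    int i, left;

    assert(nbricks >= 0 && nbricks <= 16);
    pthread_mutex_init(&b.lock, NULL);
    pthread_cond_init(&b.cond, NULL);
    b.pending = nbricks;

    for (i = 0; i < nbricks; i++) {
        int rc = pthread_create(&tids[i], NULL, demo_brick_op_cbk, &b);
        assert(rc == 0);
        (void)rc;
    }

    demo_barrier_wait(&b);           /* resumes only after the last wake */

    for (i = 0; i < nbricks; i++)
        pthread_join(tids[i], NULL);

    left = b.pending;
    pthread_cond_destroy(&b.cond);
    pthread_mutex_destroy(&b.lock);
    return left;
}
```

Calling demo_run(3) returns once all three fake callbacks have fired.
The waiter has no timeout, so it relies entirely on every callback
being invoked.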

I tried running the test steps one by one. The offending command is
"gluster volume heal $V0 info", so I ran it after each step.

It works at the beginning, it works after I kill 3 out of 6 bricks, and it
hangs after I create files in the volume (with 3 out of 6 bricks down).

At that point, the bricks that are still up show this in their logs:

[2014-09-11 17:47:31.452067] I [server.c:518:server_rpc_notify] 0-patchy-server: disconnecting connection from netbsd0.cloud.gluster.org-24431-2014/09/11-17:40:47:719843-patchy-client-1-0-0
[2014-09-11 17:47:31.452142] I [server-helpers.c:290:do_fd_cleanup] 0-patchy-server: fd cleanup on /a/a/a/a/a/a/a/a/a/a
[2014-09-11 17:47:31.452689] I [client_t.c:417:gf_client_unref] 0-patchy-server: Shutting down connection netbsd0.cloud.gluster.org-24431-2014/09/11-17:40:47:719843-patchy-client-1-0-0
[2014-09-11 17:47:31.455145] I [server.c:518:server_rpc_notify] 0-patchy-server: disconnecting connection from netbsd0.cloud.gluster.org-3612-2014/09/11-17:40:28:979958-patchy-client-1-0-0
[2014-09-11 17:47:31.455172] I [client_t.c:417:gf_client_unref] 0-patchy-server: Shutting down connection netbsd0.cloud.gluster.org-3612-2014/09/11-17:40:28:979958-patchy-client-1-0-0
[2014-09-11 17:47:31.455208] I [server.c:518:server_rpc_notify] 0-patchy-server: disconnecting connection from netbsd0.cloud.gluster.org-26218-2014/09/11-17:40:28:900316-patchy-client-1-0-0
[2014-09-11 17:47:31.455230] I [client_t.c:417:gf_client_unref] 0-patchy-server: Shutting down connection netbsd0.cloud.gluster.org-26218-2014/09/11-17:40:28:900316-patchy-client-1-0-0

If I understand correctly, "gluster volume heal info" causes glusterd to
send requests to the bricks that are alive. If they go offline at that
time, it may explain why the command hangs. What is the correct behavior
here?
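
To make the suspected failure mode concrete, here is a small sketch in
plain pthreads. It is not glusterd code, and all names in it (tw_state,
tw_reply, tw_wait_for_replies) are hypothetical. A waiter expects a
fixed number of replies but only some callbacks ever fire, as would
happen if a peer disconnected without its callback being run. A timed
wait is used so that the demo returns instead of hanging forever. This
illustrates my hypothesis only and says nothing about how glusterd
does or should handle the case.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical waiter state: how many replies are still missing. */
struct tw_state {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    int             pending;
};

/* One reply arriving: decrement the counter and wake the waiter. */
static void *tw_reply(void *arg)
{
    struct tw_state *s = arg;

    pthread_mutex_lock(&s->lock);
    s->pending--;
    pthread_cond_signal(&s->cond);
    pthread_mutex_unlock(&s->lock);
    return NULL;
}

/* Expect `expected` replies while only `arriving` callbacks ever run.
 * Returns 0 if all replies arrived, or ETIMEDOUT if the wait gave up. */
static int tw_wait_for_replies(int expected, int arriving, int timeout_ms)
{
    struct tw_state s;
    pthread_t tids[16];
    struct timespec deadline;
    int i, rc = 0;

    assert(arriving >= 0 && arriving <= expected && expected <= 16);
    pthread_mutex_init(&s.lock, NULL);
    pthread_cond_init(&s.cond, NULL);
    s.pending = expected;

    /* Absolute deadline, as pthread_cond_timedwait() requires. */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec  += timeout_ms / 1000;
    deadline.tv_nsec += (long)(timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec++;
        deadline.tv_nsec -= 1000000000L;
    }

    for (i = 0; i < arriving; i++) {
        int crc = pthread_create(&tids[i], NULL, tw_reply, &s);
        assert(crc == 0);
        (void)crc;
    }

    pthread_mutex_lock(&s.lock);
    while (s.pending > 0 && rc == 0)
        rc = pthread_cond_timedwait(&s.cond, &s.lock, &deadline);
    if (s.pending == 0)
        rc = 0;                      /* everything arrived in time */
    pthread_mutex_unlock(&s.lock);

    for (i = 0; i < arriving; i++)
        pthread_join(tids[i], NULL);
    pthread_cond_destroy(&s.cond);
    pthread_mutex_destroy(&s.lock);
    return rc;
}
```

With tw_wait_for_replies(3, 3, 500) the waiter is released normally.
With tw_wait_for_replies(3, 2, 200) the third wake never comes, and
only the deadline gets the caller out.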

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
manu at netbsd.org