[Gluster-devel] How does GD_SYNCOP work?
Emmanuel Dreyfus
manu at netbsd.org
Fri Sep 12 04:31:55 UTC 2014
Krishnan Parthasarathi <kparthas at redhat.com> wrote:
> If you left the hung setup for over ten minutes from the time the bricks
> went down, you should see logs corresponding to one of the above two
> mechanisms in action. Let me know if you don't. Then we need to
> investigate further.
Yes, it does die miserably after 10 minutes :-)
I added some debug printfs to see what was hanging. Here is the path
glusterd takes when it receives gluster volume heal info:
gd_brick_op_phase
    glusterd_volinfo_find
    glusterd_bricks_select_heal_volume -> rxlator_count = 3
    glusterd_syncop_aggr_rsp_dict
    list_for_each_entry (pending_node, &selected, list) {
        First in list is rpc->conn.name = "management"
        gd_syncop_mgmt_brick_op
            glusterd_brick_op_build_payload
            GD_SYNCOP -> never resumes
    }
I am fine with glusterd_bricks_select_heal_volume() finding 3 bricks:
they are the 3 bricks still alive. However, I am surprised to see that
the first entry in the list has rpc->conn.name = "management". It
should be a brick name here, right? Or is this glustershd?
The logs give a hint about GD_SYNCOP not returning:
[2014-09-12 04:19:35.266126] I [socket.c:3277:socket_submit_reply] 0-socket.management: not connected (priv->connected = -1)
[2014-09-12 04:19:35.266139] E [rpcsvc.c:1249:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x1, Program: GlusterD svc cli, ProgVers: 2, Proc: 31) to rpc-transport (socket.management)
--
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
manu at netbsd.org