[Gluster-users] mixing rdma/tcp bricks, rebalance operation locked up
Harry Mangalam
harry.mangalam at uci.edu
Tue Dec 13 19:05:52 UTC 2011
I built an rdma-based volume out of 5 bricks:
$ gluster volume info
Volume Name: glrdma
Type: Distribute
Status: Started
Number of Bricks: 5
Transport-type: rdma
Bricks:
Brick1: pbs1:/data2
Brick2: pbs2:/data2
Brick3: pbs3:/data2
Brick4: pbs3:/data
Brick5: pbs4:/data
Options Reconfigured:
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
and everything was working well. I then tried to add a TCP/socket
brick to it, thinking that it would be refused, but gluster happily
added it:
$ gluster volume info
Volume Name: glrdma
Type: Distribute
Status: Started
Number of Bricks: 6
Transport-type: rdma
Bricks:
Brick1: pbs1:/data2
Brick2: pbs2:/data2
Brick3: pbs3:/data2
Brick4: pbs3:/data
Brick5: pbs4:/data
Brick6: dabrick:/data2 <-- TCP/socket brick
Options Reconfigured:
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
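For reference, the add-brick step above was, as best I recall, just
the standard command, something along the lines of:
$ gluster volume add-brick glrdma dabrick:/data2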
However, not too surprisingly, there were problems when I tried to
rebalance onto the added brick. It allowed me to start a
rebalance/fix-layout, but it never finished, and the logs keep
filling with 'connection refused' errors (see the log extracts at
the bottom).
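The rebalance was kicked off with the usual command (the same one
that appears again in the transcript further down):
$ gluster volume rebalance glrdma fix-layout start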
Attempts to remove the TCP brick are unsuccessful, even after stopping
the volume:
$ gluster volume stop glrdma
Stopping volume will make its data inaccessible. Do you want to
continue? (y/n) y
Stopping volume glrdma has been successful
$ gluster volume remove-brick glrdma dabrick:/data2
Removing brick(s) can result in data loss. Do you want to Continue?
(y/n) y
Remove Brick unsuccessful
(followed by more warnings about a missing 'option transport-type',
defaulting to "socket"):
[2011-12-13 10:34:57.241676] I [cli-rpc-ops.c:1073:gf_cli3_1_remove_brick_cbk] 0-cli: Received resp to remove brick
[2011-12-13 10:34:57.241852] I [input.c:46:cli_batch] 0-: Exiting with: -1
[2011-12-13 10:46:08.937294] W [rpc-transport.c:606:rpc_transport_load] 0-rpc-transport: missing 'option transport-type'. defaulting to "socket"
[2011-12-13 10:46:09.110636] I [cli-rpc-ops.c:417:gf_cli3_1_get_volume_cbk] 0-cli: Received resp to get vol: 0
[2011-12-13 10:46:09.110845] I [cli-rpc-ops.c:596:gf_cli3_1_get_volume_cbk] 0-: Returning: 0
[2011-12-13 10:46:09.111038] I [cli-rpc-ops.c:417:gf_cli3_1_get_volume_cbk] 0-cli: Received resp to get vol: 0
[2011-12-13 10:46:09.111070] I [cli-rpc-ops.c:596:gf_cli3_1_get_volume_cbk] 0-: Returning: 0
[2011-12-13 10:46:09.111080] I [input.c:46:cli_batch] 0-: Exiting with: 0
[2011-12-13 10:52:18.142283] W [rpc-transport.c:606:rpc_transport_load] 0-rpc-transport: missing 'option transport-type'. defaulting to "socket"
And the rebalance operation now seems to be locked up, since the
responses to the rebalance commands are nonsensical (the commands
below were given serially, with no other intervening commands):
$ gluster volume rebalance glrdma fix-layout start
Rebalance on glrdma is already started
$ gluster volume rebalance glrdma fix-layout status
rebalance stopped
$ gluster volume rebalance glrdma fix-layout stop
stopped rebalance process of volume glrdma
(after rebalancing 0 files totaling 0 bytes)
$ gluster volume rebalance glrdma fix-layout start
Rebalance on glrdma is already started
Is there a way to back out of this situation? Or has incorrectly
adding the TCP brick permanently hosed the volume?
And does this imply a bug in the add-brick routine (hopefully one
that has since been fixed)?
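Worst case, I'm assuming I could destroy the volume and recreate it
rdma-only from the original bricks, roughly:
$ gluster volume delete glrdma
$ gluster volume create glrdma transport rdma pbs1:/data2 pbs2:/data2 \
  pbs3:/data2 pbs3:/data pbs4:/data
but I'd rather not go that route if there's a cleaner fix.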
Log extracts
--------------
tc-glusterd-mount-glrdma.log (and nfs.log, even though I haven't
tried to export it via NFS) has zillions of these lines:
[2011-12-13 10:36:11.702130] E [rdma.c:4417:tcp_connect_finish] 0-glrdma-client-5: tcp connect to failed (Connection refused)
cli.log has many of these lines:
[2011-12-13 10:34:55.142428] W [rpc-transport.c:606:rpc_transport_load] 0-rpc-transport: missing 'option transport-type'. defaulting to "socket"
--
Harry Mangalam - Research Computing, OIT, Rm 225 MSTB, UC Irvine
[ZOT 2225] / 92697 Google Voice Multiplexer: (949) 478-4487
MSTB Lat/Long: (33.642025,-117.844414) (paste into Google Maps)
--
This signature has been OCCUPIED!