[Gluster-users] Unable to rebalance...status or stop after upgrade to 3.3

Wed Aug 8 19:31:11 UTC 2012

This sounds similar, though not identical, to a problem that I had recently
(described here:
<http://gluster.org/pipermail/gluster-users/2012-August/011054.html>).
My problems were the result of starting this kind of rebalance
with a server node that appeared to be connected (according to the
'gluster peer status' output) but was not actually connected, as shown by
the 'gluster volume status all detail' output.  Note especially the part
that describes its online state.

------------------------------------------------------------------------------
Brick                : Brick pbs3ib:/bducgl
Port                 : 24018
Online               : N                   <<=====================
Pid                  : 20953
File System          : xfs

You may have already verified this, but what I did was to start a rebalance
/ fix-layout with a disconnected brick, and it went ahead and tried to do
it, unsuccessfully, as you might guess.  But when I was finally able to
reconnect the downed brick and restart the rebalance, it (astonishingly)
was able to bring everything back.  So props to the gluster team.
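
If it helps, here is a rough sketch of the check I would want to run before
starting a rebalance: scan the 'gluster volume status all detail' output and
list any brick whose Online field is not 'Y'.  The function name
offline_bricks is made up for illustration, and the parsing assumes the
"Field : value" layout shown in the excerpt above; other gluster versions
may format this output differently.

```python
def offline_bricks(status_text):
    """Return the bricks whose 'Online' field is anything other than 'Y'.

    Assumes each brick is reported as a group of 'Field : value' lines
    that starts with a 'Brick' line, as in the excerpt above.
    """
    offline = []
    current_brick = None
    for line in status_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # separator rows and blank lines carry no field
        key = key.strip()
        if key == "Brick":
            # The value looks like "Brick pbs3ib:/bducgl"; drop the leading word.
            current_brick = value.strip()
            if current_brick.startswith("Brick "):
                current_brick = current_brick[len("Brick "):]
        elif key == "Online" and current_brick is not None:
            fields = value.split()
            if not fields or fields[0] != "Y":
                offline.append(current_brick)
    return offline


# Sample input copied from the status excerpt above.
SAMPLE = """\
Brick                : Brick pbs3ib:/bducgl
Port                 : 24018
Online               : N
Pid                  : 20953
File System          : xfs
"""

print(offline_bricks(SAMPLE))
```

In practice the text would come from running 'gluster volume status all
detail' on one of the servers; if the list comes back non-empty, I would
sort out the offline bricks before starting the rebalance.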

hjm

On Wed, Aug 8, 2012 at 11:58 AM, Dan Bretherton <
d.a.bretherton at reading.ac.uk> wrote:

> Hello All-
> I have noticed another problem after upgrading to version 3.3.  I am
> unable to do "gluster volume rebalance <VOLUME> fix-layout status" or
> "...fix-layout ... stop" after starting a rebalance operation with "gluster
> volume rebalance <VOLUME> fix-layout start".   The fix-layout operation
> seemed to be progressing normally on all the servers according to the log
> files, but all attempts to do "status" or "stop" result in the CLI usage
> message being returned.  The only references to the rebalance commands in
> the log files were these, which all the servers seem to have one or more of.
>
> [root@romulus glusterfs]# grep rebalance *.log
> etc-glusterfs-glusterd.vol.log:[2012-08-08 12:49:04.870709] W
> [socket.c:1512:__socket_proto_state_machine] 0-management: reading from
> socket failed. Error (Transport endpoint is not connected), peer
> (/var/lib/glusterd/vols/tracks/rebalance/cb21050d-05c2-42b3-8660-230954bab324.sock)
> tracks-rebalance.log:[2012-08-06 10:41:18.550241] I
> [graph.c:241:gf_add_cmdline_options] 0-tracks-dht: adding option
> 'rebalance-cmd' for volume 'tracks-dht' with value '4'
>
> The volume name is "tracks" by the way.  I wanted to stop the rebalance
> operation because it seemed to be causing a very high load on some of the
> servers and had been running for several days.  I ended up having to manually
> kill the rebalance processes on all the servers followed by restarting
> glusterd.
>
> After that I found that one of the servers had "rebalance_status=4" in
> file /var/lib/glusterd/vols/tracks/node_state.info,
> whereas all the others had "rebalance_status=0".  I manually changed the
> '4' to '0' and restarted glusterd.  I don't know if this was a consequence
> of the way I had killed the rebalance operation or the cause of the strange
> behaviour.  I don't really want to start another rebalance going to test
> because the last one was so disruptive.
>
> Has anyone else experienced this problem since upgrading to 3.3?
>
> Regards,
> Dan.
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
>

-- 
Harry Mangalam - Research Computing, OIT, Rm 225 MSTB, UC Irvine
[m/c 2225] / 92697 Google Voice Multiplexer: (949) 478-4487
415 South Circle View Dr, Irvine, CA, 92697 [shipping]
MSTB Lat/Long: (33.642025,-117.844414) (paste into Google Maps)