[Gluster-users] Unable to rebalance...status or stop after upgrade to 3.3

Dan Bretherton d.a.bretherton at reading.ac.uk
Mon Aug 13 11:20:41 UTC 2012


Harry-
Thanks for the tip.  My problem could well have been the same as yours.  
I have known for some time that "gluster peer status" doesn't give 
useful connection information, but I didn't know about the "gluster 
volume status" commands; they must be new in version 3.3.  I usually 
discover connection problems by seeing phrases like "disconnected" and 
"anomalies" in the logs.  This has been happening more often since I 
upgraded to version 3.3, and I suspect it is being caused by the very 
high load experienced by some servers.  I have seen this load problem 
discussed in other threads.  The next time I attempt a rebalance 
operation I will run "gluster volume status all detail" first to check 
connectivity.
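
For anyone else in the same position, this is roughly the check I plan
to run before the next rebalance (the volume name "tracks" is mine;
substitute your own):

# confirm every peer is actually connected, not just listed in the pool
gluster peer status

# per-brick detail; look for "Online : N" before going any further
gluster volume status all detail

# only start the fix-layout once every brick shows "Online : Y"
gluster volume rebalance tracks fix-layout start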

-Dan

On 08/08/2012 08:31 PM, Harry Mangalam wrote:
> This sounds similar, though not identical, to a problem that I had 
> recently (described here:
> <http://gluster.org/pipermail/gluster-users/2012-August/011054.html>).
> My problems were the result of starting this kind of rebalance with a 
> server node appearing to be connected (via the 'gluster peer status' 
> output) but not actually being connected, as shown by the
> 'gluster volume status all detail' output.  Note especially the part 
> that describes its online state.
>
> ------------------------------------------------------------------------------
> Brick                : Brick pbs3ib:/bducgl
> Port                 : 24018
> Online               : N <<=====================
> Pid                  : 20953
> File System          : xfs
>
>
> You may have already verified this, but what I did was to start a 
> rebalance / fix-layout with a disconnected brick, and it went ahead and 
> tried to do it, unsuccessfully as you might guess.  But when I 
> finally was able to reconnect the downed brick and restart the 
> rebalance, it (astonishingly) was able to bring everything back.  So 
> props to the gluster team.
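>
> In case it's useful, the recovery amounted to something like this (a
> sketch from memory; substitute your own volume name for <VOLUME>):
>
> # check that the previously-downed brick now shows "Online : Y"
> gluster volume status all detail
>
> # re-run the fix-layout; it picked up and repaired everything
> gluster volume rebalance <VOLUME> fix-layout start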
>
> hjm
>
>
> On Wed, Aug 8, 2012 at 11:58 AM, Dan Bretherton 
> <d.a.bretherton at reading.ac.uk> wrote:
>
>     Hello All-
>     I have noticed another problem after upgrading to version 3.3.  I
>     am unable to run "gluster volume rebalance <VOLUME> fix-layout
>     status" or "gluster volume rebalance <VOLUME> fix-layout stop"
>     after starting a rebalance operation with "gluster volume
>     rebalance <VOLUME> fix-layout start".  The fix-layout operation
>     seemed to be progressing normally on all the servers according to
>     the log files, but all attempts to run "status" or "stop" result
>     in the CLI usage message being returned.  The only references to
>     the rebalance commands in the log files were these, which all the
>     servers seem to have one or more of.
>
>     [root at romulus glusterfs]# grep rebalance *.log
>     etc-glusterfs-glusterd.vol.log:[2012-08-08 12:49:04.870709] W
>     [socket.c:1512:__socket_proto_state_machine] 0-management: reading
>     from socket failed. Error (Transport endpoint is not connected),
>     peer
>     (/var/lib/glusterd/vols/tracks/rebalance/cb21050d-05c2-42b3-8660-230954bab324.sock)
>     tracks-rebalance.log:[2012-08-06 10:41:18.550241] I
>     [graph.c:241:gf_add_cmdline_options] 0-tracks-dht: adding option
>     'rebalance-cmd' for volume 'tracks-dht' with value '4'
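>
>     For the record, these are the exact invocations that only ever
>     return the usage message:
>
>     gluster volume rebalance tracks fix-layout status
>     gluster volume rebalance tracks fix-layout stop
>
>     (It occurs to me that the 3.3 CLI might only accept "status" and
>     "stop" without the "fix-layout" keyword, i.e. "gluster volume
>     rebalance tracks status", but I haven't been able to verify
>     that.)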
>
>     The volume name is "tracks" by the way.  I wanted to stop the
>     rebalance operation because it seemed to be causing a very high
>     load on some of the servers and had been running for several
>     days.  I ended up having to manually kill the rebalance processes
>     on all the servers followed by restarting glusterd.
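>
>     The clean-up was roughly this on each server (a rough sketch of
>     what I did, not a recommended procedure; your init system may
>     differ):
>
>     # find the rebalance process for the volume and kill it
>     ps aux | grep rebalance
>     kill <PID>
>
>     # then restart the management daemon
>     service glusterd restart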
>
>     After that I found that one of the servers had
>     "rebalance_status=4" in the file
>     /var/lib/glusterd/vols/tracks/node_state.info, whereas all the
>     others had "rebalance_status=0".  I manually changed the '4' to
>     '0' and restarted glusterd.  I don't know if this was a
>     consequence of the way I had killed the rebalance operation or
>     the cause of the strange behaviour.  I don't really want to start
>     another rebalance to test because the last one was so disruptive.
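>
>     In case it helps anyone searching the archives, the manual fix
>     amounted to this (a sketch; the path comes from my installation):
>
>     # check the status flag recorded on each server
>     grep rebalance_status /var/lib/glusterd/vols/tracks/node_state.info
>
>     # on the server stuck at 4, reset the flag and restart glusterd
>     sed -i 's/rebalance_status=4/rebalance_status=0/' \
>         /var/lib/glusterd/vols/tracks/node_state.info
>     service glusterd restart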
>
>     Has anyone else experienced this problem since upgrading to 3.3?
>
>     Regards,
>     Dan.
>
>     _______________________________________________
>     Gluster-users mailing list
>     Gluster-users at gluster.org
>     http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
>
>
>
>
> -- 
> Harry Mangalam - Research Computing, OIT, Rm 225 MSTB, UC Irvine
> [m/c 2225] / 92697 Google Voice Multiplexer: (949) 478-4487
> 415 South Circle View Dr, Irvine, CA, 92697 [shipping]
> MSTB Lat/Long: (33.642025,-117.844414) (paste into Google Maps)
>