[Gluster-users] Unable to rebalance...status or stop after upgrade to 3.3
Dan Bretherton
d.a.bretherton at reading.ac.uk
Mon Aug 13 11:20:41 UTC 2012
Harry-
Thanks for the tip. My problem could well have been the same as yours.
I have known for some time that "gluster peer status" doesn't give
useful connection information but I didn't know about the "gluster
volume status" commands; they must be new in version 3.3. I usually
discover connection problems by seeing phrases like "disconnected" and
"anomalies" in the logs. This has been happening more often since I
upgraded to version 3.3, and I suspect it is being caused by the very
high load experienced by some servers. I have seen this load problem
discussed in other threads. The next time I attempt a rebalance
operation I will run "gluster volume status all detail" first to check
connectivity.
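
For the record, the sort of pre-flight check I have in mind would look
something like this (the hostnames and brick paths here are made up; the
"Online" field is the one to watch, as in the example in your message
below):

    # gluster volume status all detail | grep -E 'Brick |Online'
    Brick                : Brick server1:/export/brick1
    Online               : Y
    Brick                : Brick server2:/export/brick1
    Online               : N
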
-Dan
On 08/08/2012 08:31 PM, Harry Mangalam wrote:
> This sounds similar, tho not identical, to a problem that I had
> recently (described here:
> <http://gluster.org/pipermail/gluster-users/2012-August/011054.html>).
> My problems were the result of starting this kind of rebalance with a
> server node appearing to be connected (via the 'gluster peer status'
> output) but not actually being connected, as shown by the
> 'gluster volume status all detail' output. Note especially the part
> that describes its online state.
>
> ------------------------------------------------------------------------------
> Brick : Brick pbs3ib:/bducgl
> Port : 24018
> Online : N <<=====================
> Pid : 20953
> File System : xfs
>
>
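> For comparison, 'gluster peer status' at the same time was still showing
> that node as healthy, along these lines (uuid line trimmed, from memory):
>
>     Hostname: pbs3ib
>     State: Peer in Cluster (Connected)
>
> which is why the detail output above, not peer status, is the thing to
> trust.
>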
> You may have already verified this, but what I did was to start a
> rebalance / fix-layout with a disconnected brick, and it went ahead and
> tried to do it, unsuccessfully as you might guess. But when I
> finally was able to reconnect the downed brick and restart the
> rebalance, it (astonishingly) was able to bring everything back. So
> props to the gluster team.
>
> hjm
>
>
> On Wed, Aug 8, 2012 at 11:58 AM, Dan Bretherton
> <d.a.bretherton at reading.ac.uk>
> wrote:
>
> Hello All-
> I have noticed another problem after upgrading to version 3.3. I
> am unable to do "gluster volume rebalance <VOLUME> fix-layout
> status" or "...fix-layout ... stop" after starting a rebalance
> operation with "gluster volume rebalance <VOLUME> fix-layout
> start". The fix-layout operation seemed to be progressing
> normally on all the servers according to the log files, but all
> attempts to do "status" or "stop" result in the CLI usage message
> being returned. The only references to the rebalance commands in
> the log files were these, which all the servers seem to have one
> or more of:
>
> [root at romulus glusterfs]# grep rebalance *.log
> etc-glusterfs-glusterd.vol.log:[2012-08-08 12:49:04.870709] W
> [socket.c:1512:__socket_proto_state_machine] 0-management: reading
> from socket failed. Error (Transport endpoint is not connected),
> peer
> (/var/lib/glusterd/vols/tracks/rebalance/cb21050d-05c2-42b3-8660-230954bab324.sock)
> tracks-rebalance.log:[2012-08-06 10:41:18.550241] I
> [graph.c:241:gf_add_cmdline_options] 0-tracks-dht: adding option
> 'rebalance-cmd' for volume 'tracks-dht' with value '4'
>
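> To be concrete, the commands that only produced the usage text were of
> the form:
>
>     gluster volume rebalance <VOLUME> fix-layout status
>     gluster volume rebalance <VOLUME> fix-layout stop
>
> I did wonder whether the 3.3 CLI only accepts "fix-layout" together with
> "start", in which case plain "gluster volume rebalance <VOLUME> status"
> and "gluster volume rebalance <VOLUME> stop" might be what it expects,
> but that is only a guess on my part.
>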
> The volume name is "tracks" by the way. I wanted to stop the
> rebalance operation because it seemed to be causing a very high
> load on some of the servers and had been running for several days. I
> ended up having to manually kill the rebalance processes on all
> the servers followed by restarting glusterd.
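>
> In case it is useful to anyone, the cleanup on each server was roughly
> along these lines (the pkill pattern is a sketch rather than exactly what
> I typed, and it assumes the stock init script, so adjust to taste):
>
>     pkill -f rebalance         # matches the glusterfs rebalance process;
>                                # its command line includes "rebalance" paths
>     service glusterd restart   # then restart the management daemon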
>
> After that I found that one of the servers had
> "rebalance_status=4" in file
> /var/lib/glusterd/vols/tracks/node_state.info, whereas all the others had
> "rebalance_status=0". I manually changed the '4' to '0' and
> restarted glusterd. I don't know if this was a consequence of the
> way I had killed the rebalance operation or the cause of the
> strange behaviour. I don't really want to start another rebalance
> to test this, because the last one was so disruptive.
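>
> For the record, the check and the edit were along these lines on the
> affected server (I made the change by hand; the sed line is just one way
> to do the same thing):
>
>     grep rebalance_status /var/lib/glusterd/vols/tracks/node_state.info
>     sed -i 's/^rebalance_status=4$/rebalance_status=0/' \
>         /var/lib/glusterd/vols/tracks/node_state.info
>     service glusterd restart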
>
> Has anyone else experienced this problem since upgrading to 3.3?
>
> Regards,
> Dan.
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
>
>
>
>
> --
> Harry Mangalam - Research Computing, OIT, Rm 225 MSTB, UC Irvine
> [m/c 2225] / 92697 Google Voice Multiplexer: (949) 478-4487
> 415 South Circle View Dr, Irvine, CA, 92697 [shipping]
> MSTB Lat/Long: (33.642025,-117.844414) (paste into Google Maps)
>