[Gluster-users] Non-progressing, Unstoppable rebalance on 3.3
Harry Mangalam
hjmangalam at gmail.com
Thu Aug 23 18:23:02 UTC 2012
Following an interchange with Jeff Darcy and Shishir Gowda, I started
a rebalance of my cluster (3.3 on Ubuntu 10.04.4).
Note: shortly after it started, 3/4 of the glusterfsd's shut down
(which was exciting..). I stopped and restarted glusterd, the
glusterfsd's restarted in turn, and all was well; however, this may
have caused a problem with the rebalance:
After 2 days of waiting (I was distracted by other things), the
rebalance has apparently done nothing and reports the same values as
it did originally:
Thu Aug 23 10:35:11 [0.00 0.00 0.00] root at pbs1:/var/log/glusterfs
770 $ gluster volume rebalance gli status
Node       Rebalanced-files       size    scanned   failures       status
---------  ----------------  ---------  ---------  ---------  -----------
localhost                 0          0          0          0  in progress
pbs4ib                    0          0          0          0  not started
pbs2ib                 1380  547324969       7686          3    completed
pbs3ib                    0          0          0          0  not started
(the above has the leading 32 blanks trimmed from the output - is
there a reason for including them?)
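In the meantime I summarize the node and status columns with a small
awk one-liner (my own helper, not part of gluster; the sample text is
pasted from the status output above, but you could pipe the real
command into it instead):

```shell
# Summarize per-node rebalance status from 'gluster volume rebalance gli status'
# output. The sample below is pasted (and whitespace-trimmed) from the CLI.
status_output='localhost            0            0            0            0  in progress
pbs4ib               0            0            0            0  not started
pbs2ib            1380    547324969         7686            3    completed
pbs3ib               0            0            0            0  not started'

# "completed" is one field (6 fields total); "in progress" / "not started"
# are two fields (7 total), so join the last two in that case.
printf '%s\n' "$status_output" |
awk '{ if (NF == 6) print $1": "$NF; else print $1": "$(NF-1)" "$NF }'
```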
The above implies that the rebalance is at least partially "in
progress", but even after stopping it:
Thu Aug 23 10:53:26 [0.00 0.00 0.00] root at pbs1:/var/log/glusterfs
774 $ gluster volume rebalance gli stop
Node       Rebalanced-files       size    scanned   failures       status
---------  ----------------  ---------  ---------  ---------  -----------
localhost                 0          0          0          0  in progress
pbs4ib                    0          0          0          0  not started
pbs2ib                 1380  547324969       7686          3    completed
pbs3ib                    0          0          0          0  not started
Stopped rebalance process on volume gli
the rebalance still seems to be running:
Thu Aug 23 10:53:28 [0.00 0.00 0.00] root at pbs1:/var/log/glusterfs
775 $ gluster volume rebalance gli status
Node       Rebalanced-files       size    scanned   failures       status
---------  ----------------  ---------  ---------  ---------  -----------
localhost                 0          0          0          0  in progress
pbs4ib                    0          0          0          0  not started
pbs2ib                 1380  547324969       7686          3    completed
pbs3ib                    0          0          0          0  not started
Examining the server nodes, only pbs1 (localhost in the above output)
still had a glusterfs rebalance process running; it may have been
orphaned during the glusterfsd hiccups and hanging ever since.
However, killing it changed nothing: gluster still reports that the
rebalance is in progress (even though no glusterfs processes are
running on any of the nodes).
If I try to reset it with a 'start force':
Thu Aug 23 11:14:39 [0.06 0.04 0.00] root at pbs1:/var/log/glusterfs
789 $ gluster volume rebalance gli start force
Rebalance on gli is already started
and the status remains exactly as above.
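If a restart turns out to be the only reset path, I assume something
like the following would do it (a dry-run sketch, not a verified
procedure; glusterd keeps its state under /var/lib/glusterd, and the
service name here is a guess - on Ubuntu the init script may be
glusterfs-server instead):

```shell
# Dry-run sketch: restart glusterd on each server so it re-reads its state.
# Remove the leading 'echo' to actually execute the remote restarts.
for host in pbs1 pbs2 pbs3 pbs4; do
  echo ssh "$host" service glusterd restart
done
```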
From the clients' POV all seems to be fine, but I've got a hanging
rebalance that is both annoying and worrying.
Is there a way to reset this smoothly, or does it require a server restart?
hjm
--
Harry Mangalam - Research Computing, OIT, Rm 225 MSTB, UC Irvine
[m/c 2225] / 92697 Google Voice Multiplexer: (949) 478-4487
415 South Circle View Dr, Irvine, CA, 92697 [shipping]
MSTB Lat/Long: (33.642025,-117.844414) (paste into Google Maps)