[Gluster-users] 'remove-brick' is removing more bytes than are in the brick(?)
Harry Mangalam
harry.mangalam at uci.edu
Mon May 21 21:41:09 UTC 2012
I'm running 3.3b3 on a 5-brick Ubuntu 10.04.4 system with mixed
IPoIB/GbE. It's behaving well apart from the current problem. The
gluster filesystem is live and being used lightly by our cluster.
Note that the gli volume has two bricks on pbs2ib; I'm trying to clear
the smaller brick in preparation for replacing its disks with larger
ones.
=====
root at pbs1:/var/log/glusterfs# gluster volume info
Volume Name: gli
Type: Distribute
Volume ID: 76cc5e88-0ac4-42ac-a4a3-31bf2ba611d4
Status: Started
Number of Bricks: 5
Transport-type: tcp,rdma
Bricks:
Brick1: pbs1ib:/bducgl
Brick2: pbs2ib:/bducgl <--- to remain
Brick3: pbs2ib:/bducgl1 <--- to be removed
Brick4: pbs3ib:/bducgl
Brick5: pbs4ib:/bducgl
Options Reconfigured:
performance.io-cache: on
performance.quick-read: on
performance.io-thread-count: 64
=====
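For completeness, the drain was kicked off with the usual remove-brick
start invocation; a sketch from memory (the exact syntax on 3.3b3 may
differ slightly) is:
=====
# run from any peer (pbs1 here); starts migrating data off the brick
# so it can later be committed and removed
gluster volume remove-brick gli pbs2ib:/bducgl1 start
=====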
'df' reports the brick (on /bducgl1) has 1265060072 KB in use:
=====
root at pbs2:~# df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sdb1 28959736 15104536 12384124 55% /
..
/dev/md0 8788707776 1524178616 7264529160 18% /bducgl
/dev/sda 1952129740 1265060072 687069668 65% /bducgl1
^^^^^^^^^^
=====
(Incidentally, this number, 1265060072, does not change even while
files are being removed - i.e. files that the log says have been
migrated are no longer visible on the brick filesystem, yet the 'Used'
figure stays the same. Is this expected?)
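To cross-check, I can compare 'df' against an actual walk of the
brick's directory tree on pbs2 (just standard tools; this assumes
nothing else is writing to /bducgl1 at the time):
=====
# KB the filesystem thinks are in use on the brick
df -k /bducgl1
# KB actually reachable by walking the brick's directory tree
du -sk /bducgl1
=====
If 'du' keeps shrinking while 'df' stays put, that would at least
narrow down whether the discrepancy is in the filesystem accounting or
in what gluster reports.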
However, the remove-brick operation has been running for about a day
and reports having moved 1,369,285,939,442 bytes:
=====
root at pbs1:
# gluster volume remove-brick gli pbs2ib:/bducgl1 status
Node        Rebalanced-files           size    scanned       status
---------   ----------------  -------------  ---------  -----------
localhost                 90         189616      87639  not started
pbs4ib                     0              0          0  not started
pbs3ib                     0              0          0  not started
pbs2ib                861704  1369285939442    2941430  in progress
                              ^^^^^^^^^^^^^
=====
This is more than the brick's entire used space (1265060072 KB * 1024
= 1.29542151373e+12 bytes), so I'm wondering when/if this process is
going to end?
If I examine 'gli-rebalance.log', I am still getting log entries like
this (at about one per second; I would have expected considerably faster):
[2012-05-21 14:27:31.629995] I [dht-rebalance.c:854:dht_migrate_file]
0-gli-dht: completed migration of
/alamng/Research/Scheraga/F8i1m/set15/Fig8_int1_template_Hamil_set15.dat
from subvolume gli-client-2 to gli-client-1
So migration does appear to be happening, and the numbers change on
repeated 'status' queries, but why is the byte count gluster reports
so different from the 'df' figure?
And is there any way to get an idea of when the process will end? The
'scanned' column is also still increasing, so it's obviously not the
total number of files to be moved.
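For now I'm just re-running the status query periodically, e.g.:
=====
# poll the migration status once a minute
watch -n 60 'gluster volume remove-brick gli pbs2ib:/bducgl1 status'
=====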
After writing most of this note, this is the status about 10m later:
# gluster volume remove-brick gli pbs2ib:/bducgl1 status
Node        Rebalanced-files           size    scanned       status
---------   ----------------  -------------  ---------  -----------
localhost                 90         189616      87639  not started
pbs4ib                     0              0          0  not started
pbs3ib                     0              0          0  not started
pbs2ib                878009  1379699182236    2994733  in progress
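As a back-of-the-envelope check (assuming the 'size' column really is
bytes), the difference between the two samples above works out to:
=====
# the two status outputs were taken roughly 10 minutes (600 s) apart
echo '(1379699182236 - 1369285939442) / 600' | bc
# => 17355404   (roughly 17 MB/s)
=====
so whatever it is counting, it's moving at about 17 MB/s, which still
doesn't tell me how far it has to go.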
--
Harry Mangalam - Research Computing, OIT, Rm 225 MSTB, UC Irvine
[ZOT 2225] / 92697 Google Voice Multiplexer: (949) 478-4487
415 South Circle View Dr, Irvine, CA, 92697 [shipping]
MSTB Lat/Long: (33.642025,-117.844414) (paste into Google Maps)
--