[Gluster-users] problems after gluster volume remove-brick
Olav Peeters
opeeters at gmail.com
Wed Jan 21 14:33:09 UTC 2015
Adding to my previous mail..
I find a couple of strange errors in the rebalance log
(/var/log/glusterfs/sr_vol01-rebalance.log)
e.g.:
[2015-01-21 10:00:32.123999] E
[afr-self-heal-entry.c:1135:afr_sh_entry_impunge_newfile_cbk]
0-sr_vol01-replicate-11: creation of /some/file/on/the/volume.data on
sr_vol01-client-23 failed (No space left on device)
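Since that is an AFR self-heal message, I am also keeping an eye on
pending heals while this is going on; assuming the usual check applies
here, roughly:

gluster volume heal sr_vol01 info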
Why does the rebalance seemingly not take into account the space left
on the available disks?
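My understanding, and it is only an assumption on my part, is that
cluster.min-free-disk is the DHT option that should steer new files
away from nearly full bricks, so I will double-check what it is set to
on this volume, roughly:

gluster volume info sr_vol01 | grep min-free-disk
gluster volume set sr_vol01 cluster.min-free-disk 10%

(the 10% is just an example value, and "volume info" only lists options
that have been reconfigured away from their defaults).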
This is the current situation on this particular node:
[root at gluster03 ~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/VolGroup-lv_root
50G 2.4G 45G 5% /
tmpfs 7.8G 0 7.8G 0% /dev/shm
/dev/sda1 485M 95M 365M 21% /boot
/dev/sdb1 1.9T 577G 1.3T 31% /export/brick1gfs03
/dev/sdc1 1.9T 154G 1.7T 9% /export/brick2gfs03
/dev/sdd1 1.9T 413G 1.5T 23% /export/brick3gfs03
/dev/sde1 1.9T 1.5T 417G 78% /export/brick4gfs03
/dev/sdf1 1.9T 1.6T 286G 85% /export/brick5gfs03
/dev/sdg1 1.9T 1.4T 443G 77% /export/brick6gfs03
/dev/sdh1 1.9T 33M 1.9T 1% /export/brick7gfs03
/dev/sdi1 466G 62G 405G 14% /export/brick8gfs03
/dev/sdj1 466G 166G 301G 36% /export/brick9gfs03
/dev/sdk1 466G 466G 20K 100% /export/brick10gfs03
/dev/sdl1 466G 450G 16G 97% /export/brick11gfs03
/dev/sdm1 1.9T 206G 1.7T 12% /export/brick12gfs03
/dev/sdn1 1.9T 306G 1.6T 17% /export/brick13gfs03
/dev/sdo1 1.9T 107G 1.8T 6% /export/brick14gfs03
/dev/sdp1 1.9T 252G 1.6T 14% /export/brick15gfs03
Why are brick10 and brick11 over-utilised when there is plenty of space
on brick 6, 14, etc.?
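For what it is worth, I am also watching the free space as gluster
itself reports it per brick, and the rebalance progress per node,
roughly:

gluster volume status sr_vol01 detail
gluster volume rebalance sr_vol01 status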
Does anyone have any idea?
Cheers,
Olav
On 21/01/15 13:18, Olav Peeters wrote:
> Hi,
> two days ago I started a gluster volume remove-brick on a
> Distributed-Replicate volume (21 x 2, spread over 3 nodes in total).
>
> I wanted to remove 4 bricks per node which are smaller than the others
> (on each node I have 7 x 2TB disks and 4 x 500GB disks).
> I am still on gluster 3.5.2, and I was not aware that using disks of
> different sizes is only supported as of 3.6.x (am I correct?).
>
> I started with 2 paired disks like so:
> gluster volume remove-brick VOLNAME node03:/export/brick8node03
> node02:/export/brick10node02 start
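> (For reference, my understanding of the full cycle, and please correct
> me if I have this wrong, is start, then status until completed, then
> commit:
>
> gluster volume remove-brick VOLNAME BRICK1 BRICK2 start
> gluster volume remove-brick VOLNAME BRICK1 BRICK2 status
> gluster volume remove-brick VOLNAME BRICK1 BRICK2 commit
>
> I have not run the commit step.)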
>
> I followed the progress (which was very slow):
> gluster volume remove-brick volume_name node03:/export/brick8node03
> node02:/export/brick10node02 status
> After a day, the progress of node03:/export/brick8node03 showed
> "completed"; the other brick remained "in progress".
>
> This morning several VMs with VDIs on the volume started showing disk
> errors, and a couple of glusterfs mounts returned a "disk is full"
> type of error on the volume, which is currently only ca. 41% filled
> with data.
>
> Via df -h I saw that most of the 500GB disks were indeed 100% full,
> while others were nearly empty.
> Gluster seems to have gone a bit nuts while rebalancing the data.
>
> I did a:
> gluster volume remove-brick VOLNAME node03:/export/brick8node03
> node02:/export/brick10node02 stop
> and a:
> gluster volume rebalance VOLNAME start
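> (I am following it per node with, roughly:
>
> gluster volume rebalance VOLNAME status
>
> assuming that is still the right way to monitor a rebalance on 3.5.x.)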
>
> Progress is again very slow, and some of the disks/bricks which were
> ca. 98% full are now 100% full.
> The situation seems to be getting worse in some cases and slowly
> improving in others, e.g. for another pair of bricks (from 100% to
> 97%).
>
> There clearly has been some data corruption. Some VMs don't want to
> boot anymore, throwing disk errors.
>
> How do I proceed?
> Wait a very long time for the rebalance to complete and hope that the
> data corruption is automatically mended?
>
> Upgrade to 3.6.x and hope that the issues (which might be related to
> me using bricks of different sizes) are resolved and again risk a
> remove-brick operation?
>
> Should I rather do a:
> gluster volume rebalance VOLNAME migrate-data start
>
> Should I have done a replace-brick instead of a remove-brick operation
> originally? I thought that replace-brick is becoming obsolete.
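> (If replace-brick is the way to go after all, my understanding, and
> this is only an assumption on my part, is that the commit force form
> is the one that is still recommended, roughly:
>
> gluster volume replace-brick VOLNAME node03:/export/brick8node03
> node03:/export/newbrick commit force
>
> where node03:/export/newbrick is just a hypothetical new brick path.)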
>
> Thanks,
> Olav