<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
</head>
<body bgcolor="#FFFFFF" text="#000000">
Hi,<br>
Two days ago I started a gluster volume remove-brick on a
Distributed-Replicate volume with 21 x 2 per node (3 in total).<br>
<br>
I wanted to remove 4 bricks per node which are smaller than the
others (on each node I have 7 x 2TB disks and 4 x 500GB disks).<br>
I am still on gluster 3.5.2, and I was not aware that using disks of
different sizes is only supported as of 3.6.x (am I correct?).<br>
<br>
I started with 2 paired disks like so:<br>
gluster volume remove-brick VOLNAME node03:/export/brick8node03
node02:/export/brick10node02 start<br>
<br>
I followed the progress (which was very slow):<br>
gluster volume remove-brick VOLNAME node03:/export/brick8node03
node02:/export/brick10node02 status<br>
After a day the status of node03:/export/brick8node03 showed
"completed", while the other brick remained "in progress".<br>
<br>
This morning several VMs with VDIs on the volume started showing
disk errors, and a couple of glusterfs mounts returned a "disk is full"
type of error on the volume, which is currently only ca. 41% filled
with data.<br>
<br>
Via df -h I saw that most of the 500GB disks were indeed 100% full,
while others were nearly empty.<br>
Gluster seems to have gone a bit nuts while rebalancing the data.<br>
<br>
I did a:<br>
gluster volume remove-brick VOLNAME node03:/export/brick8node03
node02:/export/brick10node02 stop<br>
and a:<br>
gluster volume rebalance VOLNAME start<br>
<br>
Progress is again very slow, and some of the disks/bricks which were
ca. 98% full are now 100% full.<br>
The situation seems to be getting worse in some cases and slowly
improving in others, e.g. for another pair of bricks (from 100% to 97%).<br>
<br>
There clearly has been some data corruption. Some VMs no longer
boot and throw disk errors instead.<br>
<br>
How do I proceed?<br>
Wait a very long time for the rebalance to complete and hope that
the data corruption is automatically mended?<br>
<br>
Upgrade to 3.6.x and hope that the issues (which might be related to
me using bricks of different sizes) are resolved and again risk a
remove-brick operation?<br>
<br>
Should I rather do a:<br>
gluster volume rebalance VOLNAME migrate-data start<br>
<br>
Should I have done a
replace-brick instead of a remove-brick operation originally? I
thought that replace-brick is becoming obsolete.<br>
<br>
Thanks,<br>
Olav<br>
</body>
</html>