<html>

  <head>

    <meta http-equiv="content-type" content="text/html; charset=utf-8">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    Hi,<br>

    Two days ago I started a gluster volume remove-brick on a

    Distributed-Replicate volume with 21 x 2 per node (3 in total).<br>

    <br>

    I wanted to remove 4 bricks per node which are smaller than the

    others (on each node I have 7 x 2TB disks and 4 x 500GB disks).<br>

    I am still on gluster 3.5.2, and I was not aware that using disks of

    different sizes is only supported as of 3.6.x (am I correct?).<br>

    <br>

    I started with 2 paired disks like so:<br>

    gluster volume remove-brick VOLNAME node03:/export/brick8node03

    node02:/export/brick10node02 start<br>

    <br>

    I followed the progress (which was very slow):<br>

    gluster volume remove-brick VOLNAME node03:/export/brick8node03

    node02:/export/brick10node02 status<br>

    After a day, the status of node03:/export/brick8node03 showed

    "completed", while the other brick remained "in progress".<br>

    <br>

    This morning several VMs with VDIs on the volume started showing

    disk errors, and a couple of glusterfs mounts returned a "disk is

    full" type of error on the volume, which is currently only ca. 41%

    filled with data.<br>

    <br>

    Via df -h I saw that most of the 500GB disks were indeed 100% full.

    Others were meanwhile nearly empty.<br>

    Gluster seems to have gone a bit nuts while rebalancing the data.<br>

    <br>
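    For reference, a minimal sketch of how the per-brick fill level could be
    checked on each node. It assumes the bricks are mounted under /export, as
    in the commands above; the function name full_bricks and the 95%
    threshold are only illustrative.<br>

    <pre>
```shell
# Hypothetical helper: print brick mount points at or above a usage threshold.
# Reads `df -P` style output on stdin. The /export/ prefix is an assumption
# taken from the brick paths used in this thread.
full_bricks() {
  threshold="$1"
  while read -r fs blocks used avail pct mount; do
    case "$mount" in
      /export/*)
        # Strip the trailing percent sign so the value can be compared.
        pct="${pct%\%}"
        if [ "$pct" -ge "$threshold" ]; then
          echo "$mount $pct%"
        fi
        ;;
    esac
  done
}

# Usage on a node: df -P | full_bricks 95
```
    </pre>

    Running this on every node before and during a rebalance would show
    which bricks are close to full.<br>

    <br>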

    I did a:<br>

    gluster volume remove-brick VOLNAME node03:/export/brick8node03

    node02:/export/brick10node02 stop<br>

    and a:<br>

    gluster volume rebalance VOLNAME start<br>

    <br>

    Progress is again very slow, and some of the disks/bricks which were

    ca. 98% full are now 100% full.<br>

    The situation seems to be getting worse in some cases and slowly

    improving in others, e.g. for another pair of bricks (from 100% to 97%).<br>

    <br>

    There clearly has been some data corruption. Some VMs no longer

    boot and throw disk errors instead.<br>

    <br>

    How do I proceed?<br>

    Wait a very long time for the rebalance to complete and hope that

    the data corruption is automatically mended?<br>

    <br>

    Upgrade to 3.6.x, hope that the issues (which might be related to

    my using bricks of different sizes) are resolved, and then risk

    another remove-brick operation?<br>

    <br>

    Should I rather do a:<br>

    gluster volume rebalance VOLNAME migrate-data start<br>

    <br>

    Should I have done a

    replace-brick instead of a remove-brick operation originally? I

    thought that replace-brick is becoming obsolete.<br>

    <br>

    Thanks,<br>

    Olav<br>

    <br>

    <br>

    <br>

    <pre class="moz-signature" cols="72">

</pre>

  </body>

</html>