<html>

  <head>

    <meta http-equiv="content-type" content="text/html; charset=utf-8">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    <p>We currently have a Gluster array of three baremetal servers in a

      Replicate 1x3 configuration. The volume holds about 1.1 TB of

      data and is configured for 3.7 TB of total space. This array is

      mostly hosting mail in Maildir format, although we'd like it to

      also host some Proxmox VMs. The problem with doing that is that

      the Gluster array is so slow that booting VMs

      from Gluster makes Proxmox time out! We've instead started

      experimenting with Gluster's NFS server to host the VMs,

      which is much faster, but there are obvious issues with stability.

      We're not really hosting anything important yet; this is still an

      experiment. Except for all our mail, of course.<br>

    </p>

    <p>E-mail performance isn't spectacular, but it is mostly

      bearable at the moment. <br>

    </p>

    <p>The real meat of this post, however, is "What do we do about

      this?" I figured that I had built a slow RAID configuration (disk

      utilization was very high), so I took down one of the Gluster

      nodes and rebuilt it as a RAID 0 array. This meant starting again

      with a completely empty disk. After I rebuilt the node and

      started the volume heal, the heal absolutely slaughtered performance:

      our mail server got so slow that webmail was unusable.

      Healing the volume takes <b>days</b> to move 1.1 TB

      of data, and we couldn't just let it run with performance that bad,

      so I stopped the Gluster daemon during the day and only ran it at

      night. It took two whole weeks to completely heal the volume

      this way, even with the heal left running over the weekend

      for two days straight. <br>

    </p>

    <p>So what happens when we add more Gluster nodes to this array? Or

      if we want to upgrade the hardware in the array in any way? Or

      if I want to make any other changes to the array? It seems that

      Gluster's promise of high availability amounts to "things will keep

      working, but they'll be so slow in the meantime that nobody <b>wants</b>

      to use the services built on top of it". The same applies whenever

      a node has to be taken offline for an extended period of time and

      the array has to be healed again. <br>

    </p>

    <p>This is a serious issue with the performance of heal operations.

      What can I do to fix it?<br>

    </p>

  </body>

</html>