[Gluster-users] GlusterFS as virtual machine storage

Pavel Szalbot pavel.szalbot at gmail.com
Thu Aug 24 20:49:02 UTC 2017


On Thu, Aug 24, 2017 at 10:20 PM, WK <wkmail at bneit.com> wrote:
>
>
> On 8/23/2017 10:44 PM, Pavel Szalbot wrote:
>>
>> Hi,
>>
>> On Thu, Aug 24, 2017 at 2:13 AM, WK <wkmail at bneit.com> wrote:
>>>
>>> The default timeout for most OS versions is 30 seconds and the Gluster
>>> timeout is 42, so yes you can trigger an RO event.
>>
>> I get a read-only mount within approximately 2 seconds after a failed IO.
>
>
> Hmm, we don't see that, even on busy VMs.
> We ARE using QCOW2 disk images though.

I am using them as well:

    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' cache='none'/>
      <source file='/var/lib/nova/mnt/9b7e67e7ed8dc899919ce25f6b14d094/volume-5299bad3-56f5-4ee7-967e-8881a108406e'>
        <seclabel model='selinux' labelskip='yes'/>
      </source>
      <backingStore/>
      <target dev='vdb' bus='virtio'/>
    </disk>

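If it helps, the image format can be double-checked directly on the fuse
mount with qemu-img (the path below is just a placeholder for the volume
file shown above); it should report "file format: qcow2":

    $ qemu-img info /var/lib/nova/mnt/<mount-hash>/volume-<uuid>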

>>> Though it is easy enough to raise as Pavel mentioned
>>>
>>> # echo 90 > /sys/block/sda/device/timeout
>>
>> AFAIK this is applicable only for directly attached block devices
>> (non-virtualized).
>
>
> No, if you use SATA/IDE emulation (NOT virtio) it is there WITHIN the VM.
> We have a lot of legacy VMs from older projects/workloads that have that, and
> we haven't bothered changing them because "they are working fine now".
> It is NOT there on virtio.

I kind of expected this. We use virtio everywhere except for one Windows
2012 VM, which suffers a lot from SATA/IDE emulation performance-wise.
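
For reference, this is easy to check from inside a guest - the sysfs
timeout knob only exists for SCSI/SATA-emulated disks. A quick sketch
(device names are assumptions: sda for an emulated disk, vdb for virtio):

    # SCSI/SATA emulation exposes a tunable timeout (usually 30 seconds):
    $ cat /sys/block/sda/device/timeout
    # virtio-blk devices have no such attribute:
    $ ls /sys/block/vdb/device/ | grep timeout || echo "no timeout knob"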

>>> Likewise virtio "disks" don't even have a timeout value that I am aware
>>> of
>>> and I don't recall them being extremely sensitive to disk issues on
>>> either
>>> Gluster, NFS or DAS.
>>
>> We use only virtio and these problems are persistent - temporarily
>> suspending a node (e.g. HW or Gluster upgrade, reboot) is very scary,
>> because we often end up with read-only filesystems on all VMs.
>>
>> However we use ext4, so I cannot comment on XFS.
>
>
> We use the fuse mount, because we are lazy and haven't upgraded to libgfapi.
> I hope to start a new cluster with libgfapi shortly because of the better
> performance.
> Also we use a localhost mount for the gluster driveset on each compute node
> (i.e. so-called hyperconverged). So the only gluster-only kit is the
> lightweight arbiter box.
> So those VMs in the gluster 'pool' have a local write and then only one
> off-server write (to the other gluster-enabled compute host), which means
> pretty good performance.
>
> We use the gluster-included 'virt' tuning set.

Same here. We had some problems with libgfapi on CentOS back in the
Gluster 3.7 days because of the dated libvirt available in the system. If
you care about HA, you should have an appropriate staging environment -
unfortunately, I learnt this the hard way. And you should definitely watch
the issue tracker and the mailing lists...
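
For the record, the 'virt' profile WK mentions can be applied in a single
step; a sketch (the volume name is a placeholder, and the exact options the
group sets live in /var/lib/glusterd/groups/virt and differ between releases):

    $ gluster volume set <volname> group virt
    # review what actually got applied:
    $ gluster volume info <volname>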

>> This discussion will probably end before I migrate VMs from Gluster to
>> local storage on our Openstack nodes, but I might run some tests
>> afterwards and keep you posted.
>
>
> I would be interested in your results. You may also look into Ceph. It is
> more complicated than Gluster (well, more complicated than our simple
> little Gluster arrangement), but the OpenStack people swear by it.
> It wasn't suited to our needs, but it tested well when we looked into it
> last year.

At first I was looking into Ceph, but the setup procedure was fairly
complicated (this was almost 3 years ago, I do not remember the
version/codename), I had seen the ScaleIO vs. Ceph video, and a colleague
of mine recommended Gluster because he had used it for VMs in a hospital.
The setup was easy-peasy, but tuning was pretty hard and the documentation
did not help - the mailing list archives did a better job.
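
To illustrate the easy part: a minimal VM-storage volume along the lines of
WK's arbiter layout is roughly the following (hostnames and brick paths are
made up for the sketch):

    $ gluster volume create vmstore replica 3 arbiter 1 \
        gl1:/bricks/vmstore/brick gl2:/bricks/vmstore/brick arb:/bricks/vmstore/brick
    $ gluster volume set vmstore group virt
    $ gluster volume start vmstore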

However, this colleague was shocked when I showed him randomly truncated
files during the heal process, and he himself had heals run for 72+ hours
on some VMs (Gluster 3.4 IIRC). I guess he was luckier and never lost any
data - I had to restore from backups several times.
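
For what it's worth, the heal backlog can at least be monitored while it
runs (volume name is again a placeholder):

    # list files/gfids still pending heal:
    $ gluster volume heal <volname> info
    # per-brick pending counts, handy for watching progress:
    $ gluster volume heal <volname> statistics heal-count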

Btw our Gluster network is a 10Gbps fiber VLAN with stacked switches
(Juniper EX4550s) and LACP on the servers to ensure high availability. I am
still not comfortable running critical production VMs there because of the
issues we have experienced.
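
If anyone wants to verify a similar LACP setup on the server side, the
bonding mode and link state are visible in procfs (bond0 is an assumed
interface name):

    $ grep -E "Bonding Mode|MII Status" /proc/net/bonding/bond0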

-ps

