[Gluster-users] GlusterFS as virtual machine storage

Wed Sep 6 14:12:21 UTC 2017

Hi all,

I promised to do some testing, and I have finally found some time and
infrastructure.

So I have 3 servers with Gluster 3.10.5 on CentOS 7. I created a
replicated volume with an arbiter (2+1) and a VM on KVM (via OpenStack)
with its disk accessible through gfapi. The volume group is set to virt
(gluster volume set gv_openstack_1 group virt). The VM runs an up-to-date
(all packages updated) Ubuntu Xenial.

I set up the following fio job:

[job1]
ioengine=libaio
size=1g
loops=16
bs=512k
direct=1
filename=/tmp/fio.data2
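As a quick sanity check on how much I/O this job generates, here is a small sketch that parses the job with Python's configparser (fio job files are INI-style) and does the arithmetic. It assumes fio's default base-1024 unit suffixes; to_bytes and job_totals are hypothetical helpers, not part of fio. Note that the job sets no rw= option, so fio falls back to its default pattern (sequential read).

```python
import configparser

# The fio job from the message, verbatim.
JOB = """\
[job1]
ioengine=libaio
size=1g
loops=16
bs=512k
direct=1
filename=/tmp/fio.data2
"""

# Assumption: fio's default kb_base=1024, so k/m/g are powers of 1024.
SUFFIX = {"k": 2**10, "m": 2**20, "g": 2**30}


def to_bytes(value: str) -> int:
    """Hypothetical helper: convert a fio-style size like '512k' to bytes."""
    value = value.strip().lower()
    if value[-1] in SUFFIX:
        return int(value[:-1]) * SUFFIX[value[-1]]
    return int(value)


def job_totals(text: str, section: str = "job1") -> dict:
    """Hypothetical helper: requests per pass, total requests, total GiB."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    job = parser[section]
    size = to_bytes(job["size"])
    bs = to_bytes(job["bs"])
    loops = job.getint("loops", fallback=1)  # fio repeats the job 'loops' times
    per_loop = size // bs
    return {
        "requests_per_loop": per_loop,
        "total_requests": per_loop * loops,
        "total_gib": size * loops / 2**30,
    }


if __name__ == "__main__":
    print(job_totals(JOB))
```

With these settings that works out to 2048 requests of 512 KiB per pass, 32768 requests in total, or 16 GiB moved over the whole run.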

When I run fio fio.job and reboot one of the data nodes, the IO statistics
reported by fio drop to 0KB/0KB and 0 IOPS. After a while, the root
filesystem gets remounted read-only.

If you need more details about the infrastructure or setup, do not hesitate to ask.

Gluster volume info:

Volume Name: gv_openstack_1
Type: Replicate
Volume ID: 2425ae63-3765-4b5e-915b-e132e0d3fff1
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: gfs-2.san:/export/gfs/gv_1
Brick2: gfs-3.san:/export/gfs/gv_1
Brick3: docker3.san:/export/gfs/gv_1 (arbiter)
Options Reconfigured:
nfs.disable: on
transport.address-family: inet
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
performance.low-prio-threads: 32
network.remote-dio: enable
cluster.eager-lock: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-max-threads: 8
cluster.shd-wait-qlength: 10000
features.shard: on
user.cifs: off
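When comparing this volume with someone else's working setup, it can help to turn the volume info text into dictionaries and diff the options. The sketch below assumes the plain "key: value" layout shown above, with everything after "Options Reconfigured:" treated as options; parse_volume_info and diff_options are hypothetical helpers, and the sample is only an excerpt of the output above.

```python
# Excerpt of the `gluster volume info` output above (subset of lines).
INFO = """\
Volume Name: gv_openstack_1
Type: Replicate
Number of Bricks: 1 x (2 + 1) = 3
Bricks:
Brick1: gfs-2.san:/export/gfs/gv_1
Brick3: docker3.san:/export/gfs/gv_1 (arbiter)
Options Reconfigured:
network.remote-dio: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
features.shard: on
"""


def parse_volume_info(text):
    """Hypothetical helper: split volume info into (header, options) dicts."""
    header, options = {}, {}
    in_options = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "Options Reconfigured:":
            in_options = True
            continue
        # Split on the first colon only, so brick paths like host:/path survive.
        key, sep, value = line.partition(":")
        if not sep:
            continue
        target = options if in_options else header
        target[key.strip()] = value.strip()
    return header, options


def diff_options(a, b):
    """Hypothetical helper: map each differing key to (value_in_a, value_in_b)."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


if __name__ == "__main__":
    header, opts = parse_volume_info(INFO)
    print(header["Volume Name"], opts)
```

A key missing on one side shows up as None in the diff, so options that exist on only one volume are reported as well.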

Partial KVM XML dump:

    <disk type='network' device='disk'>
      <driver name='qemu' type='raw' cache='none'/>
      <source protocol='gluster' name='gv_openstack_1/volume-77ebfd13-6a92-4f38-b036-e9e55d752e1e'>
        <host name='10.0.1.201' port='24007'/>
      </source>
      <backingStore/>
      <target dev='vda' bus='virtio'/>
      <serial>77ebfd13-6a92-4f38-b036-e9e55d752e1e</serial>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </disk>
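If you have many instances to check, the relevant bits of such a disk definition can be pulled out with the standard library's ElementTree. This is a sketch under the assumption that the disk XML looks like the excerpt above; gluster_disk is a hypothetical helper, and the port is left as None when a host element omits it rather than guessing a default.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the <disk> element from the dump above.
DISK_XML = """\
<disk type='network' device='disk'>
  <driver name='qemu' type='raw' cache='none'/>
  <source protocol='gluster' name='gv_openstack_1/volume-77ebfd13-6a92-4f38-b036-e9e55d752e1e'>
    <host name='10.0.1.201' port='24007'/>
  </source>
  <target dev='vda' bus='virtio'/>
</disk>
"""


def gluster_disk(xml_text):
    """Hypothetical helper: return volume, image, hosts, cache mode, target."""
    disk = ET.fromstring(xml_text)
    source = disk.find("source")
    if source is None or source.get("protocol") != "gluster":
        raise ValueError("not a gluster network disk")
    # The source name is "<volume>/<image path>".
    volume, _, image = source.get("name").partition("/")
    hosts = []
    for host in source.findall("host"):
        port = host.get("port")
        hosts.append((host.get("name"), int(port) if port else None))
    return {
        "volume": volume,
        "image": image,
        "hosts": hosts,
        "cache": disk.find("driver").get("cache"),
        "target": disk.find("target").get("dev"),
    }


if __name__ == "__main__":
    print(gluster_disk(DISK_XML))
```

For this VM it reports a single host entry (10.0.1.201:24007) and cache='none'.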

Networking: LACP on the data nodes, a stack of Juniper EX4550s (10Gbps
SFP+), and a separate VLAN for Gluster traffic. All Gluster nodes
(including the arbiter) are SSD-only.

I would really love to know what I am doing wrong, because this has been
my experience with Gluster for a long time, and it is the reason I would
not recommend it as a VM storage backend in a production environment
where you cannot start/stop VMs on your own (e.g. providing private
clouds for customers).
-ps

On Sun, Sep 3, 2017 at 10:21 PM, Gionatan Danti <g.danti at assyoma.it> wrote:
> On 30-08-2017 17:07 Ivan Rossi wrote:
>>
>> There has been a bug associated with sharding that led to VM corruption
>> and that was around for a long time (difficult to reproduce, as I
>> understand it). I have not seen reports of it for some time after the
>> last fix, so hopefully VM hosting is now stable.
>
>
> Mmmm... this is precisely the kind of bug that scares me... data
> corruption :|
> Is there any more information on what causes it and how to resolve it?
> Even if it is a solved bug in newer Gluster releases, knowing how to
> deal with it would be valuable.
>
>
> Thanks.
>
> --
> Danti Gionatan
> Technical Support
> Assyoma S.r.l. - www.assyoma.it
> email: g.danti at assyoma.it - info at assyoma.it
> GPG public key ID: FF5F32A8
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users