<div dir="ltr">You need to set:<br><br>cluster.server-quorum-ratio 51%<br></div><div class="gmail_extra"><br><div class="gmail_quote">On 6 September 2017 at 10:12, Pavel Szalbot <span dir="ltr">&lt;<a href="mailto:pavel.szalbot@gmail.com" target="_blank">pavel.szalbot@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi all,<br>

<br>

I promised to do some testing, and I have finally found the time and<br>

infrastructure.<br>

<br>

So I have 3 servers with Gluster 3.10.5 on CentOS 7. I created a<br>

replicated volume with an arbiter (2+1) and a VM on KVM (via OpenStack)<br>

with its disk accessible through gfapi. The volume option group is set to virt<br>

(gluster volume set gv_openstack_1 group virt). The VM runs current (all<br>

packages updated) Ubuntu Xenial.<br>
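<br>
For readers who want to reproduce the setup, a 2+1 arbiter volume like the one described above could be created roughly as follows. This is a minimal sketch, not necessarily the exact commands used here: the function name create_arbiter_volume is hypothetical, and the brick hosts and paths are taken from the volume info further down.<br>

```shell
# Hypothetical helper (not part of Gluster); brick hosts and paths are the
# ones shown in the volume info later in this message.
create_arbiter_volume() {
    vol="$1"
    # "replica 3 arbiter 1": two data bricks plus one arbiter brick,
    # where the last brick listed becomes the arbiter.
    gluster volume create "$vol" replica 3 arbiter 1 \
        gfs-2.san:/export/gfs/gv_1 \
        gfs-3.san:/export/gfs/gv_1 \
        docker3.san:/export/gfs/gv_1
    # Apply the predefined "virt" option group to the volume.
    gluster volume set "$vol" group virt
    gluster volume start "$vol"
}
```

It would be called as: create_arbiter_volume gv_openstack_1<br>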

<br>

I set up the following fio job:<br>

<br>

[job1]<br>

ioengine=libaio<br>

size=1g<br>

loops=16<br>

bs=512k<br>

direct=1<br>

filename=/tmp/fio.data2<br>

<br>

When I run fio fio.job and reboot one of the data nodes, the IO statistics<br>

reported by fio drop to 0KB/0KB and 0 IOPS. After a while, the root<br>

filesystem gets remounted as read-only.<br>
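<br>
While a data node is rebooting, the state of the pool as the cluster sees it can be checked from one of the surviving servers with the standard gluster CLI status commands. A minimal sketch; the wrapper name check_cluster_state is hypothetical and only groups the three commands.<br>

```shell
# Hypothetical wrapper (not part of Gluster) around three standard CLI calls.
check_cluster_state() {
    vol="$1"
    # Which peers does this server currently see as connected?
    gluster peer status
    # Which bricks and self-heal daemons of the volume are online?
    gluster volume status "$vol"
    # Which entries are pending heal on each brick?
    gluster volume heal "$vol" info
}
```

It would be called as: check_cluster_state gv_openstack_1<br>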

<br>

If you care about infrastructure, setup details etc., do not hesitate to ask.<br>

<br>

Gluster info on volume:<br>

<br>

Volume Name: gv_openstack_1<br>

Type: Replicate<br>

Volume ID: 2425ae63-3765-4b5e-915b-<wbr>e132e0d3fff1<br>

Status: Started<br>

Snapshot Count: 0<br>

Number of Bricks: 1 x (2 + 1) = 3<br>

Transport-type: tcp<br>

Bricks:<br>

Brick1: gfs-2.san:/export/gfs/gv_1<br>

Brick2: gfs-3.san:/export/gfs/gv_1<br>

Brick3: docker3.san:/export/gfs/gv_1 (arbiter)<br>

Options Reconfigured:<br>

nfs.disable: on<br>

transport.address-family: inet<br>

performance.quick-read: off<br>

performance.read-ahead: off<br>

performance.io-cache: off<br>

performance.stat-prefetch: off<br>

performance.low-prio-threads: 32<br>

network.remote-dio: enable<br>

cluster.eager-lock: enable<br>

cluster.quorum-type: auto<br>

cluster.server-quorum-type: server<br>

cluster.data-self-heal-<wbr>algorithm: full<br>

cluster.locking-scheme: granular<br>

cluster.shd-max-threads: 8<br>

cluster.shd-wait-qlength: 10000<br>

features.shard: on<br>

user.cifs: off<br>
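<br>
The option list above has cluster.server-quorum-type set to server, and the reply at the top of this thread suggests also setting cluster.server-quorum-ratio to 51%. A minimal sketch of how that could be applied, plus the arithmetic behind the ratio. Both function names are hypothetical, and the rounding rule in required_servers is an assumption (quorum is taken to be met when the number of active servers is at least the ratio of the pool size, rounded up), not something confirmed in this thread.<br>

```shell
# The ratio is a pool-wide option, so it is set on "all" rather than on
# a single volume. Hypothetical wrapper name.
set_server_quorum_ratio() {
    gluster volume set all cluster.server-quorum-ratio "$1"
}

# ASSUMPTION: quorum is met when active >= ceil(total * ratio / 100).
# Integer-only ceiling division; ratio is a whole percentage such as 51.
required_servers() {
    total="$1"
    ratio="$2"
    echo $(( (total * ratio + 99) / 100 ))
}
```

Under that assumption, required_servers 3 51 gives 2, so a three-server pool with one node rebooting would still have the two active servers it needs.<br>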

<br>

Partial KVM XML dump:<br>

<br>

    &lt;disk type=&#39;network&#39; device=&#39;disk&#39;&gt;<br>

      &lt;driver name=&#39;qemu&#39; type=&#39;raw&#39; cache=&#39;none&#39;/&gt;<br>

      &lt;source protocol=&#39;gluster&#39;<br>

name=&#39;gv_openstack_1/volume-<wbr>77ebfd13-6a92-4f38-b036-<wbr>e9e55d752e1e&#39;&gt;<br>

        &lt;host name=&#39;10.0.1.201&#39; port=&#39;24007&#39;/&gt;<br>

      &lt;/source&gt;<br>

      &lt;backingStore/&gt;<br>

      &lt;target dev=&#39;vda&#39; bus=&#39;virtio&#39;/&gt;<br>

      &lt;serial&gt;77ebfd13-6a92-4f38-<wbr>b036-e9e55d752e1e&lt;/serial&gt;<br>

      &lt;alias name=&#39;virtio-disk0&#39;/&gt;<br>

      &lt;address type=&#39;pci&#39; domain=&#39;0x0000&#39; bus=&#39;0x00&#39; slot=&#39;0x04&#39;<br>

function=&#39;0x0&#39;/&gt;<br>

    &lt;/disk&gt;<br>

<br>

Networking is LACP on the data nodes, a stack of Juniper EX4550&#39;s (10Gbps<br>

SFP+), a separate VLAN for Gluster traffic, and SSD-only storage on all<br>

Gluster nodes (including the arbiter).<br>

<br>

I would really love to know what I am doing wrong, because this has been my<br>

experience with Gluster for a long time and is a reason I would not<br>

recommend it as a VM storage backend in a production environment where you<br>

cannot start/stop VMs on your own (e.g. providing private clouds for<br>

customers).<br>

<span class="HOEnZb"><font color="#888888">-ps<br>

</font></span><div class="HOEnZb"><div class="h5"><br>

<br>

On Sun, Sep 3, 2017 at 10:21 PM, Gionatan Danti &lt;<a href="mailto:g.danti@assyoma.it">g.danti@assyoma.it</a>&gt; wrote:<br>

&gt; On 30-08-2017 17:07, Ivan Rossi wrote:<br>

&gt;&gt;<br>

&gt;&gt; There has been a bug associated with sharding that led to VM corruption<br>

&gt;&gt; and that was around for a long time (difficult to reproduce, as I<br>

&gt;&gt; understood). I have not seen reports of it for some time after the<br>

&gt;&gt; last fix, so hopefully VM hosting is now stable.<br>

&gt;<br>

&gt;<br>

&gt; Mmmm... this is precisely the kind of bug that scares me... data corruption<br>

&gt; :|<br>

&gt; Any more information on what causes it and how to resolve it? Even if it is<br>

&gt; a solved bug in newer Gluster releases, knowledge of how to treat it would be<br>

&gt; valuable.<br>

&gt;<br>

&gt;<br>

&gt; Thanks.<br>

&gt;<br>

&gt; --<br>

&gt; Danti Gionatan<br>

&gt; Technical Support<br>

&gt; Assyoma S.r.l. - <a href="http://www.assyoma.it" rel="noreferrer" target="_blank">www.assyoma.it</a><br>

&gt; email: <a href="mailto:g.danti@assyoma.it">g.danti@assyoma.it</a> - <a href="mailto:info@assyoma.it">info@assyoma.it</a><br>

&gt; GPG public key ID: FF5F32A8<br>

&gt; ______________________________<wbr>_________________<br>

&gt; Gluster-users mailing list<br>

&gt; <a href="mailto:Gluster-users@gluster.org">Gluster-users@gluster.org</a><br>

&gt; <a href="http://lists.gluster.org/mailman/listinfo/gluster-users" rel="noreferrer" target="_blank">http://lists.gluster.org/<wbr>mailman/listinfo/gluster-users</a><br>


</div></div></blockquote></div><br></div>