[Gluster-users] [ovirt-users] Re: VM disk corruption with LSM on Gluster

Krutika Dhananjay kdhananj at redhat.com
Wed Mar 27 09:00:43 UTC 2019


This is needed to prevent any inconsistencies stemming from buffered
writes or cached file data during live VM migration.
Besides, for Gluster to truly honor direct-I/O behavior in qemu's
'cache=none' mode (which is what oVirt uses), one needs to turn on
performance.strict-o-direct and disable network.remote-dio.
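
For context, qemu's 'cache=none' means the disk image is opened with
O_DIRECT on the host, bypassing the page cache, which is why the Gluster
client must not silently fall back to buffered I/O. A minimal illustration
of that cache mode (server, volume and image names below are placeholders,
not taken from this thread):

# cache=none: image opened with O_DIRECT; Gluster honors this end-to-end
# only with performance.strict-o-direct on and network.remote-dio off
qemu-system-x86_64 \
  -drive file=gluster://<server>/<volname>/<image>.raw,format=raw,cache=none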

-Krutika

On Wed, Mar 27, 2019 at 12:24 PM Leo David <leoalex at gmail.com> wrote:

> Hi,
> I can confirm that after setting these two options, I haven't encountered
> any disk corruption anymore.
> The downside is that, at least for me, it had a pretty big impact on
> performance: IOPS really went down when running fio tests inside the VMs.
>
> On Wed, Mar 27, 2019, 07:03 Krutika Dhananjay <kdhananj at redhat.com> wrote:
>
>> Could you enable strict-o-direct and disable remote-dio on the source
>> volume as well, restart the VMs on the "old" volume, and retry the
>> migration?
>>
>> # gluster volume set <VOLNAME> performance.strict-o-direct on
>> # gluster volume set <VOLNAME> network.remote-dio off
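>>
>> To double-check that the options took effect on both volumes, something
>> like this should work (a sketch; substitute the actual volume name):
>>
>> # gluster volume get <VOLNAME> performance.strict-o-direct
>> # gluster volume get <VOLNAME> network.remote-dio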
>>
>> -Krutika
>>
>> On Tue, Mar 26, 2019 at 10:32 PM Sander Hoentjen <sander at hoentjen.eu>
>> wrote:
>>
>>> On 26-03-19 14:23, Sahina Bose wrote:
>>> > +Krutika Dhananjay and gluster ml
>>> >
>>> > On Tue, Mar 26, 2019 at 6:16 PM Sander Hoentjen <sander at hoentjen.eu>
>>> > wrote:
>>> >> Hello,
>>> >>
>>> >> tl;dr We have disk corruption when doing live storage migration on
>>> >> oVirt 4.2 with gluster 3.12.15. Any idea why?
>>> >>
>>> >> We have a 3-node oVirt cluster that provides both compute and Gluster
>>> >> storage. The manager runs on separate hardware. We are running out of
>>> >> space on the existing volume, so we added another, bigger Gluster
>>> >> volume, put a storage domain on it, and then migrated VMs to it with
>>> >> LSM. After some time, we noticed that (some of) the migrated VMs had
>>> >> corrupted filesystems. After moving everything back to the old domain
>>> >> with export/import where possible, and recovering from backups where
>>> >> needed, we set off to investigate this issue.
>>> >>
>>> >> We are now at the point where we can reproduce this issue within a day.
>>> >> What we have found so far:
>>> >> 1) The corruption occurs at the very end of the replication step, most
>>> >> probably between START and FINISH of diskReplicateFinish, before the
>>> >> START of the merge step.
>>> >> 2) In the corrupted VM, at some place where data should be, the data
>>> >> is replaced by zeroes. This can be file contents, a directory
>>> >> structure, or whatever.
>>> >> 3) The source gluster volume has different settings than the destination
>>> >> (mostly because the defaults were different at creation time); a quick
>>> >> way to capture the full option diff is sketched after point 4 below:
>>> >>
>>> >> Setting                                 old(src)  new(dst)
>>> >> cluster.op-version                      30800     30800 (the same)
>>> >> cluster.max-op-version                  31202     31202 (the same)
>>> >> cluster.metadata-self-heal              off       on
>>> >> cluster.data-self-heal                  off       on
>>> >> cluster.entry-self-heal                 off       on
>>> >> performance.low-prio-threads            16        32
>>> >> performance.strict-o-direct             off       on
>>> >> network.ping-timeout                    42        30
>>> >> network.remote-dio                      enable    off
>>> >> transport.address-family                -         inet
>>> >> performance.stat-prefetch               off       on
>>> >> features.shard-block-size               512MB     64MB
>>> >> cluster.shd-max-threads                 1         8
>>> >> cluster.shd-wait-qlength                1024      10000
>>> >> cluster.locking-scheme                  full      granular
>>> >> cluster.granular-entry-heal             no        enable
>>> >>
>>> >> 4) To test, we migrate some VMs back and forth. The corruption does not
>>> >> occur every time. So far it has only occurred from old to new, but we
>>> >> don't have enough data points to be sure about that.
>>> >>
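>>> >> (Sketch referenced under point 3: one way to capture the full option
>>> >> diff between the two volumes; the volume names are placeholders.)
>>> >>
>>> >> # gluster volume get <SRC_VOLNAME> all > /tmp/src.opts
>>> >> # gluster volume get <DST_VOLNAME> all > /tmp/dst.opts
>>> >> # diff /tmp/src.opts /tmp/dst.opts
>>> >>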
>>> >> Does anybody have an idea what is causing the corruption? Is this the
>>> >> best list to ask on, or should I ask on a Gluster list? I am not sure
>>> >> whether this is oVirt specific or Gluster specific, though.
>>> > Do you have logs from old and new gluster volumes? Any errors in the
>>> > new volume's fuse mount logs?
>>>
>>> Around the time of corruption I see the message:
>>> The message "I [MSGID: 133017] [shard.c:4941:shard_seek]
>>> 0-ZoneA_Gluster1-shard: seek called on
>>> 7fabc273-3d8a-4a49-8906-b8ccbea4a49f. [Operation not supported]" repeated
>>> 231 times between [2019-03-26 13:14:22.297333] and [2019-03-26
>>> 13:15:42.912170]
>>>
>>> However, I also see this message at other times, when no corruption
>>> occurs.
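>>>
>>> (For reference, those entries come from the client-side fuse mount log;
>>> on an oVirt host that is typically a file under /var/log/glusterfs/ named
>>> after the mount point, so treat the exact path below as an assumption.)
>>>
>>> # grep 'shard_seek' /var/log/glusterfs/rhev-data-center-mnt-glusterSD-*.log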
>>>
>>> --
>>> Sander
>