[Gluster-users] Problems with qemu and disperse volumes (live merge)

Marco Fais evilmf at gmail.com
Tue Jun 30 11:18:10 UTC 2020


Hi Strahil,

thanks a million for your reply.

I mainly thought that disperse volumes were not supported because of the
complexity of managing them (due to the various possible combinations of
number of hosts / bricks and redundancy); however, I assumed that once
implemented and managed separately they could be used as VM storage for
oVirt -- given they are in general supported by RHGS.

When you say they will not be optimal, are you referring mainly to
performance considerations? We did plenty of testing and, in terms of
performance, didn't have issues even with I/O-intensive workloads (using
SSDs; I did have issues with spinning disks).

Replica 3 with arbiter is the other possible option for us, but it is
clearly less efficient in terms of storage usage than the current disperse
4+2 volumes, and the main issue for us is that having two servers down (out
of the three in each replica set) would create a service outage -- while
with a disperse 4+2 configuration we can withstand two servers down out of
six (e.g. one has been brought down for maintenance and at that time another
server has an issue). That's the reason I am keen to have it working with
disperse -- apart from the specific issue with snapshot deletion, everything
seems to work very well.
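
Just to put rough numbers on it (ignoring metadata and sharding overhead):
a 4+2 disperse set keeps 4 data fragments out of every 6 bricks, so usable
capacity is 4/6 ≈ 67% of raw and any 2 of the 6 bricks can be offline;
replica 3 with arbiter keeps 2 full copies plus a metadata-only arbiter, so
usable capacity is roughly 50% of the data bricks and only 1 brick per set
can be down without losing quorum.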

Regarding the options -- apologies, I had applied the group with the
"gluster volume set SSD_Storage group virt" command, but for some reason
the options were not listed in the "volume info" output. I have re-applied
them individually and the results are the same. See below for the list of
options I am using:

Options Reconfigured:
storage.owner-gid: 36
storage.owner-uid: 36
performance.client-io-threads: on
server.event-threads: 4
client.event-threads: 4
cluster.choose-local: off
user.cifs: off
features.shard: on
cluster.shd-wait-qlength: 10000
cluster.locking-scheme: granular
cluster.data-self-heal-algorithm: full
cluster.server-quorum-type: server
cluster.quorum-type: auto
cluster.eager-lock: enable
network.remote-dio: enable
performance.low-prio-threads: 32
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
storage.fips-mode-rchecksum: on
nfs.disable: on
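
For completeness, this is essentially what I ran to apply the group and to
check the result afterwards (the group file path is the one you mention
below; the volume name is obviously ours):

  # apply the oVirt "virt" tuning group to the volume
  gluster volume set SSD_Storage group virt

  # the group file lists the options that the command should apply
  cat /var/lib/glusterd/groups/virt

  # check the effective values on the volume
  gluster volume get SSD_Storage all | grep -E 'shard|eager-lock|remote-dio'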

Unfortunately we have the issue with all VMs -- it doesn't seem to depend
on the storage allocation policy either (thin provisioned or preallocated).
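
In case it helps with reproducing this outside oVirt: as far as I
understand, the libvirt-level equivalent of what oVirt is doing is roughly
the following (domain name and disk target here are just placeholders --
oVirt drives this through its own API, so this is only an approximation):

  # 1) take a disk-only external snapshot while the VM is running
  virsh snapshot-create-as <domain> snap1 --disk-only --atomic

  # 2) later, delete the snapshot by live-merging the overlay back
  #    into the base image (this is the step that crashes for us)
  virsh blockcommit <domain> vda --active --pivot --verbose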

Thanks!
Marco


On Tue, 30 Jun 2020 at 05:12, Strahil Nikolov <hunter86_bg at yahoo.com> wrote:

> Hey Marco,
>
> have you wondered why non-replica volumes are not supported for oVirt (or
> the paid downstreams)? Also, a disperse volume will not be optimal for your
> needs.
>
> Have you thought about replica 3 with an arbiter?
>
> Now on the topic.
> I don't see the "optimize for virt" options, which you also need to apply
> (this involves sharding too). You can find them in gluster's group
> dir (it is something like /var/lib/glusterd/groups/virt).
>
> With an unsupported volume type and without the options the oVirt community
> recommends, you can and most probably will run into bad situations.
>
> Please, set the virt group options and try again.
>
> Does the issue occur on another VM?
>
>
> Best Regards,
> Strahil Nikolov
>
>
> On 30 June 2020 at 1:59:36 GMT+03:00, Marco Fais <evilmf at gmail.com> wrote:
> >Hi,
> >
> >I have recently been having a problem with Gluster disperse volumes and
> >live merge on qemu-kvm.
> >
> >I am using Gluster as a storage backend for an oVirt cluster; we are
> >planning to use VM snapshots as part of taking daily backups of the VMs,
> >and we are encountering issues when the VMs are stored in a
> >distributed-disperse volume.
> >
> >First of all, I am using gluster 7.5, libvirt 6.0, qemu 4.2 and oVirt
> >4.4.0
> >on CentOS 8.1
> >
> >The sequence of events is the following:
> >
> >1) On a running VM, create a new snapshot
> >
> >The operation completes successfully; however, I can observe the
> >following errors in the gluster logs:
> >
> >[2020-06-29 21:54:18.942422] I [MSGID: 109066]
> >[dht-rename.c:1951:dht_rename] 0-SSD_Storage-dht: renaming
>
> >/58e8dff0-3dfd-4554-9999-b8eb7744ce1b/images/998f0b18-1904-47f3-8cfb-a73ad063ab83/64c038a4-5fe4-4f57-8b1c-bab38ae5c5bb.meta.new
> >(a89f2ccb-be41-4ff7-bbaf-abb786e76bc7)
> >(hash=SSD_Storage-disperse-1/cache=SSD_Storage-disperse-1) =>
>
> >/58e8dff0-3dfd-4554-9999-b8eb7744ce1b/images/998f0b18-1904-47f3-8cfb-a73ad063ab83/64c038a4-5fe4-4f57-8b1c-bab38ae5c5bb.meta
> >(f55c1f35-63fa-4d27-9aa9-09b60163e565)
> >(hash=SSD_Storage-disperse-2/cache=SSD_Storage-disperse-1)
> >[2020-06-29 21:54:18.947273] W [MSGID: 122019]
> >[ec-helpers.c:401:ec_loc_gfid_check] 0-SSD_Storage-disperse-2:
> >Mismatching
> >GFID's in loc
> >[2020-06-29 21:54:18.947290] W [MSGID: 109002]
> >[dht-rename.c:1019:dht_rename_links_create_cbk] 0-SSD_Storage-dht:
> >link/file
>
> >/58e8dff0-3dfd-4554-9999-b8eb7744ce1b/images/998f0b18-1904-47f3-8cfb-a73ad063ab83/64c038a4-5fe4-4f57-8b1c-bab38ae5c5bb.meta
> >on SSD_Storage-disperse-2 failed [Input/output error]
> >[2020-06-29 21:54:19.197482] I [MSGID: 109066]
> >[dht-rename.c:1951:dht_rename] 0-SSD_Storage-dht: renaming
>
> >/58e8dff0-3dfd-4554-9999-b8eb7744ce1b/images/998f0b18-1904-47f3-8cfb-a73ad063ab83/a54793c1-c804-425d-894e-2dfe7a63af4b.meta.new
> >(b4888032-3758-4f62-a4ae-fb48902f83d2)
> >(hash=SSD_Storage-disperse-4/cache=SSD_Storage-disperse-4) =>
>
> >/58e8dff0-3dfd-4554-9999-b8eb7744ce1b/images/998f0b18-1904-47f3-8cfb-a73ad063ab83/a54793c1-c804-425d-894e-2dfe7a63af4b.meta
> >((null)) (hash=SSD_Storage-disperse-4/cache=<nul>)
> >
> >2) Once the snapshot has been created, try to delete it while the VM is
> >running
> >
> >The above runs for a couple of seconds and then the qemu-kvm process
> >suddenly crashes. In the qemu VM log I can see the following:
> >
> >Unexpected error in raw_check_lock_bytes() at block/file-posix.c:811:
> >2020-06-29T21:56:23.933603Z qemu-kvm: Failed to get shared "write" lock
> >
> >At the same time, the gluster logs report the following:
> >
> >[2020-06-29 21:56:23.850417] I [MSGID: 109066]
> >[dht-rename.c:1951:dht_rename] 0-SSD_Storage-dht: renaming
>
> >/58e8dff0-3dfd-4554-9999-b8eb7744ce1b/images/998f0b18-1904-47f3-8cfb-a73ad063ab83/64c038a4-5fe4-4f57-8b1c-bab38ae5c5bb.meta.new
> >(1999a713-a0ed-45fb-8ab7-7dbda6d02a78)
> >(hash=SSD_Storage-disperse-1/cache=SSD_Storage-disperse-1) =>
>
> >/58e8dff0-3dfd-4554-9999-b8eb7744ce1b/images/998f0b18-1904-47f3-8cfb-a73ad063ab83/64c038a4-5fe4-4f57-8b1c-bab38ae5c5bb.meta
> >(a89f2ccb-be41-4ff7-bbaf-abb786e76bc7)
> >(hash=SSD_Storage-disperse-2/cache=SSD_Storage-disperse-1)
> >[2020-06-29 21:56:23.855027] W [MSGID: 122019]
> >[ec-helpers.c:401:ec_loc_gfid_check] 0-SSD_Storage-disperse-2:
> >Mismatching
> >GFID's in loc
> >[2020-06-29 21:56:23.855045] W [MSGID: 109002]
> >[dht-rename.c:1019:dht_rename_links_create_cbk] 0-SSD_Storage-dht:
> >link/file
>
> >/58e8dff0-3dfd-4554-9999-b8eb7744ce1b/images/998f0b18-1904-47f3-8cfb-a73ad063ab83/64c038a4-5fe4-4f57-8b1c-bab38ae5c5bb.meta
> >on SSD_Storage-disperse-2 failed [Input/output error]
> >[2020-06-29 21:56:23.922638] I [MSGID: 109066]
> >[dht-rename.c:1951:dht_rename] 0-SSD_Storage-dht: renaming
>
> >/58e8dff0-3dfd-4554-9999-b8eb7744ce1b/images/998f0b18-1904-47f3-8cfb-a73ad063ab83/a54793c1-c804-425d-894e-2dfe7a63af4b.meta.new
> >(e5c578b3-b91a-4263-a7e3-40f9c7e3628b)
> >(hash=SSD_Storage-disperse-4/cache=SSD_Storage-disperse-4) =>
>
> >/58e8dff0-3dfd-4554-9999-b8eb7744ce1b/images/998f0b18-1904-47f3-8cfb-a73ad063ab83/a54793c1-c804-425d-894e-2dfe7a63af4b.meta
> >(b4888032-3758-4f62-a4ae-fb48902f83d2)
> >(hash=SSD_Storage-disperse-4/cache=SSD_Storage-disperse-4)
> >[2020-06-29 21:56:26.017309] E
> >[fuse-bridge.c:227:check_and_dump_fuse_W]
> >(--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x133)[0x7fd4fa4d6a53]
> >(-->
> >/usr/lib64/glusterfs/7.5/xlator/mount/fuse.so(+0x8e82)[0x7fd4f64cee82]
> >(-->
> >/usr/lib64/glusterfs/7.5/xlator/mount/fuse.so(+0xa072)[0x7fd4f64d0072]
> >(-->
> >/lib64/libpthread.so.0(+0x82de)[0x7fd4f90582de] (-->
> >/lib64/libc.so.6(clone+0x43)[0x7fd4f88aa133] ))))) 0-glusterfs-fuse:
> >writing to fuse device failed: No such file or directory
> >[2020-06-29 21:56:26.017421] E
> >[fuse-bridge.c:227:check_and_dump_fuse_W]
> >(--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x133)[0x7fd4fa4d6a53]
> >(-->
> >/usr/lib64/glusterfs/7.5/xlator/mount/fuse.so(+0x8e82)[0x7fd4f64cee82]
> >(-->
> >/usr/lib64/glusterfs/7.5/xlator/mount/fuse.so(+0xa072)[0x7fd4f64d0072]
> >(-->
> >/lib64/libpthread.so.0(+0x82de)[0x7fd4f90582de] (-->
> >/lib64/libc.so.6(clone+0x43)[0x7fd4f88aa133] ))))) 0-glusterfs-fuse:
> >writing to fuse device failed: No such file or directory
> >[2020-06-29 21:56:26.017524] E
> >[fuse-bridge.c:227:check_and_dump_fuse_W]
> >(--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x133)[0x7fd4fa4d6a53]
> >(-->
> >/usr/lib64/glusterfs/7.5/xlator/mount/fuse.so(+0x8e82)[0x7fd4f64cee82]
> >(-->
> >/usr/lib64/glusterfs/7.5/xlator/mount/fuse.so(+0xa072)[0x7fd4f64d0072]
> >(-->
> >/lib64/libpthread.so.0(+0x82de)[0x7fd4f90582de] (-->
> >/lib64/libc.so.6(clone+0x43)[0x7fd4f88aa133] ))))) 0-glusterfs-fuse:
> >writing to fuse device failed: No such file or directory
> >
> >Initially I thought this was a qemu-kvm issue; however, the above works
> >perfectly on a distributed-replicated volume with exactly the same HW,
> >software and gluster volume options.
> >Also, the issue can be reproduced 100% of the time -- every time I try
> >to delete the snapshot the process crashes.
> >
> >I am not sure what the best way to proceed is -- I have tried to file
> >a bug but unfortunately didn't get any traction.
> >Gluster volume info here:
> >
> >Volume Name: SSD_Storage
> >Type: Distributed-Disperse
> >Volume ID: 4e1bf45d-9ecd-44f2-acde-dd338e18379c
> >Status: Started
> >Snapshot Count: 0
> >Number of Bricks: 6 x (4 + 2) = 36
> >Transport-type: tcp
> >Bricks:
> >Brick1: cld-cnvirt-h01-storage:/bricks/vm_b1/brick
> >Brick2: cld-cnvirt-h02-storage:/bricks/vm_b1/brick
> >Brick3: cld-cnvirt-h03-storage:/bricks/vm_b1/brick
> >Brick4: cld-cnvirt-h04-storage:/bricks/vm_b1/brick
> >Brick5: cld-cnvirt-h05-storage:/bricks/vm_b1/brick
> >Brick6: cld-cnvirt-h06-storage:/bricks/vm_b1/brick
> >Brick7: cld-cnvirt-h01-storage:/bricks/vm_b2/brick
> >Brick8: cld-cnvirt-h02-storage:/bricks/vm_b2/brick
> >Brick9: cld-cnvirt-h03-storage:/bricks/vm_b2/brick
> >Brick10: cld-cnvirt-h04-storage:/bricks/vm_b2/brick
> >Brick11: cld-cnvirt-h05-storage:/bricks/vm_b2/brick
> >Brick12: cld-cnvirt-h06-storage:/bricks/vm_b2/brick
> >Brick13: cld-cnvirt-h01-storage:/bricks/vm_b3/brick
> >Brick14: cld-cnvirt-h02-storage:/bricks/vm_b3/brick
> >Brick15: cld-cnvirt-h03-storage:/bricks/vm_b3/brick
> >Brick16: cld-cnvirt-h04-storage:/bricks/vm_b3/brick
> >Brick17: cld-cnvirt-h05-storage:/bricks/vm_b3/brick
> >Brick18: cld-cnvirt-h06-storage:/bricks/vm_b3/brick
> >Brick19: cld-cnvirt-h01-storage:/bricks/vm_b4/brick
> >Brick20: cld-cnvirt-h02-storage:/bricks/vm_b4/brick
> >Brick21: cld-cnvirt-h03-storage:/bricks/vm_b4/brick
> >Brick22: cld-cnvirt-h04-storage:/bricks/vm_b4/brick
> >Brick23: cld-cnvirt-h05-storage:/bricks/vm_b4/brick
> >Brick24: cld-cnvirt-h06-storage:/bricks/vm_b4/brick
> >Brick25: cld-cnvirt-h01-storage:/bricks/vm_b5/brick
> >Brick26: cld-cnvirt-h02-storage:/bricks/vm_b5/brick
> >Brick27: cld-cnvirt-h03-storage:/bricks/vm_b5/brick
> >Brick28: cld-cnvirt-h04-storage:/bricks/vm_b5/brick
> >Brick29: cld-cnvirt-h05-storage:/bricks/vm_b5/brick
> >Brick30: cld-cnvirt-h06-storage:/bricks/vm_b5/brick
> >Brick31: cld-cnvirt-h01-storage:/bricks/vm_b6/brick
> >Brick32: cld-cnvirt-h02-storage:/bricks/vm_b6/brick
> >Brick33: cld-cnvirt-h03-storage:/bricks/vm_b6/brick
> >Brick34: cld-cnvirt-h04-storage:/bricks/vm_b6/brick
> >Brick35: cld-cnvirt-h05-storage:/bricks/vm_b6/brick
> >Brick36: cld-cnvirt-h06-storage:/bricks/vm_b6/brick
> >Options Reconfigured:
> >nfs.disable: on
> >storage.fips-mode-rchecksum: on
> >performance.strict-o-direct: on
> >network.remote-dio: off
> >storage.owner-uid: 36
> >storage.owner-gid: 36
> >network.ping-timeout: 30
> >
> >I have tried many different options but unfortunately got the same
> >results. I have the same problem on three different clusters (same
> >versions).
> >
> >Any suggestions?
> >
> >Thanks,
> >Marco
>