[Gluster-users] qemu raw image file - qemu and grub2 can't find boot content from VM

Mon Jan 25 23:13:09 UTC 2021

Hello all. Thanks again for gluster. We're having a strange problem
getting virtual machines started that are hosted on a gluster volume.

One of the ways we use gluster now is to make a HA-ish cluster head
node. A virtual machine runs in the shared storage and is backed up by 3
physical servers that contribute to the gluster storage share.

We're using sharding in this volume. The VM image file is around 5T and
we use qemu-img with preallocation=falloc to get all the blocks
allocated in advance.
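
For reference, a minimal sketch of how an image like this can be
created. The file name below is hypothetical (only the /adminvm/images
directory appears in this mail), and the DRY_RUN switch is just an
illustration aid so the command can be printed without running it.

```shell
# Create a raw image with space reserved up front.
# preallocation=falloc asks qemu-img to reserve the blocks with
# fallocate rather than writing zeros through the whole file.
create_image() {
    # $1 = image path, $2 = virtual size (e.g. 5T)
    # With DRY_RUN set, the command is printed instead of executed.
    ${DRY_RUN:+echo} qemu-img create -f raw -o preallocation=falloc "$1" "$2"
}

# Example (hypothetical file name on the fuse mount):
#   create_image /adminvm/images/adminvm.img 5T
```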

We are not using gfapi largely because it would mean we have to build
our own libvirt and qemu and we'd prefer not to do that. So we're using
a glusterfs fuse mount to host the image. The virtual machine is using
virtio disks but we had similar trouble using scsi emulation.

The issue: at first all seems well. The VM head node installs, boots, etc.

However, at some point, it stops being able to boot! grub2 acts like it
cannot find /boot. At the grub2 prompt, it can see the partitions, but
reports no filesystem found where there are indeed filesystems.

If we switch qemu to use "direct kernel load" (bypassing grub2), this
often works around the problem, but in one case Linux gave us a clue:
it reported /dev/vda as being only 64 megabytes, which would explain a
lot. The Linux guest thought the disk backed by the image file was
tiny: 64M instead of 5T.
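
A rough sketch of the size comparison, assuming GNU stat on the host.
The helper names and the image file name are made up for illustration;
the idea is only to compare what the fuse mount reports for the file
with what the guest reports for /dev/vda.

```shell
# Apparent size of a file in bytes, as seen through the mount (GNU stat).
image_size_bytes() {
    stat -c %s "$1"
}

# Convert a byte count to whole MiB for easier reading.
bytes_to_mib() {
    echo $(( $1 / 1024 / 1024 ))
}

# On the physical node (hypothetical file name):
#   bytes_to_mib "$(image_size_bytes /adminvm/images/adminvm.img)"
# Inside the guest, for comparison:
#   blockdev --getsize64 /dev/vda
```

If the host sees roughly 5T and the guest sees 64M for the same image,
the mismatch sits somewhere between the fuse mount and qemu rather than
inside the guest.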

We are using sles15sp2 and hit the problem more often with updates
applied than without. I'm in the process of trying to isolate if there
is a sles15sp2 update causing this, or if we're within "random chance".

When a physical node is in the failure mode, if I use 'kpartx' on that
node to map the partitions from the image file, mount the giant root
filesystem (i.e. mount /dev/mapper/loop0p31 /mnt), and then umount /mnt,
that physical node starts the VM fine: grub2 loads and the virtual
machine is fully happy! That lasts until I shut it down and start it up
again, at which point it sticks at grub2 again. What is it about
mounting the image file that makes qemu see the whole disk?
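
Spelled out, the workaround looks roughly like the sketch below. The
image file name is hypothetical, the function and DRY_RUN switch are
only there for illustration, and the final 'kpartx -d' cleanup step is
an assumption added here, not something described above. It needs root
when run for real.

```shell
# Run a command, or just print it when DRY_RUN is set.
run() {
    ${DRY_RUN:+echo} "$@"
}

# Map the partitions in the image, mount and unmount one of them,
# then remove the mappings again.
# $1 = image file, $2 = partition mapping name, $3 = mount point
poke_image() {
    run kpartx -av "$1" &&
    run mount "/dev/mapper/$2" "$3" &&
    run umount "$3" &&
    run kpartx -dv "$1"   # cleanup step, an assumption of this sketch
}

# Example (hypothetical image name; loop0p31 is the root partition here):
#   poke_image /adminvm/images/adminvm.img loop0p31 /mnt
```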

The problem doesn't always happen, but once it starts, the same VM image
has trouble starting on any of the 3 physical nodes sharing the storage.
After using the kpartx trick to force-mount the root filesystem within
the image, the machine can come up. My only guess is that this changes
the file just a tiny bit in the middle of the image.

Once the problem starts, it keeps happening, except that it works
temporarily after I do the loop mount trick on the physical admin node.

Here is some info about what I have in place:

nano-1:/adminvm/images # gluster volume info

Volume Name: adminvm
Type: Replicate
Volume ID: 67de902c-8c00-4dc9-8b69-60b93b5f6104
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 172.23.255.151:/data/brick_adminvm
Brick2: 172.23.255.152:/data/brick_adminvm
Brick3: 172.23.255.153:/data/brick_adminvm
Options Reconfigured:
performance.client-io-threads: on
nfs.disable: on
storage.fips-mode-rchecksum: on
transport.address-family: inet
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.low-prio-threads: 32
network.remote-dio: enable
cluster.eager-lock: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-max-threads: 8
cluster.shd-wait-qlength: 10000
features.shard: on
user.cifs: off
cluster.choose-local: off
client.event-threads: 4
server.event-threads: 4
cluster.granular-entry-heal: enable
storage.owner-uid: 439
storage.owner-gid: 443
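
Note that features.shard-block-size is not listed under Options
Reconfigured, so presumably the default applies. As far as I know the
default is 64MB, which happens to be the same size the guest reported
for /dev/vda; that may or may not be related. A small sketch of the
arithmetic under that assumption, taking "around 5T" as exactly 5 TiB:

```shell
# Number of shard-sized pieces needed to hold an image (ceiling division).
# $1 = image size in bytes, $2 = shard block size in bytes
shard_count() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

IMAGE_BYTES=$(( 5 * 1024 * 1024 * 1024 * 1024 ))  # "around 5T", taken as 5 TiB
SHARD_BYTES=$(( 64 * 1024 * 1024 ))               # assumed default of 64MB

shard_count "$IMAGE_BYTES" "$SHARD_BYTES"

# The value actually in effect could be confirmed with:
#   gluster volume get adminvm features.shard-block-size
```

With those assumptions the image is split into 81920 pieces of 64MB
each.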

libglusterfs0-7.2-4723.1520.210122T1700.a.sles15sp2hpe.x86_64
glusterfs-7.2-4723.1520.210122T1700.a.sles15sp2hpe.x86_64
python3-gluster-7.2-4723.1520.210122T1700.a.sles15sp2hpe.noarch

nano-1:/adminvm/images # uname -a
Linux nano-1 5.3.18-24.46-default #1 SMP Tue Jan 5 16:11:50 UTC 2021 (4ff469b) x86_64 x86_64 x86_64 GNU/Linux
nano-1:/adminvm/images # rpm -qa | grep qemu-4
qemu-4.2.0-9.4.x86_64

Would love any advice!!!!

Erik