[Gluster-users] qemu raw image file - qemu and grub2 can't find boot content from VM
Erik Jacobson
erik.jacobson at hpe.com
Tue Jan 26 13:40:19 UTC 2021
Thank you so much for responding! More below.
> Anything in the logs of the fuse mount? Can you stat the file from the mount?
> Also, the report that the image is only 64M makes me think about sharding, as
> the default shard size is 64M.
> Do you have any clues on when this issue started to happen? Was there any
> operation done on the Gluster cluster?
- I had created the gluster volumes within an hour of hitting the problem,
specifically to test the very problem I reported. So it was a "fresh start".
- It booted one or two times, then stopped booting. Once it failed to
boot, the behavior was the same on all 3 nodes: grub2 could not boot the
VM image on any of them.
As for the fuse log, I did see a couple of entries like the ones below
before it happened the first time. I'm not sure whether they are a clue or not.
[2021-01-25 22:48:19.310467] I [fuse-bridge.c:5777:fuse_graph_sync] 0-fuse: switched to graph 0
[2021-01-25 22:50:09.693958] E [fuse-bridge.c:227:check_and_dump_fuse_W] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x17a)[0x7f914e346faa] (--> /usr/lib64/glusterfs/7.2/xlator/mount/fuse.so(+0x874a)[0x7f914a3d374a] (--> /usr/lib64/glusterfs/7.2/xlator/mount/fuse.so(+0x91cb)[0x7f914a3d41cb] (--> /lib64/libpthread.so.0(+0x84f9)[0x7f914cf184f9] (--> /lib64/libc.so.6(clone+0x3f)[0x7f914c76afbf] ))))) 0-glusterfs-fuse: writing to fuse device failed: No such file or directory
[2021-01-25 22:50:09.694462] E [fuse-bridge.c:227:check_and_dump_fuse_W] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x17a)[0x7f914e346faa] (--> /usr/lib64/glusterfs/7.2/xlator/mount/fuse.so(+0x874a)[0x7f914a3d374a] (--> /usr/lib64/glusterfs/7.2/xlator/mount/fuse.so(+0x91cb)[0x7f914a3d41cb] (--> /lib64/libpthread.so.0(+0x84f9)[0x7f914cf184f9] (--> /lib64/libc.so.6(clone+0x3f)[0x7f914c76afbf] ))))) 0-glusterfs-fuse: writing to fuse device failed: No such file or directory
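Next time it is in the failed state, I will also stat the image from the
fuse mount and compare that with what qemu-img reports, roughly like this
(the image filename is just a placeholder from my setup):

  stat /adminvm/images/adminvm.img           # size/blocks as the fuse client sees them
  qemu-img info /adminvm/images/adminvm.img  # virtual size qemu presents to the guest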
I have reserved the test system again. My plans today are:
- Start over with the gluster volume on the machine with sles15sp2
updates
- Learn whether other modifications to the image (besides mapping its
filesystems with kpartx and mounting/unmounting them, which forces it to
work) also clear the problem. For example, what if I add or remove a byte
at the end of the image file?
- Revert the setup to sles15sp2 with no updates (re-making the gluster
volume in the process). My theory is that the updates are not making a
difference and it's just random chance.
- The 64MB shard size made me think too!! (See the shard-check commands
right after this list.)
- If the team feels it is worth it, I could try a newer gluster. We're
using the version we've validated at scale on large clusters in the
factory, but if the team thinks I should try something else I'm happy to
re-build it!!! We are at 7.2 plus the afr-event-gen-changes patch.
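For the shard check mentioned above, this is roughly what I have in mind
(the image filename is a placeholder, and the gfid-based count reflects my
understanding of how sharding lays the pieces out on the bricks):

  # confirm the shard block size the volume is really using
  gluster volume get adminvm features.shard-block-size

  # find the image's gfid via the fuse mount ...
  getfattr -n glusterfs.gfid.string /adminvm/images/adminvm.img

  # ... then count its shards on one of the bricks
  ls /data/brick_adminvm/.shard | grep <gfid> | wc -l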
I will keep a better eye on the fuse log to tie an error to the problem
starting.
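If I have the log layout right, that should just be a matter of running
something like this while reproducing (the log name is derived from the
mount point, so adjust if yours differs):

  tail -F /var/log/glusterfs/adminvm.log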
THANKS AGAIN for responding and let me know if you have any more
clues!
Erik
>
> On Tue, Jan 26, 2021 at 2:40 AM Erik Jacobson <erik.jacobson at hpe.com> wrote:
>
> Hello all. Thanks again for gluster. We're having a strange problem
> getting virtual machines started that are hosted on a gluster volume.
>
> One of the ways we use gluster now is to make an HA-ish cluster head
> node. A virtual machine runs on the shared storage and is backed by 3
> physical servers that contribute to the gluster storage share.
>
> We're using sharding in this volume. The VM image file is around 5T and
> we use qemu-img with falloc to get all the blocks allocated in advance.
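>
> For reference, the preallocation step looks roughly like this (the image
> filename is just a placeholder for this example):
>
> qemu-img create -f raw -o preallocation=falloc /adminvm/images/adminvm.img 5T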
>
> We are not using gfapi largely because it would mean we have to build
> our own libvirt and qemu and we'd prefer not to do that. So we're using
> a glusterfs fuse mount to host the image. The virtual machine is using
> virtio disks but we had similar trouble using scsi emulation.
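>
> The fuse mount itself is nothing special; something like the following,
> assuming /adminvm is the mount point (I'm writing that part from memory):
>
> mount -t glusterfs 172.23.255.151:/adminvm /adminvm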
>
> The issue: at first all seems well; the VM head node installs, boots, etc.
>
> However, at some point, it stops being able to boot! grub2 acts like it
> cannot find /boot. At the grub2 prompt, it can see the partitions, but
> reports no filesystem found where there are indeed filesystems.
>
> If we switch qemu to use "direct kernel load" (bypass grub2), this often
> works around the problem but in one case Linux gave us a clue. Linux
> reported /dev/vda as only being 64 megabytes, which would explain a lot.
> This means the Linux running in the virtual machine thought the disk
> supplied by the disk image was tiny: 64M instead of 5T!
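>
> To double-check what size the guest really sees, something like this from
> inside the VM should confirm it (just standard tools, not necessarily what
> we looked at first):
>
> lsblk /dev/vda                   # should report ~5T, not 64M
> blockdev --getsize64 /dev/vda    # disk size in bytes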
>
> We are using sles15sp2 and hit the problem more often with updates
> applied than without. I'm in the process of trying to isolate if there
> is a sles15sp2 update causing this, or if we're within "random chance".
>
> On one of the physical nodes, when it is in the failure mode, if I use
> 'kpartx' to create the partition mappings from the image file, then mount
> the giant root filesystem (i.e. mount /dev/mapper/loop0p31 /mnt) and then
> umount /mnt, that physical node starts the VM fine: grub2 loads and the
> virtual machine is fully happy! Until I try to shut it down and start it
> up again, at which point it sticks at grub2 again! What about mounting
> the image file makes it so qemu sees the whole disk?
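>
> For completeness, the workaround sequence on a physical node is roughly
> this (the image filename is a placeholder; the partition number is from
> our image layout):
>
> kpartx -av /adminvm/images/adminvm.img   # map the partitions in the image
> mount /dev/mapper/loop0p31 /mnt          # mount the giant root filesystem
> umount /mnt
> kpartx -dv /adminvm/images/adminvm.img   # tear the mappings back down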
>
> The problem doesn't always happen, but once it starts, the same VM image
> has trouble starting on any of the 3 physical nodes sharing the storage.
> However, after using the trick of force-mounting the root filesystem
> within the image with kpartx, the machine can come up. My only guess is
> that this changes the file just a tiny bit somewhere in the middle of the
> image.
>
> Once the problem starts, it keeps happening, except for temporarily
> working after I do the loop-mount trick on the physical admin node.
>
>
> Here is some info about what I have in place:
>
>
> nano-1:/adminvm/images # gluster volume info
>
> Volume Name: adminvm
> Type: Replicate
> Volume ID: 67de902c-8c00-4dc9-8b69-60b93b5f6104
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 3 = 3
> Transport-type: tcp
> Bricks:
> Brick1: 172.23.255.151:/data/brick_adminvm
> Brick2: 172.23.255.152:/data/brick_adminvm
> Brick3: 172.23.255.153:/data/brick_adminvm
> Options Reconfigured:
> performance.client-io-threads: on
> nfs.disable: on
> storage.fips-mode-rchecksum: on
> transport.address-family: inet
> performance.quick-read: off
> performance.read-ahead: off
> performance.io-cache: off
> performance.low-prio-threads: 32
> network.remote-dio: enable
> cluster.eager-lock: enable
> cluster.quorum-type: auto
> cluster.server-quorum-type: server
> cluster.data-self-heal-algorithm: full
> cluster.locking-scheme: granular
> cluster.shd-max-threads: 8
> cluster.shd-wait-qlength: 10000
> features.shard: on
> user.cifs: off
> cluster.choose-local: off
> client.event-threads: 4
> server.event-threads: 4
> cluster.granular-entry-heal: enable
> storage.owner-uid: 439
> storage.owner-gid: 443
>
>
>
>
> libglusterfs0-7.2-4723.1520.210122T1700.a.sles15sp2hpe.x86_64
> glusterfs-7.2-4723.1520.210122T1700.a.sles15sp2hpe.x86_64
> python3-gluster-7.2-4723.1520.210122T1700.a.sles15sp2hpe.noarch
>
>
>
> nano-1:/adminvm/images # uname -a
> Linux nano-1 5.3.18-24.46-default #1 SMP Tue Jan 5 16:11:50 UTC 2021
> (4ff469b) x86_64 x86_64 x86_64 GNU/Linux
> nano-1:/adminvm/images # rpm -qa | grep qemu-4
> qemu-4.2.0-9.4.x86_64
>
>
>
> Would love any advice!!!!
>
>
> Erik
> --
> Respectfully
> Mahdi