[Gluster-users] Creating a large pre-allocated qemu-img raw image takes too long and fails on fuse

Jacobson, Erik erik.jacobson at hpe.com
Mon Aug 12 20:53:07 UTC 2024


Thanks for the work on gluster.

We have a situation where we need a very large virtual machine image. We use a simple raw image, which can be as large as 40T in some cases. For this experiment we’ll call it 24T.

When creating the image on a fuse mount with qemu-img using falloc preallocation, the qemu-img create fails with a fuse error after around 3 hours.

I created a simple C program using gfapi that does a 10T fallocate, and it took 1.25 hours. I didn’t run tests larger than that, as 1.25 hours is already too long.
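
For reference, the gfapi test was roughly along these lines (a minimal sketch, not the exact program; the image path, log file, and 10T size are illustrative, while the volume name and server address come from the volume info below):

/*
 * Minimal sketch of the gfapi fallocate test (not the exact program).
 * The image path, log file, and size are illustrative; the volume name
 * and server address are taken from the volume info below.
 *
 * Build (assuming glusterfs-api development headers are installed):
 *   gcc -o gfapi_falloc gfapi_falloc.c $(pkg-config --cflags --libs glusterfs-api)
 */
#define _FILE_OFFSET_BITS 64
#include <stdio.h>
#include <fcntl.h>
#include <glusterfs/api/glfs.h>

int main(void)
{
    const off_t size = 10LL * 1024 * 1024 * 1024 * 1024;   /* 10T */

    glfs_t *fs = glfs_new("adminvm");
    if (!fs)
        return 1;

    glfs_set_volfile_server(fs, "tcp", "172.23.254.181", 24007);
    glfs_set_logging(fs, "/tmp/gfapi-falloc.log", 4);

    if (glfs_init(fs) != 0) {
        perror("glfs_init");
        return 1;
    }

    glfs_fd_t *fd = glfs_creat(fs, "/images/adminvm.img",
                               O_RDWR | O_CREAT, 0644);
    if (!fd) {
        perror("glfs_creat");
        glfs_fini(fs);
        return 1;
    }

    /* keep_size = 0 so the file size grows, matching what
     * qemu-img preallocation=falloc does on a new image */
    if (glfs_fallocate(fd, 0, 0, size) != 0)
        perror("glfs_fallocate");

    glfs_close(fd);
    glfs_fini(fs);
    return 0;
}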

Using qemu-img with falloc preallocation in gfapi mode takes a long time too, comparable to the fuse case.

However, I found that if I create a 2.4T image file and then do 9 more resizes to bring it up to the full desired size (24T in this case), it only takes about 16 minutes total (I did this on the fuse mount). That includes the initial 2.4T qemu-img create (preallocation=falloc), followed by nine resize +2.4T runs.
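
If the win comes simply from issuing the preallocation as a series of smaller fallocate ranges rather than from qemu-img itself, something like the following should show the same behaviour directly with fallocate(2) on the fuse mount (a rough sketch; the mount path, chunk size, and step count are illustrative):

/*
 * Rough equivalent of the incremental-resize workaround, done with plain
 * fallocate(2) against a file on the fuse mount instead of repeated
 * qemu-img resize calls.  The mount path, chunk size, and step count
 * are illustrative.
 *
 * Build: gcc -o chunked_falloc chunked_falloc.c
 */
#define _GNU_SOURCE
#define _FILE_OFFSET_BITS 64
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    const off_t chunk = 2400LL * 1024 * 1024 * 1024;  /* ~2.4T per step */
    const int steps = 10;                             /* 10 x 2.4T = 24T */

    int fd = open("/mnt/adminvm/images/adminvm.img", O_RDWR | O_CREAT, 0644);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    for (int i = 0; i < steps; i++) {
        /* mode 0: allocate blocks and extend the file size by one chunk */
        if (fallocate(fd, 0, (off_t)i * chunk, chunk) != 0) {
            perror("fallocate");
            close(fd);
            return 1;
        }
    }

    close(fd);
    return 0;
}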

We are avoiding a non-preallocated image as we have had trouble with people assuming available disk space “is available” and running bricks out of space by accident.

We would like to avoid the kludge of calling qemu-img 10 times (or more) to make a larger fallocated image. If there are suggested methods or tunings, please let me know!

We are currently at Gluster 9.3.

Volume setup:
[root@nano-1 images]# gluster volume info adminvm

Volume Name: adminvm
Type: Replicate
Volume ID: e09122b9-8bc4-409b-a423-7596feebf941
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 172.23.254.181:/data/brick_adminvm
Brick2: 172.23.254.182:/data/brick_adminvm
Brick3: 172.23.254.183:/data/brick_adminvm
Options Reconfigured:
performance.client-io-threads: on
nfs.disable: on
transport.address-family: inet
storage.fips-mode-rchecksum: on
cluster.granular-entry-heal: enable
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.low-prio-threads: 32
network.remote-dio: disable
performance.strict-o-direct: on
cluster.eager-lock: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-max-threads: 8
cluster.shd-wait-qlength: 10000
features.shard: on
user.cifs: off
cluster.choose-local: off
client.event-threads: 4
server.event-threads: 4
network.ping-timeout: 20
server.tcp-user-timeout: 20
server.keepalive-time: 10
server.keepalive-interval: 2
server.keepalive-count: 5
cluster.lookup-optimize: off
network.frame-timeout: 10800
performance.io-thread-count: 32
storage.owner-uid: 107
storage.owner-gid: 107