<html xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=Windows-1252">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Aptos;
panose-1:2 11 0 4 2 2 2 2 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
font-size:11.0pt;
font-family:"Aptos",sans-serif;
mso-ligatures:standardcontextual;}
span.EmailStyle17
{mso-style-type:personal-compose;
font-family:"Aptos",sans-serif;
color:windowtext;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:11.0pt;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
--></style>
</head>
<body lang="EN-US" link="#467886" vlink="#96607D" style="word-wrap:break-word">
<div class="WordSection1">
<p class="MsoNormal">Thanks for the work on gluster.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">We have a situation where we need a very large virtual machine image. We use a simple raw image but it can be up to 40T in size in some cases. For this experiment we’ll call it 24T.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">When creating the image on fuse with qemu-img, using falloc preallocation, the qemu-img create fails and a fuse error results. This happens after around 3 hours.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">I created a simple C program using gfapi that does a fallocate of 10T, and it took 1.25 hours. I didn’t run tests at larger sizes, as 1.25 hours is too long anyway.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Using qemu-img with preallocation=falloc in gfapi mode takes a long time too, similar to qemu-img in gfapi mode.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">However, I found that if I create a 2.4T image file and then do 9 more resizes to bring it up to the full desired size (24T in this case), it takes only about 16 minutes total (I did this on the fuse mount). This includes the first 2.4T qemu-img
create (preallocation falloc), followed by 9 resize +2.4T runs.<o:p></o:p></p>
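<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">For reference, the stepwise create-then-resize sequence described above could be sketched roughly as the shell function below. This is only an illustration under assumptions: the function name grow_image, its arguments, and the RUN switch are hypothetical, sizes are given in whole GiB rather than 2.4T, and it assumes the resize steps also pass --preallocation=falloc. By default it only prints the qemu-img commands it would run.<o:p></o:p></p>
<pre>
```shell
#!/bin/sh
# Sketch: build a preallocated raw image in steps instead of one large fallocate.
# grow_image, its arguments, and RUN are hypothetical names for illustration.
# Usage: grow_image IMAGE TOTAL_GIB CHUNK_GIB
# Commands are only printed unless RUN=1 is set in the environment.

grow_image() {
    img=$1
    total_g=$2
    chunk_g=$3
    run=${RUN:-0}
    done_g=0

    while [ "$done_g" -lt "$total_g" ]; do
        # The last step may be smaller than the chunk size.
        step_g=$chunk_g
        remaining_g=$((total_g - done_g))
        if [ "$step_g" -gt "$remaining_g" ]; then
            step_g=$remaining_g
        fi

        if [ "$done_g" -eq 0 ]; then
            # First step: create the image with falloc preallocation.
            set -- qemu-img create -f raw -o preallocation=falloc "$img" "${step_g}G"
        else
            # Later steps: grow the image, preallocating the added range
            # (assumption: the resize runs use --preallocation=falloc).
            set -- qemu-img resize -f raw --preallocation=falloc "$img" "+${step_g}G"
        fi

        echo "$*"
        if [ "$run" = 1 ]; then
            "$@" || return 1
        fi

        done_g=$((done_g + step_g))
    done
}
```
</pre>
<p class="MsoNormal">With a placeholder path, calling grow_image /path/to/image.raw 24576 2458 would print one create command and nine resize commands covering 24T. This still calls qemu-img ten times, so it only wraps the kludge rather than removing it.<o:p></o:p></p>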
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">We are avoiding a non-preallocated image because we have had trouble with people assuming that the disk space shown as available “is available” and running bricks out of space by accident.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">We would like to avoid the kludge of calling qemu-img 10 times (or more) to make a larger fallocated image. If there are suggested methods or tunings, please let me know!<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">We are currently at gluster 9.3.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Volume setup:<o:p></o:p></p>
<p class="MsoNormal">[root@nano-1 images]# gluster volume info adminvm<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Volume Name: adminvm<o:p></o:p></p>
<p class="MsoNormal">Type: Replicate<o:p></o:p></p>
<p class="MsoNormal">Volume ID: e09122b9-8bc4-409b-a423-7596feebf941<o:p></o:p></p>
<p class="MsoNormal">Status: Started<o:p></o:p></p>
<p class="MsoNormal">Snapshot Count: 0<o:p></o:p></p>
<p class="MsoNormal">Number of Bricks: 1 x 3 = 3<o:p></o:p></p>
<p class="MsoNormal">Transport-type: tcp<o:p></o:p></p>
<p class="MsoNormal">Bricks:<o:p></o:p></p>
<p class="MsoNormal">Brick1: 172.23.254.181:/data/brick_adminvm<o:p></o:p></p>
<p class="MsoNormal">Brick2: 172.23.254.182:/data/brick_adminvm<o:p></o:p></p>
<p class="MsoNormal">Brick3: 172.23.254.183:/data/brick_adminvm<o:p></o:p></p>
<p class="MsoNormal">Options Reconfigured:<o:p></o:p></p>
<p class="MsoNormal">performance.client-io-threads: on<o:p></o:p></p>
<p class="MsoNormal">nfs.disable: on<o:p></o:p></p>
<p class="MsoNormal">transport.address-family: inet<o:p></o:p></p>
<p class="MsoNormal">storage.fips-mode-rchecksum: on<o:p></o:p></p>
<p class="MsoNormal">cluster.granular-entry-heal: enable<o:p></o:p></p>
<p class="MsoNormal">performance.quick-read: off<o:p></o:p></p>
<p class="MsoNormal">performance.read-ahead: off<o:p></o:p></p>
<p class="MsoNormal">performance.io-cache: off<o:p></o:p></p>
<p class="MsoNormal">performance.low-prio-threads: 32<o:p></o:p></p>
<p class="MsoNormal">network.remote-dio: disable<o:p></o:p></p>
<p class="MsoNormal">performance.strict-o-direct: on<o:p></o:p></p>
<p class="MsoNormal">cluster.eager-lock: enable<o:p></o:p></p>
<p class="MsoNormal">cluster.quorum-type: auto<o:p></o:p></p>
<p class="MsoNormal">cluster.server-quorum-type: server<o:p></o:p></p>
<p class="MsoNormal">cluster.data-self-heal-algorithm: full<o:p></o:p></p>
<p class="MsoNormal">cluster.locking-scheme: granular<o:p></o:p></p>
<p class="MsoNormal">cluster.shd-max-threads: 8<o:p></o:p></p>
<p class="MsoNormal">cluster.shd-wait-qlength: 10000<o:p></o:p></p>
<p class="MsoNormal">features.shard: on<o:p></o:p></p>
<p class="MsoNormal">user.cifs: off<o:p></o:p></p>
<p class="MsoNormal">cluster.choose-local: off<o:p></o:p></p>
<p class="MsoNormal">client.event-threads: 4<o:p></o:p></p>
<p class="MsoNormal">server.event-threads: 4<o:p></o:p></p>
<p class="MsoNormal">network.ping-timeout: 20<o:p></o:p></p>
<p class="MsoNormal">server.tcp-user-timeout: 20<o:p></o:p></p>
<p class="MsoNormal">server.keepalive-time: 10<o:p></o:p></p>
<p class="MsoNormal">server.keepalive-interval: 2<o:p></o:p></p>
<p class="MsoNormal">server.keepalive-count: 5<o:p></o:p></p>
<p class="MsoNormal">cluster.lookup-optimize: off<o:p></o:p></p>
<p class="MsoNormal">network.frame-timeout: 10800<o:p></o:p></p>
<p class="MsoNormal">performance.io-thread-count: 32<o:p></o:p></p>
<p class="MsoNormal">storage.owner-uid: 107<o:p></o:p></p>
<p class="MsoNormal">storage.owner-gid: 107<o:p></o:p></p>
</div>
</body>
</html>