<div dir="ltr"><span style="color:rgb(0,0,0)">Thanks for the feedback. 7.9 is really stable; in fact, it is so stable that we might not even upgrade to 8.x for some time.</span><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Feb 1, 2021 at 11:56 PM Erik Jacobson <<a href="mailto:erik.jacobson@hpe.com">erik.jacobson@hpe.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">We think this fixed it. While there is random chance in there, we can't<br>
repeat it in 7.9. So I'll close this thread out for now.<br>
<br>
We'll ask for help again if needed. Thanks for all the kind responses,<br>
<br>
Erik<br>
<br>
On Fri, Jan 29, 2021 at 02:20:56PM -0600, Erik Jacobson wrote:<br>
> I updated to 7.9, rebooted everything, and it started working.<br>
> <br>
> I will have QE try to break it again and report back. I couldn't break<br>
> it but they're better at breaking things (which is hard to imagine :)<br>
> <br>
> <br>
> On Fri, Jan 29, 2021 at 01:11:50PM -0600, Erik Jacobson wrote:<br>
> > Thank you.<br>
> > <br>
> > We reproduced the problem after force-killing one of the 3 physical<br>
> > nodes 6 times in a row.<br>
> > <br>
> > At that point, grub2 loaded off the qemu virtual hard drive, but<br>
> > could not find partitions. Since there is random luck involved, we don't<br>
> > actually know if it was the force-killing that caused it to stop<br>
> > working.<br>
> > <br>
> > When I start the VM with the image in this state, there is nothing<br>
> > interesting in the fuse log for the volume in /var/log/glusterfs on the<br>
> > node hosting the image.<br>
> > <br>
> > No pending heals (all servers report 0 entries to heal).<br>
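> > (For reference, a sketch of the standard heal-state check, assuming the volume name adminvm used elsewhere in this thread:)<br>

```shell
# Check pending self-heal entries on each brick; a healthy replica-3
# volume reports "Number of entries: 0" for every brick.
gluster volume heal adminvm info

# Recent gluster releases also support a condensed summary view:
gluster volume heal adminvm info summary
```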
> > <br>
> > The same VM behavior happens on all the physical nodes when I try to<br>
> > start with the same VM image.<br>
> > <br>
> > Something from the gluster fuse mount log from earlier shows:<br>
> > <br>
> > [2021-01-28 21:24:40.814227] I [MSGID: 114018] [client.c:2347:client_rpc_notify] 0-adminvm-client-0: disconnected from adminvm-client-0. Client process will keep trying to connect to glusterd until brick's port is available<br>
> > [2021-01-28 21:24:43.815120] I [rpc-clnt.c:1963:rpc_clnt_reconfig] 0-adminvm-client-0: changing port to 49152 (from 0)<br>
> > [2021-01-28 21:24:43.815833] I [MSGID: 114057] [client-handshake.c:1376:select_server_supported_programs] 0-adminvm-client-0: Using Program GlusterFS 4.x v1, Num (1298437), Version (400)<br>
> > [2021-01-28 21:24:43.817682] I [MSGID: 114046] [client-handshake.c:1106:client_setvolume_cbk] 0-adminvm-client-0: Connected to adminvm-client-0, attached to remote volume '/data/brick_adminvm'.<br>
> > [2021-01-28 21:24:43.817709] I [MSGID: 114042] [client-handshake.c:930:client_post_handshake] 0-adminvm-client-0: 1 fds open - Delaying child_up until they are re-opened<br>
> > [2021-01-28 21:24:43.895163] I [MSGID: 114041] [client-handshake.c:318:client_child_up_reopen_done] 0-adminvm-client-0: last fd open'd/lock-self-heal'd - notifying CHILD-UP<br>
> > The message "W [MSGID: 114061] [client-common.c:2893:client_pre_lk_v2] 0-adminvm-client-0: (94695bdb-06b4-4105-9bc8-b8207270c941) remote_fd is -1. EBADFD [File descriptor in bad state]" repeated 6 times between [2021-01-28 21:23:54.395811] and [2021-01-28 21:23:54.811640]<br>
> > <br>
> > <br>
> > But that was a long time ago.<br>
> > <br>
> > Brick logs have an entry from when I first started the VM today (the<br>
> > problem was reproduced yesterday); all brick logs have something similar.<br>
> > Nothing appeared on the several other startup attempts of the VM:<br>
> > <br>
> > [2021-01-28 21:24:45.460147] I [MSGID: 115029] [server-handshake.c:549:server_setvolume] 0-adminvm-server: accepted client from CTX_ID:613f0d91-34e6-4495-859f-bca1c9f7af01-GRAPH_ID:0-PID:6287-HOST:nano-1-PC_NAME:adminvm-client-2-RECON_NO:-0 (version: 7.2) with subvol /data/brick_adminvm<br>
> > [2021-01-29 18:54:45.455558] I [addr.c:54:compare_addr_and_update] 0-/data/brick_adminvm: allowed = "*", received addr = "172.23.255.153"<br>
> > [2021-01-29 18:54:45.455802] I [login.c:110:gf_auth] 0-auth/login: allowed user names: 3b66cfab-00d5-4b13-a103-93b4cf95e144<br>
> > [2021-01-29 18:54:45.455815] I [MSGID: 115029] [server-handshake.c:549:server_setvolume] 0-adminvm-server: accepted client from CTX_ID:3774af6b-07b9-437b-a34e-9f71f3b57d03-GRAPH_ID:0-PID:45640-HOST:nano-3-PC_NAME:adminvm-client-2-RECON_NO:-0 (version: 7.2) with subvol /data/brick_adminvm<br>
> > [2021-01-29 18:54:45.494950] W [socket.c:774:__socket_rwv] 0-tcp.adminvm-server: readv on <a href="http://172.23.255.153:48551" rel="noreferrer" target="_blank">172.23.255.153:48551</a> failed (No data available)<br>
> > [2021-01-29 18:54:45.494994] I [MSGID: 115036] [server.c:501:server_rpc_notify] 0-adminvm-server: disconnecting connection from CTX_ID:3774af6b-07b9-437b-a34e-9f71f3b57d03-GRAPH_ID:0-PID:45640-HOST:nano-3-PC_NAME:adminvm-client-2-RECON_NO:-0<br>
> > [2021-01-29 18:54:45.495091] I [MSGID: 101055] [client_t.c:436:gf_client_unref] 0-adminvm-server: Shutting down connection CTX_ID:3774af6b-07b9-437b-a34e-9f71f3b57d03-GRAPH_ID:0-PID:45640-HOST:nano-3-PC_NAME:adminvm-client-2-RECON_NO:-0<br>
> > <br>
> > <br>
> > <br>
> > Like before, if I halt the VM, kpartx the image, mount the giant root<br>
> > within the image, then unmount, remove the kpartx mappings, and start the VM, it works:<br>
> > <br>
> > nano-2:/var/log/glusterfs # kpartx -a /adminvm/images/adminvm.img<br>
> > nano-2:/var/log/glusterfs # mount /dev/mapper/loop0p31 /mnt<br>
> > nano-2:/var/log/glusterfs # dmesg|tail -3<br>
> > [85528.602570] loop: module loaded<br>
> > [85535.975623] EXT4-fs (dm-3): recovery complete<br>
> > [85535.979663] EXT4-fs (dm-3): mounted filesystem with ordered data mode. Opts: (null)<br>
> > nano-2:/var/log/glusterfs # umount /mnt<br>
> > nano-2:/var/log/glusterfs # kpartx -d /adminvm/images/adminvm.img<br>
> > loop deleted : /dev/loop0<br>
> > <br>
> > VM WORKS for ONE boot cycle on one physical!<br>
> > <br>
> > nano-2:/var/log/glusterfs # virsh start adminvm<br>
> > <br>
> > However, this will work for one boot but later it will stop working again<br>
> > (INCLUDING on the physical node that booted once OK; the next boot fails<br>
> > again, as does launching it on the other two).<br>
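> > (The mount/unmount workaround above can be scripted; a rough, untested sketch using the image path and partition number from this thread:)<br>

```shell
#!/bin/sh
# Force an ext4 journal replay on the VM image from the host, which
# (for one boot cycle) makes the image bootable again.
# Image path and partition number (p31) are from the transcript above.
set -e
IMG=/adminvm/images/adminvm.img
kpartx -a "$IMG"                  # map the image's partitions to /dev/mapper/loop0pN
mount /dev/mapper/loop0p31 /mnt   # ext4 journal recovery runs during mount
umount /mnt
kpartx -d "$IMG"                  # remove the partition mappings again
virsh start adminvm
```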
> > <br>
> > Based on feedback, I will not change the shard size at this time and<br>
> > will leave that for later. Some people suggest larger sizes but it isn't<br>
> > a universal suggestion. I'll also not attempt to make a logical volume<br>
> > out of a group of smaller images as I think it should work like this.<br>
> > Those are things I will try later if I run out of runway. Since we want<br>
> > a solution we can deploy to sites, those changes would increase the maintenance<br>
> > burden of the otherwise simple solution.<br>
> > <br>
> > I am leaving the state like this and will now proceed to update to the<br>
> > latest gluster 7.<br>
> > <br>
> > I will report back after I get everything updated and services restarted<br>
> > with the newer version.<br>
> > <br>
> > THANKS FOR ALL THE HELP SO FAR!!<br>
> > <br>
> > Erik<br>
> > <br>
> > On Wed, Jan 27, 2021 at 10:55:50PM +0300, Mahdi Adnan wrote:<br>
> > > I would leave it at 64M on volumes with spindle disks, but with SSD volumes I<br>
> > > would increase it to 128M or even 256M; it varies from one workload to<br>
> > > another.<br>
> > > On Wed, Jan 27, 2021 at 10:02 PM Erik Jacobson <<a href="mailto:erik.jacobson@hpe.com" target="_blank">erik.jacobson@hpe.com</a>> wrote:<br>
> > > <br>
> > > > Also, I would like to point out that I have VMs with large disks, 1TB and<br>
> > > > 2TB, and have no issues. I would definitely upgrade the Gluster version to,<br>
> > > > let's say, at least 7.9.<br>
> > > <br>
> > > Great! Thank you! We can update but it's very sensitive due to the<br>
> > > workload. I can't officially update our gluster until we have a cluster<br>
> > > with a couple thousand nodes to test with. However, for this problem,<br>
> > > this is on my list for the test machine. I'm hoping I can reproduce it. So far,<br>
> > > no luck making it happen again. Once I hit it, I will try to collect more data<br>
> > > and, at the end, update gluster.<br>
> > > <br>
> > > What do you think about the suggestion to increase the shard size? Are<br>
> > > you using the default size on your 1TB and 2TB images?<br>
> > > <br>
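> > > (For anyone following along: the shard size is the features.shard-block-size volume option; a sketch, noting that changing it on an existing volume only affects files created afterwards, not already-written shards:)<br>

```shell
# Show the current shard size for the volume (the default is 64MB):
gluster volume get adminvm features.shard-block-size

# Raise it to 128MB; this applies only to newly created files,
# existing files keep the shard size they were written with.
gluster volume set adminvm features.shard-block-size 128MB
```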
> > > Amar also asked a question regarding enabling sharding on the volume after<br>
> > > creating the VM disks, which would certainly mess up the volume if that is<br>
> > > what happened.<br>
> > > <br>
> > > Oh, I missed this question. I basically scripted it quickly since I was<br>
> > > doing it so often. I have a similar script that tears it all down to start<br>
> > > over.<br>
> > > <br>
> > > set -x<br>
> > > pdsh -g gluster mkdir /data/brick_adminvm/<br>
> > > gluster volume create adminvm replica 3 transport tcp 172.23.255.151:/data/brick_adminvm 172.23.255.152:/data/brick_adminvm 172.23.255.153:/data/brick_adminvm<br>
> > > gluster volume set adminvm group virt<br>
> > > gluster volume set adminvm granular-entry-heal enable<br>
> > > gluster volume set adminvm storage.owner-uid 439<br>
> > > gluster volume set adminvm storage.owner-gid 443<br>
> > > gluster volume start adminvm<br>
> > > <br>
> > > pdsh -g gluster mount /adminvm<br>
> > > <br>
> > > echo -n "press enter to continue for restore tarball"<br>
> > > <br>
> > > pushd /adminvm<br>
> > > tar xvf /root/backup.tar<br>
> > > popd<br>
> > > <br>
> > > echo -n "press enter to continue for qemu-img"<br>
> > > <br>
> > > pushd /adminvm<br>
> > > qemu-img create -f raw -o preallocation=falloc /adminvm/images/adminvm.img 5T<br>
> > > popd<br>
> > > <br>
> > > <br>
> > > Thanks again for the kind responses,<br>
> > > <br>
> > > Erik<br>
> > > <br>
> > > ><br>
> > > > On Wed, Jan 27, 2021 at 5:28 PM Erik Jacobson <<a href="mailto:erik.jacobson@hpe.com" target="_blank">erik.jacobson@hpe.com</a>><br>
> > > wrote:<br>
> > > ><br>
> > > > > > Shortly after the sharded volume is made, there are some fuse mount<br>
> > > > > > messages. I'm not 100% sure if this was just before or during the<br>
> > > > > > big qemu-img command to make the 5T image<br>
> > > > > > (qemu-img create -f raw -o preallocation=falloc<br>
> > > > > > /adminvm/images/adminvm.img 5T)<br>
> > > > > Any reason to have a single disk with this size ?<br>
> > > ><br>
> > > > > Usually in any virtualization I have used, it is always recommended to<br>
> > > > > keep it lower. Have you thought about multiple disks with smaller size?<br>
> > > ><br>
> > > > Yes, because the actual virtual machine is an admin node/head node cluster<br>
> > > > manager for a supercomputer that hosts big OS images and drives<br>
> > > > multi-thousand-node clusters (boot, monitoring, image creation,<br>
> > > > distribution, sometimes NFS roots, etc.). So this VM is a biggie.<br>
> > > ><br>
> > > > We could make multiple smaller images but it would be very painful since<br>
> > > > it differs from the normal non-VM setup.<br>
> > > ><br>
> > > > So unlike many solutions where you have lots of small VMs with their own<br>
> > > > small images, this solution is one giant VM with one giant image.<br>
> > > > We're essentially using gluster in this use case (as opposed to the others I<br>
> > > > have posted about in the past) for head node failover (combined with<br>
> > > > pacemaker).<br>
> > > ><br>
> > > > > Also worth noting is that RHII is supported only when the shard size is<br>
> > > > > 512MB, so it's worth trying a bigger shard size.<br>
> > > ><br>
> > > > I have put a larger shard size and a newer gluster version on the list to<br>
> > > > try. Thank you! Hoping to get it failing again so I can try these things!<br>
> > > ><br>
> > > ><br>
> > > ><br>
> > > > --<br>
> > > > Respectfully<br>
> > > > Mahdi<br>
> > > <br>
> > > <br>
> > > <br>
> > > --<br>
> > > Respectfully<br>
> > > Mahdi<br>
</blockquote></div><br clear="all"><div><br></div>-- <br><div dir="ltr" class="gmail_signature"><div dir="ltr">Respectfully<div>Mahdi</div></div></div>