<html xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Verdana;
panose-1:2 11 6 4 3 5 4 4 2 4;}
@font-face
{font-family:Aptos;
panose-1:2 11 0 4 2 2 2 2 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
font-size:12.0pt;
font-family:"Aptos",sans-serif;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
span.EmailStyle20
{mso-style-type:personal-reply;
font-family:"Aptos",sans-serif;
color:windowtext;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;
mso-ligatures:none;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
--></style>
</head>
<body lang="EN-US" link="blue" vlink="purple" style="word-wrap:break-word">
<div class="WordSection1">
<p class="MsoNormal"><span style="font-size:11.0pt">Dear team. I made a new PR (sorry, my inexperience with github.com is showing: I created a new PR instead of updating the old one. It seemed easier to close the old one and use the new one than to fix the old one).
<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">In the new PR, I integrated feedback. Thank you so much.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><a href="https://github.com/gluster/glusterfs/pull/4322">https://github.com/gluster/glusterfs/pull/4322</a><br>
<br>
I am attaching to this email my notes on reproducing this environment. I used virtual machines and a constrained test environment to duplicate the problem and test the fix. I hope these notes resolve all the outstanding questions.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">If not, please let me know! Thanks again to all.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">Erik<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<div id="mail-editor-reference-message-container">
<div>
<div style="border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal" style="margin-bottom:12.0pt"><b><span style="color:black">From:
</span></b><span style="color:black">Jacobson, Erik &lt;erik.jacobson@hpe.com&gt;<br>
<b>Date: </b>Monday, March 18, 2024 at 10:22</span><span style="font-family:"Arial",sans-serif;color:black"> </span><span style="color:black">AM<br>
<b>To: </b>Aravinda &lt;aravinda@kadalu.tech&gt;<br>
<b>Cc: </b>Gluster Devel &lt;gluster-devel@gluster.org&gt;<br>
<b>Subject: </b>Re: [Gluster-devel] Gluster 9.6 changes to fix gluster NFS bug<o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:11.0pt">I will need to set up a test case that is isolated.</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt">In the meantime, I did a fork and a PR. I marked it as draft as I try to find an easier test case.</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><a href="https://github.com/gluster/glusterfs/pull/4319">https://github.com/gluster/glusterfs/pull/4319</a></span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> </span><o:p></o:p></p>
<div id="mail-editor-reference-message-container">
<div>
<div style="border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal" style="margin-bottom:12.0pt"><b><span style="color:black">From:
</span></b><span style="color:black">Aravinda &lt;aravinda@kadalu.tech&gt;<br>
<b>Date: </b>Saturday, March 16, 2024 at 9:37</span><span style="font-family:"Arial",sans-serif;color:black"> </span><span style="color:black">AM<br>
<b>To: </b>Jacobson, Erik &lt;erik.jacobson@hpe.com&gt;<br>
<b>Cc: </b>Gluster Devel &lt;gluster-devel@gluster.org&gt;<br>
<b>Subject: </b>Re: [Gluster-devel] Gluster 9.6 changes to fix gluster NFS bug</span><o:p></o:p></p>
</div>
<div>
<div>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif">> We ran into some trouble in Gluster 9.3 with the Gluster NFS server. We updated to a supported Gluster 9.6 and reproduced the problem.</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif"> </span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif">Please share the reproducer steps. We can include them in our tests if possible.</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif"> </span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif">> We understand the Gluster team recommends the use of Ganesha for NFS but in our specific environment and use case, Ganesha isn’t fast enough. No disrespect intended; we never
got the chance to work with the Ganesha team on it.</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif"> </span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif">That is totally fine. I think gnfs is disabled in later versions; you have to build from source to enable it. The only issue I see is that gnfs doesn't support NFS v4, and the NFS+Gluster
team has shifted its focus to NFS Ganesha.</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif"> </span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif">> We tried to avoid Ganesha and Gluster NFS altogether, using kernel NFS with fuse mounts exported, and that was faster, but failover didn’t work. We could make the mount point
highly available but not open files (so when the IP failover happened, the mount point would still function but the open file – a squashfs in this example – would not fail over).</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif"> </span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif">Was the Gluster backup volfile server option used, or was some other method used for high availability?</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif"> </span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif">> So we embarked on a mission to figure out what was going on with the NFS server. I am not an expert in network code or distributed filesystems, so someone with a
careful eye would need to check these changes. However, what I generally found was that the Gluster NFS server relies on the layers of gluster reporting back ‘errno’ so it can check whether EINVAL is set (which it uses to determine is_eof). In some instances, errno was not being
passed down the chain or was being reset to 0. This resulted in NFS traces showing multiple READs for a 1-byte file and the NFS client reporting an “I/O” error. Files above 170M seemed to work OK. This is likely due to how the layers of gluster behave
differently at certain file sizes. However, we did not track this part down.</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif"> </span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif">> We found in one case disabling the NFS performance IO cache would fix the problem for a non-sharded volume, but the problem persisted in a sharded volume. Testing found our
environment takes the disabling of the NFS performance IO cache quite hard anyway, so it wasn’t an option for us.</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif"> </span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif">> We were curious why the fuse client wouldn’t be impacted but our quick look found that fuse doesn’t really use or need errno in the same way Gluster NFS does.</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif"> </span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif">> So, the attached patch fixed the issue. Accessing small files in either case above now works properly. We tried running md5sum against large files over NFS and fuse mounts
and everything seemed fine.</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif"> </span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif">> In our environment, the NFS-exported directories tend to contain squashfs files representing read-only root filesystems for compute nodes, and those worked fine over NFS
after the change as well.</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif"> </span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif">> If you do not wish to include this patch because Gluster NFS is deprecated, I would greatly appreciate it if someone could validate my work, as our solution will need Gluster
NFS enabled for the time being. I am concerned I could have missed a nuance and caused a hard-to-detect problem.</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif"> </span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif">We can surely include this patch in the Gluster repo, since many tests still use this feature and it is available for interested users. Thanks for the patch. Please submit it as a
PR to the GitHub repo; I will follow up with the maintainers and update you. Let me know if you need any help submitting the PR.</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif"> </span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif">--</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif">Thanks and Regards</span><o:p></o:p></p>
</div>
<div id="Zm-_Id_-Sgn">
<div>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif">Aravinda</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif">Kadalu Technologies</span><o:p></o:p></p>
</div>
</div>
<div>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif"> </span><o:p></o:p></p>
</div>
<div style="border:none;border-top:solid #CCCCCC 1.0pt;padding:0in 0in 0in 0in;margin-top:7.5pt;margin-bottom:7.5pt">
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif"> </span><o:p></o:p></p>
</div>
<div>
<div>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif"> </span><o:p></o:p></p>
</div>
<div id="Zm-_Id_-Sgn1">
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif">---- On Thu, 14 Mar 2024 01:32:50 +0530
<b>Jacobson, Erik &lt;erik.jacobson@hpe.com&gt;</b> wrote ---</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif"> </span><o:p></o:p></p>
</div>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt" id="blockquote_zmail">
<div>
<div>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif">Hello team.</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif">We ran into some trouble in Gluster 9.3 with the Gluster NFS server. We updated to a supported Gluster 9.6 and reproduced the problem.</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif">We understand the Gluster team recommends the use of Ganesha for NFS but in our specific environment and use case, Ganesha isn’t fast enough. No disrespect intended; we never
got the chance to work with the Ganesha team on it.</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif">We tried to avoid Ganesha and Gluster NFS altogether, using kernel NFS with fuse mounts exported, and that was faster, but failover didn’t work. We could make the mount point
highly available but not open files (so when the IP failover happened, the mount point would still function but the open file – a squashfs in this example – would not fail over).</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif">So we embarked on a mission to figure out what was going on with the NFS server. I am not an expert in network code or distributed filesystems, so someone with a careful
eye would need to check these changes. However, what I generally found was that the Gluster NFS server relies on the layers of gluster reporting back ‘errno’ so it can check whether EINVAL is set (which it uses to determine is_eof). In some instances, errno was not being passed
down the chain or was being reset to 0. This resulted in NFS traces showing multiple READs for a 1-byte file and the NFS client reporting an “I/O” error. Files above 170M seemed to work OK. This is likely due to how the layers of gluster behave differently
at certain file sizes. However, we did not track this part down.</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif">We found in one case disabling the NFS performance IO cache would fix the problem for a non-sharded volume, but the problem persisted in a sharded volume. Testing found our
environment takes the disabling of the NFS performance IO cache quite hard anyway, so it wasn’t an option for us.</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif">We were curious why the fuse client wouldn’t be impacted but our quick look found that fuse doesn’t really use or need errno in the same way Gluster NFS does.</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif">So, the attached patch fixed the issue. Accessing small files in either case above now works properly. We tried running md5sum against large files over NFS and fuse mounts and
everything seemed fine.</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif">In our environment, the NFS-exported directories tend to contain squashfs files representing read-only root filesystems for compute nodes, and those worked fine over NFS after
the change as well.</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif">If you do not wish to include this patch because Gluster NFS is deprecated, I would greatly appreciate it if someone could validate my work, as our solution will need Gluster
NFS enabled for the time being. I am concerned I could have missed a nuance and caused a hard-to-detect problem.</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif">Thank you all!</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif">patch.txt attached.</span><o:p></o:p></p>
</div>
</div>
<p class="MsoNormal" style="margin-bottom:12.0pt"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif">-------<br>
<br>
Community Meeting Calendar: <br>
Schedule - <br>
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC <br>
Bridge: <a href="https://meet.google.com/cpu-eiue-hvk">https://meet.google.com/cpu-eiue-hvk</a> <br>
<br>
Gluster-devel mailing list <br>
Gluster-devel@gluster.org <br>
<a href="https://lists.gluster.org/mailman/listinfo/gluster-devel">https://lists.gluster.org/mailman/listinfo/gluster-devel</a>
</span><o:p></o:p></p>
</blockquote>
</div>
<div>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana",sans-serif"> </span><o:p></o:p></p>
</div>
</div>
<p class="MsoNormal"> <o:p></o:p></p>
</div>
</div>
</div>
</div>
</div>
</div>
</body>
</html>