<div dir="ltr">Hi Tim,<div><br></div><div>i've been suffering from this also for a long time, not sure if it's exact the same situation since your setup is different. But it seems similar.</div><div>i've filed this bug report; <a href="https://bugzilla.redhat.com/show_bug.cgi?id=1732961">https://bugzilla.redhat.com/show_bug.cgi?id=1732961</a> for which you might be able to enrich.</div><div>To solve the stale files i've made this bash script; <a href="https://gist.github.com/olafbuitelaar/ff6fe9d4ab39696d9ad6ca689cc89986">https://gist.github.com/olafbuitelaar/ff6fe9d4ab39696d9ad6ca689cc89986</a> (it's slightly outdated) which you could use as inspiration, it basically removes the stale files as suggested here; <a href="https://lists.gluster.org/pipermail/gluster-users/2018-March/033785.html">https://lists.gluster.org/pipermail/gluster-users/2018-March/033785.html</a> . Please be aware the script won't work if you have 2 (or more) bricks of the same volume on the same server (since it always takes the first path found).</div><div>I invoke the script via ansible like this (since the script needs to run on all bricks);</div><div>- hosts: host1,host2,host3<br> tasks:<br> - shell: 'bash /root/clean-stale-gluster-fh.sh --host="{{ intif.ip | first }}" --volume=ovirt-data --backup="/backup/stale/gfs/ovirt-data" --shard="{{ item }}" --force'<br> with_items:<br> - 1b0ba5c2-dd2b-45d0-9c4b-a39b2123cc13.14451<br></div><div><br></div><div>fortunately for me the issue seems to be disappeared, since it's now about 1 month i received one, while before it was about every other day. </div><div>The biggest thing the seemed to resolve it was more disk space. while before there was also plenty the gluster volume was at about 85% full, and the individual disk had about 20-30% free of 8TB disk array, but had servers in the mix with smaller disk array's but with similar available space (in percents). i'm now at much lower percentage. </div><div>So my latest running theory is that it has something todo with how gluster allocates the shared's, since it's based on it's hash it might want to place it in a certain sub-volume, but than comes to the conclusion it has not enough space there, writes a marker to redirect it to another sub-volume (thinking this is the stale file). However rebalances don't fix this issue. Also this still doesn't seem explain that most stale files always end up in the first sub-volume.</div><div>Unfortunate i've no proof this is actually the root cause, besides that the symptom "disappeared" once gluster had more space to work with.</div><div><br></div><div>Best Olaf</div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">Op wo 27 nov. 2019 om 02:38 schreef Timothy Orme <<a href="mailto:torme@ancestry.com" target="_blank">torme@ancestry.com</a>>:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div dir="ltr">
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Hi All,</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
I'm running a 3x2 cluster, v6.5. Not sure if it's relevant, but I also have sharding enabled.</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
I've found that when under heavy write load, clients start erroring out with "stale file handle" errors, on files not related to the writes.</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
For instance, when a user is running a simple wc against a file, it will bail during that operation with a "stale file" error.<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
When I check the client logs, I see errors like:</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<span>[2019-11-26 22:41:33.565776] E [MSGID: 109040] [dht-helper.c:1336:dht_migration_complete_check_task] 3-scratch-dht: 24d53a0e-c28d-41e0-9dbc-a75e823a3c7d: failed to lookup the file on scratch-dht
</span>[Stale file handle]<br>
<span></span>
<div>[2019-11-26 22:41:33.565853] W [fuse-bridge.c:2827:fuse_readv_cbk] 0-glusterfs-fuse: 33112038: READ => -1 gfid=147040e2-a6b8-4f54-8490-f0f3df29ee50 fd=0x7f95d8d0b3f8 (Stale file handle)<br>
</div>
<span></span><br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
I've seen some bugs or other threads referencing similar issues, but couldn't really discern a solution from them.</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Is this caused by some metadata consistency issue while under load, or something else? I don't see the issue when heavy reads are occurring.<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Any help is greatly appreciated!<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Thanks!</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Tim<br>
</div>
</div>
________<br>
<br>
Community Meeting Calendar:<br>
<br>
APAC Schedule -<br>
Every 2nd and 4th Tuesday at 11:30 AM IST<br>
Bridge: <a href="https://bluejeans.com/441850968" rel="noreferrer" target="_blank">https://bluejeans.com/441850968</a><br>
<br>
NA/EMEA Schedule -<br>
Every 1st and 3rd Tuesday at 01:00 PM EDT<br>
Bridge: <a href="https://bluejeans.com/441850968" rel="noreferrer" target="_blank">https://bluejeans.com/441850968</a><br>
<br>
Gluster-users mailing list<br>
<a href="mailto:Gluster-users@gluster.org" target="_blank">Gluster-users@gluster.org</a><br>
<a href="https://lists.gluster.org/mailman/listinfo/gluster-users" rel="noreferrer" target="_blank">https://lists.gluster.org/mailman/listinfo/gluster-users</a><br>
</blockquote></div></div>
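<div dir="ltr"><div><br></div><div>For reference, here is a minimal sketch (not Olaf's actual gist) of the cleanup approach from the thread linked above, run on a single brick. The brick path, shard name and backup directory are placeholders you would need to adapt, and the real script resolves the brick path from the volume info and runs on every brick host:</div><div><pre>
#!/bin/bash
# Sketch: remove one stale shard from one brick (placeholders, adapt before use).
BRICK=/gluster/brick1/ovirt-data                      # brick mount point (assumption)
SHARD=1b0ba5c2-dd2b-45d0-9c4b-a39b2123cc13.14451      # GFID.INDEX name of the stale shard
BACKUP=/backup/stale/gfs/ovirt-data                   # where to keep a copy, just in case

FILE="$BRICK/.shard/$SHARD"
[ -e "$FILE" ] || exit 0                              # nothing to do on this brick

# Every file on a brick also has a hard link under .glusterfs/aa/bb/GFID,
# where GFID comes from the file's trusted.gfid xattr.
HEX=$(getfattr -n trusted.gfid -e hex --only-values "$FILE" | sed 's/^0x//')
GFID=$(echo "$HEX" | sed -E 's/(.{8})(.{4})(.{4})(.{4})(.{12})/\1-\2-\3-\4-\5/')
LINK="$BRICK/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID"

mkdir -p "$BACKUP"
cp -a "$FILE" "$BACKUP/$SHARD"                        # back the shard up first
rm -f "$FILE" "$LINK"                                 # drop the shard and its gfid hard link
</pre></div></div>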
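<div dir="ltr"><div><br></div><div>And, relating to the link-to theory above: DHT pointer files on a brick are normally zero-byte files whose mode is just the sticky bit (shown as ---------T) and which carry the trusted.glusterfs.dht.linkto xattr, so suspects can be listed with something like the following (brick path is again a placeholder):</div><div><pre>
# List suspected DHT link-to files among the shards on one brick:
# zero bytes, mode exactly 1000 (sticky bit only), carrying a dht.linkto xattr.
find /gluster/brick1/ovirt-data/.shard -type f -size 0 -perm 1000 \
  -exec getfattr -n trusted.glusterfs.dht.linkto --absolute-names {} \; 2>/dev/null
</pre></div></div>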