<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<style type="text/css" style="display:none;"> P {margin-top:0;margin-bottom:0;} </style>
</head>
<body dir="ltr">
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
Hi Olaf,</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
Thanks so much for sharing this, it's hugely helpful, if only to make me feel less like I'm going crazy. I'll see if there's anything I can add to the bug report. I'm trying to develop a test to reproduce the issue now.<br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
We're running this in a sort of interactive HPC environment, so these errors are a bit hard for us to handle systematically, and they tend to be quite disruptive to folks' work.</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
I've run into other issues with sharding as well, such as this: <a href="https://lists.gluster.org/pipermail/gluster-users/2019-October/037241.html" id="LPlnk403358">
https://lists.gluster.org/pipermail/gluster-users/2019-October/037241.html</a><br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
I'm wondering, then, whether sharding isn't quite stable yet and it's more sensible for me to just disable this feature for now. I'm not quite sure what other implications that might have, but all the issues I've run into so far as a new Gluster user
seem to be related to shards.</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
Thanks,</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
Tim<br>
</div>
<div id="appendonsend"></div>
<hr style="display:inline-block;width:98%" tabindex="-1">
<div id="divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" style="font-size:11pt" color="#000000"><b>From:</b> Olaf Buitelaar <olaf.buitelaar@gmail.com><br>
<b>Sent:</b> Wednesday, November 27, 2019 9:50 AM<br>
<b>To:</b> Timothy Orme <torme@ancestry.com><br>
<b>Cc:</b> gluster-users <gluster-users@gluster.org><br>
<b>Subject:</b> [EXTERNAL] Re: [Gluster-users] Stale File Handle Errors During Heavy Writes</font>
<div> </div>
</div>
<div>
<div dir="ltr">Hi Tim,
<div><br>
</div>
<div>I've been suffering from this for a long time as well. I'm not sure it's exactly the same situation, since your setup is different, but it seems similar.</div>
<div>I've filed this bug report: <a href="https://urldefense.proofpoint.com/v2/url?u=https-3A__bugzilla.redhat.com_show-5Fbug.cgi-3Fid-3D1732961&d=DwMFaQ&c=kKqjBR9KKWaWpMhASkPbOg&r=d0SJB4ihnau-Oyws6GEzcipkV9DfxCuMbgdSRgXeuxM&m=Nh3Ca9VCh4XnpEF6imXwTa2NUUglz-XZQhfG8-AyOVU&s=GbJiS8pLGORzLwdgt0ypnnQxQgRhrTHdGXEizatE9g0&e=">https://bugzilla.redhat.com/show_bug.cgi?id=1732961</a>,
which you might be able to add details to.</div>
<div>To clean up the stale files I've made this bash script: <a href="https://urldefense.proofpoint.com/v2/url?u=https-3A__gist.github.com_olafbuitelaar_ff6fe9d4ab39696d9ad6ca689cc89986&d=DwMFaQ&c=kKqjBR9KKWaWpMhASkPbOg&r=d0SJB4ihnau-Oyws6GEzcipkV9DfxCuMbgdSRgXeuxM&m=Nh3Ca9VCh4XnpEF6imXwTa2NUUglz-XZQhfG8-AyOVU&s=CvN0yMFI03czcHgzTeexTfP9h4woiAO_XVyn1umHR8g&e=">https://gist.github.com/olafbuitelaar/ff6fe9d4ab39696d9ad6ca689cc89986</a> (it's
slightly outdated), which you could use as inspiration. It basically removes the stale files as suggested here: <a href="https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.gluster.org_pipermail_gluster-2Dusers_2018-2DMarch_033785.html&d=DwMFaQ&c=kKqjBR9KKWaWpMhASkPbOg&r=d0SJB4ihnau-Oyws6GEzcipkV9DfxCuMbgdSRgXeuxM&m=Nh3Ca9VCh4XnpEF6imXwTa2NUUglz-XZQhfG8-AyOVU&s=MGGOwcqFQ8DwBK3MDoMxO-MD6_wrmojY1T9GYqE8WOs&e=">https://lists.gluster.org/pipermail/gluster-users/2018-March/033785.html</a>.
Please be aware the script won't work if you have 2 (or more) bricks of the same volume on the same server (since it always takes the first path found).</div>
<div>I invoke the script via ansible like this (since the script needs to run on all bricks);</div>
<div>
<pre>
- hosts: host1,host2,host3
  tasks:
    - shell: 'bash /root/clean-stale-gluster-fh.sh --host="{{ intif.ip | first }}" --volume=ovirt-data --backup="/backup/stale/gfs/ovirt-data" --shard="{{ item }}" --force'
      with_items:
        - 1b0ba5c2-dd2b-45d0-9c4b-a39b2123cc13.14451
</pre>
</div>
<div><br>
</div>
<div>Fortunately for me, the issue seems to have disappeared: it has now been about a month since I last received one, whereas before it happened about every other day.</div>
<div>The biggest thing that seemed to resolve it was more disk space. Before, there was still plenty of space, but the gluster volume was about 85% full, and the individual disks had about 20-30% free on an 8TB disk array; there were also servers in the mix with smaller disk arrays
but similar available space (in percent). I'm now at a much lower percentage.</div>
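<div>A quick way to keep an eye on the fill level described above is sketched below. It is only an illustration: the 85% threshold comes from the figure mentioned in this mail, while the function names and the use of "/" as a stand-in for a brick mount point are assumptions, not part of any Gluster tooling.</div>
<pre>
```python
import shutil


def used_percent(total, used):
    """Used space as a percentage of total."""
    return 100.0 * used / total


def bricks_above(paths, threshold=85.0):
    """Return {path: used %} for every path at or above the threshold."""
    report = {}
    for path in paths:
        usage = shutil.disk_usage(path)
        pct = used_percent(usage.total, usage.used)
        if pct >= threshold:
            report[path] = round(pct, 1)
    return report


# "/" is only a placeholder so the example runs anywhere; on a Gluster
# server you would pass the brick mount points instead.
print(bricks_above(["/"], threshold=0.0))
```
</pre>
<div>Running this per server (rather than only looking at the volume as a whole) matters here because the smaller disk arrays can hit the threshold at a different time than the large ones.</div>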
<div>So my latest running theory is that it has something to do with how gluster allocates the shards: since placement is based on the shard's hash, gluster might want to place it in a certain sub-volume, but then concludes there is not enough space there and writes a
marker to redirect it to another sub-volume (I think this marker is the stale file). However, rebalances don't fix this issue. This also still doesn't seem to explain why most stale files always end up in the first sub-volume.</div>
<div>Unfortunately, I have no proof this is actually the root cause, other than that the symptom "disappeared" once gluster had more space to work with.</div>
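<div>To make the theory above concrete, here is a toy model of it. This is not Gluster's actual placement code: the CRC32 hash, the sub-volume names, the min_free_pct threshold and the "pick the emptiest sub-volume" fallback are all assumptions chosen only to illustrate how a redirect marker could end up on a nearly full sub-volume.</div>
<pre>
```python
import zlib


def hashed_subvolume(name, subvolumes):
    """Pick a sub-volume from a hash of the name (stand-in for the real hash)."""
    return subvolumes[zlib.crc32(name.encode()) % len(subvolumes)]


def place_shard(name, free_pct, min_free_pct=10.0):
    """Return (data_subvolume, marker_subvolume or None) for a new shard.

    free_pct maps a sub-volume name to its free space in percent.
    """
    subvolumes = sorted(free_pct)
    hashed = hashed_subvolume(name, subvolumes)
    if free_pct[hashed] >= min_free_pct:
        # Enough room: the shard lives where its hash points, no marker needed.
        return hashed, None
    # Hashed sub-volume is too full: store the data on the emptiest sub-volume
    # and leave a redirect marker behind on the hashed one.
    fallback = max(subvolumes, key=lambda s: free_pct[s])
    return fallback, hashed


shard = "1b0ba5c2-dd2b-45d0-9c4b-a39b2123cc13.14451"
print(place_shard(shard, {"sv0": 40.0, "sv1": 40.0, "sv2": 40.0}))
print(place_shard(shard, {"sv0": 2.0, "sv1": 2.0, "sv2": 2.0}, min_free_pct=10.0))
```
</pre>
<div>In this model a marker only appears when the hashed sub-volume is below the threshold, which would match the symptom going away once there was more free space. It does not explain why the stale files mostly show up in the first sub-volume.</div>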
<div><br>
</div>
<div>Best Olaf</div>
<br>
<div class="x_gmail_quote">
<div dir="ltr" class="x_gmail_attr">On Wed, 27 Nov 2019 at 02:38, Timothy Orme <<a href="mailto:torme@ancestry.com" target="_blank">torme@ancestry.com</a>> wrote:<br>
</div>
<blockquote class="x_gmail_quote" style="margin:0px 0px 0px 0.8ex; border-left:1px solid rgb(204,204,204); padding-left:1ex">
<div dir="ltr">
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
Hi All,</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
I'm running a 3x2 cluster, v6.5. Not sure if it's relevant, but I also have sharding enabled.</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
I've found that when under heavy write load, clients start erroring out with "stale file handle" errors, on files not related to the writes.</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
For instance, when a user is running a simple wc against a file, it will bail during that operation with "stale file"<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
When I check the client logs, I see errors like:</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
<span>[2019-11-26 22:41:33.565776] E [MSGID: 109040] [dht-helper.c:1336:dht_migration_complete_check_task] 3-scratch-dht: 24d53a0e-c28d-41e0-9dbc-a75e823a3c7d: failed to lookup the file on scratch-dht
</span>[Stale file handle]<br>
<span></span>
<div>[2019-11-26 22:41:33.565853] W [fuse-bridge.c:2827:fuse_readv_cbk] 0-glusterfs-fuse: 33112038: READ => -1 gfid=147040e2-a6b8-4f54-8490-f0f3df29ee50 fd=0x7f95d8d0b3f8 (Stale file handle)<br>
</div>
<span></span><br>
</div>
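<div>For triage, the affected gfids can be pulled out of a client log with a few lines of Python. This is a minimal sketch: the regular expression is inferred only from the fuse_readv_cbk line quoted above and may need adjusting for other Gluster versions or log formats, and the function name is made up for this example.</div>
<pre>
```python
import re
from collections import Counter

# Matches the gfid reported by fuse-bridge on a failed READ, as in the
# fuse_readv_cbk line quoted above. Inferred from that single sample line.
STALE_READ = re.compile(
    r"fuse_readv_cbk\].*READ => -1 gfid=([0-9a-f-]{36}) .*\(Stale file handle\)"
)


def stale_gfids(lines):
    """Count stale-file-handle READ failures per gfid."""
    counts = Counter()
    for line in lines:
        match = STALE_READ.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


sample = [
    "[2019-11-26 22:41:33.565853] W [fuse-bridge.c:2827:fuse_readv_cbk] "
    "0-glusterfs-fuse: 33112038: READ => -1 "
    "gfid=147040e2-a6b8-4f54-8490-f0f3df29ee50 fd=0x7f95d8d0b3f8 "
    "(Stale file handle)",
]
print(stale_gfids(sample))
```
</pre>
<div>Feeding it the lines of a real client log (for example via open(path)) gives a per-gfid count, which makes it easier to see whether the same files fail repeatedly.</div>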
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
I've seen some bugs or other threads referencing similar issues, but couldn't really discern a solution from them.</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
Is this caused by some consistency issue with metadata while under load, or something else? I don't see the issue when heavy reads are occurring.<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
Any help is greatly appreciated!<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
Thanks!</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
Tim<br>
</div>
</div>
</blockquote>
</div>
</div>
</div>
</body>
</html>