[Gluster-users] Stale File Handle Errors During Heavy Writes

Olaf Buitelaar olaf.buitelaar at gmail.com
Thu Nov 28 08:40:33 UTC 2019


Yeah... so the right procedure would be to set up a new volume without
sharding and copy everything over.

On Thu, 28 Nov 2019, 06:45 Strahil, <hunter86_bg at yahoo.com> wrote:

> I have already tried disabling sharding on a test oVirt volume... The
> results were devastating for the app, so please do not disable sharding.
>
> Best Regards,
> Strahil Nikolov
> On Nov 27, 2019 20:55, Olaf Buitelaar <olaf.buitelaar at gmail.com> wrote:
>
> Hi Tim,
>
> That issue also seems to point to a stale file. The best first step, I suppose, is
> to determine whether you indeed have the same shard on different sub-volumes,
> where on one of the sub-volumes the file size is 0KB and the sticky bit is set.
> If so, we suffer from the same issue, and you can clean those files up so
> the `rm` command starts working again.
> Essentially you should consider the volume unhealthy until you have
> resolved the stale files, before you continue file operations.
> Remounting the client shouldn't make a difference since the issue is at
> brick/sub-volume level.
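>
> As a rough sketch (the brick path and volume name below are just examples,
> adjust them to your own layout), something like this run on each brick host
> should list candidate stale shards, i.e. 0-byte files with the sticky bit set:
>
>   find /bricks/brick1/scratch/.shard -type f -size 0 -perm -1000 -ls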
>
> The last comment I received from Krutika:
> "I haven't had the chance to look into the attachments yet. I got another
> customer case on me.
> But from the description, it seems like the linkto file (the one with a
> 'T') and the original file don't have the same gfid?
> It's not wrong for those 'T' files to exist. But they're supposed to have
> the same gfid.
> This is something that needs DHT team's attention.
> Do you mind raising a bug in bugzilla.redhat.com against glusterfs and
> component 'distribute' or 'DHT'?"
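>
> If you want to verify the gfid part yourself, a minimal check (the shard path
> below is just a placeholder) is to compare the trusted.gfid xattr of the 'T'
> file on one sub-volume against the data file on the other, directly on the
> bricks:
>
>   getfattr -n trusted.gfid -e hex /bricks/brick1/scratch/.shard/<shard-file>
>
> The two hex values should be identical; if they differ, you're likely hitting
> the same gfid mismatch described above.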
>
>
> For me, replicating it was easiest by running xfs_fsr (which is very
> write intensive on fragmented volumes) from within a VM, but it could also
> happen with a simple yum install, a docker run (with a new image), a general
> test with dd or mkfs.xfs, or just at random, which was the normal case. But I
> have to say my workload is mostly write intensive, like yours.
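>
> If you want to try reproducing it, a crude sketch of the kind of write
> pressure I mean (file names and mount points made up) would be, from inside
> a VM:
>
>   dd if=/dev/urandom of=/var/tmp/stress.img bs=1M count=10240 oflag=direct
>   xfs_fsr -v /    # defragment the root fs, assuming it's XFS
>
> i.e. a large sustained write followed by xfs_fsr on an already fragmented
> filesystem.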
>
> Sharding in general is a nice feature: it allows your files to be broken
> up into pieces, which is also its biggest danger. If anything goes
> haywire, it's currently practically impossible to stitch all those pieces
> together again, since no tool for this seems to exist. That's the nice
> thing about non-sharded volumes, they are just files. If you really wanted
> to, I suppose it could be done, but it would be very painful.
> With the files being in shards, it allows for a much more equal
> distribution. Heals also seem to resolve much quicker.
> I'm also running non-sharded volumes with files of 100GB+, and those
> heals can take significantly longer. I sometimes have issues with those
> non-sharded volumes as well, though I don't remember any stale files.
> But if you don't need it you might be better off disabling it. However, I
> believe you're never allowed to turn off sharding on a sharded volume, since
> that will corrupt your data.
>
> Best Olaf
>
> On Wed, 27 Nov 2019 at 19:19, Timothy Orme <torme at ancestry.com> wrote:
>
> Hi Olaf,
>
> Thanks so much for sharing this, it's hugely helpful, if only to make me
> feel less like I'm going crazy.  I'll see if there's anything I can add to
> the bug report.  I'm trying to develop a test to reproduce the issue now.
>
> We're running this in a sort of interactive HPC environment, so these
> errors are a bit hard for us to handle systematically, and they have a
> tendency to be quite disruptive to folks' work.
>
> I've run into other issues with sharding as well, such as this:
> https://lists.gluster.org/pipermail/gluster-users/2019-October/037241.html
>
> I'm wondering, then, if maybe sharding isn't quite stable yet and it's more
> sensible for me to just disable this feature for now?  I'm not quite sure
> what other implications that might have, but so far all the issues I've run
> into as a new gluster user seem to be related to shards.
>
> Thanks,
> Tim
> ------------------------------
> *From:* Olaf Buitelaar <olaf.buitelaar at gmail.com>
> *Sent:* Wednesday, November 27, 2019 9:50 AM
> *To:* Timothy Orme <torme at ancestry.com>
> *Cc:* gluster-users <gluster-users at gluster.org>
> *Subject:* [EXTERNAL] Re: [Gluster-users] Stale File Handle Errors During
> Heavy Writes
>
> Hi Tim,
>
> I've been suffering from this for a long time as well; not sure if it's
> exactly the same situation since your setup is different, but it seems
> similar. I've filed this bug report, which you might be able to enrich:
> https://bugzilla.redhat.com/show_bug.cgi?id=1732961
> To resolve the stale files I've made this bash script:
> https://gist.github.com/olafbuitelaar/ff6fe9d4ab39696d9ad6ca689cc89986
> (it's slightly outdated) which you could use as inspiration; it basically
> removes the stale files as suggested here:
> https://lists.gluster.org/pipermail/gluster-users/2018-March/033785.html
> Please be aware the script won't work if you have 2 (or more) bricks of
> the same volume on the same server (since it always takes the first path
> found).
> I invoke the script via ansible like this (since the script needs to run
> on all bricks):
> - hosts: host1,host2,host3
>   tasks:
>     - shell: >
>         bash /root/clean-stale-gluster-fh.sh
>         --host="{{ intif.ip | first }}"
>         --volume=ovirt-data
>         --backup="/backup/stale/gfs/ovirt-data"
>         --shard="{{ item }}"
>         --force
>       with_items:
>         - 1b0ba5c2-dd2b-45d0-9c4b-a39b2123cc13.14451
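>
> A rough way to collect the shard names for with_items (the log path below is
> just an example, use whatever your client mount log is called) is to grep
> them out of the stale file handle errors:
>
>   grep 'Stale file handle' /var/log/glusterfs/mnt-ovirt-data.log \
>     | grep -oE '[0-9a-f-]{36}\.[0-9]+' | sort -u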
>
> Fortunately for me the issue seems to have disappeared: it's now about a
> month since I last received one, while before it was about every other day.
> The biggest thing that seemed to resolve it was more disk space. Before,
> there was also plenty: the gluster volume was at about 85% full, and the
> individual disks had about 20-30% free on an 8TB disk array, but there were
> servers in the mix with smaller disk arrays, though with similar available
> space (in percent). I'm now at a much lower percentage.
> So my latest running theory is that it has something to do with how gluster
> allocates the shards: since placement is based on the shard's hash, gluster
> might want to place it in a certain sub-volume, but then concludes it has
> not enough space there and writes a marker to redirect it to another
> sub-volume (I think this is the stale file). However, rebalances don't fix
> this issue. Also, this still doesn't explain why most stale files always
> end up in the first sub-volume.
> Unfortunately I've no proof this is actually the root cause, besides that
> the symptom "disappeared" once gluster had more space to work with.
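>
> If you want to rule the free-space angle in or out on your side, per-brick
> disk space (and inode) usage is visible with (volume name assumed):
>
>   gluster volume status scratch detail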
>
> Best Olaf
>
> On Wed, 27 Nov 2019 at 02:38, Timothy Orme <torme at ancestry.com> wrote:
>
> Hi All,
>
> I'm running a 3x2 cluster, v6.5.  Not sure if it's relevant, but I also have
> sharding enabled.
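>
> For reference, the shard settings can be confirmed with something like the
> following (volume name guessed from the logs below):
>
>   gluster volume get scratch features.shard
>   gluster volume get scratch features.shard-block-size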
>
> I've found that when under heavy write load, clients start erroring out
> with "stale file handle" errors, on files not related to the writes.
>
> For instance, when a user is running a simple wc against a file, it will
> bail during that operation with "stale file".
>
> When I check the client logs, I see errors like:
>
> [2019-11-26 22:41:33.565776] E [MSGID: 109040]
> [dht-helper.c:1336:dht_migration_complete_check_task] 3-scratch-dht:
> 24d53a0e-c28d-41e0-9dbc-a75e823a3c7d: failed to lookup the file on
> scratch-dht  [Stale file handle]
> [2019-11-26 22:41:33.565853] W [fuse-bridge.c:2827:fuse_readv_cbk]
> 0-glusterfs-fuse: 33112038: READ => -1
> gfid=147040e2-a6b8-4f54-8490-f0f3df29ee50 fd=0x7f95d8d0b3f8 (Stale file
> handle)
>
> I've seen some bugs or other threads referencing similar issues, but
> couldn't really discern a solution from them.
>
> Is this caused by some consistency issue with metadata while under load, or
> something else?  I don't see the issue when heavy reads are occurring.
>
> Any help is greatly appreciated!
>
> Thanks!
> Tim
> ________
>
> Community Meeting Calendar:
>
> APAC Schedule -
> Every 2nd and 4th Tuesday at 11:30 AM IST
> Bridge:
> https://bluejeans.com/441850968
>
> NA/EMEA Schedule -
> Every 1st and 3rd Tuesday at 01:00 PM EDT
> Bridge:
> https://bluejeans.com/441850968
>
> Gluster-users mailing list
> Gluster-users at gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users
>
>

