[Gluster-users] files missing, as they seem to be searched on the wrong subvolume
Strahil Nikolov
hunter86_bg at yahoo.com
Mon Sep 27 12:56:58 UTC 2021
Have you tried with 'performance.stat-prefetch' or 'performance.parallel-readdir' set to disabled?
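That would be something along these lines (using the volume name 'webgate'
from your output below):

  # turn the two caching/readdir translators off on the volume
  gluster volume set webgate performance.stat-prefetch off
  gluster volume set webgate performance.parallel-readdir off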
Best Regards,
Strahil Nikolov
On Thursday, 23 September 2021 at 09:43:44 GMT+3, Morus Walter <morus.walter.ml at googlemail.com> wrote:
Hi,
we are in the process of switching from a setup with three subvolumes
(each having two bricks and an arbiter) to a setup with two (new, larger)
subvolumes, and ran into an issue where gluster no longer finds a number
of files.
The files show up when doing an ls, but cannot be accessed directly
before that. Once the ls has been done, they are accessible for some time.
So far this seems to affect only old files (files that were already there
before the volume changes started).
Our current understanding of the issue is that the distributed hash
table (DHT) translator expects the file on the wrong subvolume and
therefore does not find it. The directory listing has to look into all
subvolumes and therefore finds it. That result is somehow cached on the
client, which keeps the file accessible for some time.
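If that understanding is correct, it should be visible on the bricks
themselves. A rough way to check (the path is only an example; run this
against the brick's backend directory on the brick servers, not on the
client mount):

  # hash layout range the parent directory has on this subvolume
  getfattr -n trusted.glusterfs.dht -e hex /data/gluster/path/to/dir
  # does this brick hold the real file, or only a zero-size "linkto" pointer?
  ls -l /data/gluster/path/to/dir/somefile
  getfattr -n trusted.glusterfs.dht.linkto -e text /data/gluster/path/to/dir/somefile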
I'm not fully aware of the order of changes made to the volume (it was
done by different people and one of them is on vacation ATM), but I think
it was something like
- we added a first new subvolume
- we started and committed the removal of the first two old subvolumes
For the removal gluster reported a number of errors, and we committed it
forcefully anyway, as we did not find further information on the errors or
how to get rid of them. Probably something we should not have done.
- we added the second new subvolume
- we started the removal of the last old subvolume
The last removal has not been committed yet (the rough CLI sequence is
sketched below).
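In gluster CLI terms the sequence would roughly have been (the new brick
names are placeholders; the removed bricks are the ones listed in the
task status at the end):

  # add a new replica-3/arbiter-1 subvolume (placeholder host names)
  gluster volume add-brick webgate replica 3 arbiter 1 \
      new1:/data/gluster new2:/data/gluster new-arb:/data/gluster
  # drain one old subvolume and remove it once the migration has finished
  BRICKS="budgie-brick1.arriwebgate.com:/data/gluster budgie-brick2.arriwebgate.com:/data/gluster budgie-arbiter.arriwebgate.com:/data/gluster"
  gluster volume remove-brick webgate $BRICKS start
  gluster volume remove-brick webgate $BRICKS status
  gluster volume remove-brick webgate $BRICKS commit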
glusterfs version is 9.3 on Ubuntu 18.04.6
We googled the issue; one thing we found was
https://github.com/gluster/glusterfs/issues/843
It suggests trying 'gluster volume set parallel-readdir disable', but
that did not change the situation (which makes sense, as our issue is
not with readdir but with file access).
We looked into the logs but did not find further insight.
Questions:
Is there any way to fix this with gluster commands/tools?
Would another rebalance (beyond the one that happened as part of the
removal) likely fix the situation?
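That is, assuming gluster allows it while the last remove-brick is still
uncommitted, something like:

  # recompute the directory layouts only (no data movement)
  gluster volume rebalance webgate fix-layout start
  # or a full rebalance that also migrates files to their hashed subvolume
  gluster volume rebalance webgate start
  gluster volume rebalance webgate status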
We figured out a way to fix the situation file by file by either
copying or hardlinking the files into new folders and removing the
old ones.
(for a folder foo, one would
* create a folder foo_new,
* hardlink all files from foo into foo_new,
* rename foo to foo_old,
* rename foo_new to foo,
* delete foo_old;
a shell sketch of this is given below)
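A minimal sketch of that per-folder workaround (run on a client mount; it
only handles plain files directly in foo, not subdirectories or dotfiles):

  mkdir foo_new
  ln foo/* foo_new/        # hardlinks, so no data is copied
  mv foo foo_old
  mv foo_new foo
  rm -rf foo_old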
Any help appreciated.
best
Morus
PS:
gluster volume info
Volume Name: webgate
Type: Distributed-Replicate
Volume ID: 383cf25e-f76c-4921-8d64-8bc41c908d57
Status: Started
Snapshot Count: 0
Number of Bricks: 3 x (2 + 1) = 9
Transport-type: tcp
Bricks:
Brick1: budgie-brick1.arriwebgate.com:/data/gluster
Brick2: budgie-brick2.arriwebgate.com:/data/gluster
Brick3: budgie-arbiter.arriwebgate.com:/data/gluster (arbiter)
Brick4: parrot-brick1.arriwebgate.com:/data/gluster
Brick5: parrot-brick2.arriwebgate.com:/data/gluster
Brick6: parrot-arbiter.arriwebgate.com:/data/gluster (arbiter)
Brick7: kiwi-brick1.arriwebgate.com:/data/gluster
Brick8: kiwi-brick2.arriwebgate.com:/data/gluster
Brick9: kiwi-arbiter.arriwebgate.com:/data/gluster (arbiter)
Options Reconfigured:
performance.write-behind-window-size: 1MB
storage.fips-mode-rchecksum: on
changelog.changelog: on
geo-replication.ignore-pid-check: on
geo-replication.indexing: on
cluster.rebal-throttle: lazy
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
performance.cache-size: 4GB
performance.io-thread-count: 16
performance.readdir-ahead: on
client.event-threads: 8
server.event-threads: 8
config.transport: tcp
performance.read-ahead: off
features.cache-invalidation: on
features.cache-invalidation-timeout: 600
performance.stat-prefetch: on
performance.cache-invalidation: on
performance.md-cache-timeout: 600
network.inode-lru-limit: 1000000
performance.parallel-readdir: on
storage.owner-uid: 1000
storage.owner-gid: 1000
cluster.background-self-heal-count: 64
cluster.shd-max-threads: 4
gluster volume heal webgate info
Brick budgie-brick1.arriwebgate.com:/data/gluster
Status: Connected
Number of entries: 0
Brick budgie-brick2.arriwebgate.com:/data/gluster
Status: Connected
Number of entries: 0
Brick budgie-arbiter.arriwebgate.com:/data/gluster
Status: Connected
Number of entries: 0
Brick parrot-brick1.arriwebgate.com:/data/gluster
Status: Connected
Number of entries: 0
Brick parrot-brick2.arriwebgate.com:/data/gluster
Status: Connected
Number of entries: 0
Brick parrot-arbiter.arriwebgate.com:/data/gluster
Status: Connected
Number of entries: 0
Brick kiwi-brick1.arriwebgate.com:/data/gluster
Status: Connected
Number of entries: 0
Brick kiwi-brick2.arriwebgate.com:/data/gluster
Status: Connected
Number of entries: 0
Brick kiwi-arbiter.arriwebgate.com:/data/gluster
Status: Connected
Number of entries: 0
gluster volume status webgate
Status of volume: webgate
Gluster process                                      TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick budgie-brick1.arriwebgate.com:/data/gluster    49152     0          Y       854
Brick budgie-brick2.arriwebgate.com:/data/gluster    49152     0          Y       888
Brick budgie-arbiter.arriwebgate.com:/data/gluster   49152     0          Y       857
Brick parrot-brick1.arriwebgate.com:/data/gluster    49152     0          Y       1889
Brick parrot-brick2.arriwebgate.com:/data/gluster    49152     0          Y       2505
Brick parrot-arbiter.arriwebgate.com:/data/gluster   49152     0          Y       1439
Brick kiwi-brick1.arriwebgate.com:/data/gluster      49152     0          Y       24941
Brick kiwi-brick2.arriwebgate.com:/data/gluster      49152     0          Y       31448
Brick kiwi-arbiter.arriwebgate.com:/data/gluster     49152     0          Y       5483
Self-heal Daemon on localhost                        N/A       N/A        Y       24700
Self-heal Daemon on budgie-brick1.arriwebgate.com    N/A       N/A        Y       960
Self-heal Daemon on budgie-brick2.arriwebgate.com    N/A       N/A        Y       974
Self-heal Daemon on parrot-brick1.arriwebgate.com    N/A       N/A        Y       1811
Self-heal Daemon on parrot-brick2.arriwebgate.com    N/A       N/A        Y       2543
Self-heal Daemon on kiwi-brick2.arriwebgate.com      N/A       N/A        Y       31207
Self-heal Daemon on kiwi-arbiter.arriwebgate.com     N/A       N/A        Y       5229
Self-heal Daemon on parrot-arbiter.arriwebgate.com   N/A       N/A        Y       1466
Self-heal Daemon on budgie-arbiter.arriwebgate.com   N/A       N/A        Y       984
Task Status of Volume webgate
------------------------------------------------------------------------------
Task : Remove brick
ID : 6f67e1d4-23f4-46ca-a97a-2adb152ef294
Removed bricks:
budgie-brick1.arriwebgate.com:/data/gluster
budgie-brick2.arriwebgate.com:/data/gluster
budgie-arbiter.arriwebgate.com:/data/gluster
Status : completed