<div dir="ltr">Hi,<br><br>We are in the process of switching from a setup with three subvolumes<br>(each having two bricks and an arbiter) to a setup with two (new, larger)<br>subvolumes, and ran into an issue where gluster no longer finds a number<br>of files.<br>The files are there when doing an ls, but cannot be accessed directly before<br>that. Once the ls has been done, they are accessible for some time.<br><br>So far this seems to affect only old files (files that were already there<br>before the volume changes started).<br><br>Our current understanding of the issue is that the distributed hash<br>table (DHT) translator expects the file on the wrong subvolume and therefore<br>does not find it. The directory lookup has to look into all subvolumes<br>and therefore does find it. That result is somehow cached on the client,<br>which makes the file directly accessible as well for some time.<br><br>I'm not fully aware of the order of changes made to the volume (it was<br>done by different people and one of them is on vacation ATM), but I think<br>it was something like this:<br>- we added the first new subvolume<br>- we started and committed the removal of the first two old subvolumes.<br>  For the removal gluster reported a number of errors and we did it<br>  forcefully anyway, as we did not find further info on the errors and<br>  how to get rid of them. 
Probably something we should not have done.<br>- we added the second new subvolume<br>- we started the removal of the last old subvolume<br>The last removal has not been committed yet.<br><br>The glusterfs version is 9.3 on Ubuntu 18.04.6.<br><br>We googled the issue; one thing we found was<br><a href="https://github.com/gluster/glusterfs/issues/843">https://github.com/gluster/glusterfs/issues/843</a><br>It suggests trying 'gluster volume set parallel-readdir disable', but<br>that did not change the situation (which makes sense, as our issue is not<br>with readdir but with file access).<br><br>We looked into the logs but did not gain further insight.<br><br>Questions:<br>Is there any way to fix this with gluster commands/tools?<br>Would another rebalance (beyond the one that happened during the removal)<br>likely fix the situation?<br><br><br>We figured out a way to fix the situation file by file, by either<br>copying or hardlinking the files into new folders and removing the<br>old ones.<br>(For a folder foo, one would<br> * create a folder foo_new,<br> * hardlink all files from foo into foo_new,<br> * rename foo to foo_old,<br> * rename foo_new to foo,<br> * delete foo_old.)<br><br>Any help is appreciated.<br><br>best<br>  Morus<br><br>PS:<br><br>gluster volume info<br><br>Volume Name: webgate<br>Type: Distributed-Replicate<br>Volume ID: 383cf25e-f76c-4921-8d64-8bc41c908d57<br>Status: Started<br>Snapshot Count: 0<br>Number of Bricks: 3 x (2 + 1) = 9<br>Transport-type: tcp<br>Bricks:<br>Brick1: budgie-brick1.arriwebgate.com:/data/gluster<br>Brick2: budgie-brick2.arriwebgate.com:/data/gluster<br>Brick3: budgie-arbiter.arriwebgate.com:/data/gluster (arbiter)<br>Brick4: parrot-brick1.arriwebgate.com:/data/gluster<br>Brick5: parrot-brick2.arriwebgate.com:/data/gluster<br>Brick6: parrot-arbiter.arriwebgate.com:/data/gluster (arbiter)<br>Brick7: kiwi-brick1.arriwebgate.com:/data/gluster<br>Brick8: kiwi-brick2.arriwebgate.com:/data/gluster<br>Brick9: 
kiwi-arbiter.arriwebgate.com:/data/gluster (arbiter)<br>Options Reconfigured:<br>performance.write-behind-window-size: 1MB<br>storage.fips-mode-rchecksum: on<br>changelog.changelog: on<br>geo-replication.ignore-pid-check: on<br>geo-replication.indexing: on<br>cluster.rebal-throttle: lazy<br>transport.address-family: inet<br>nfs.disable: on<br>performance.client-io-threads: off<br>performance.cache-size: 4GB<br>performance.io-thread-count: 16<br>performance.readdir-ahead: on<br>client.event-threads: 8<br>server.event-threads: 8<br>config.transport: tcp<br>performance.read-ahead: off<br>features.cache-invalidation: on<br>features.cache-invalidation-timeout: 600<br>performance.stat-prefetch: on<br>performance.cache-invalidation: on<br>performance.md-cache-timeout: 600<br>network.inode-lru-limit: 1000000<br>performance.parallel-readdir: on<br>storage.owner-uid: 1000<br>storage.owner-gid: 1000<br>cluster.background-self-heal-count: 64<br>cluster.shd-max-threads: 4<br><br><br>gluster volume heal webgate info<br><br>Brick budgie-brick1.arriwebgate.com:/data/gluster<br>Status: Connected<br>Number of entries: 0<br><br>Brick budgie-brick2.arriwebgate.com:/data/gluster<br>Status: Connected<br>Number of entries: 0<br><br>Brick budgie-arbiter.arriwebgate.com:/data/gluster<br>Status: Connected<br>Number of entries: 0<br><br>Brick parrot-brick1.arriwebgate.com:/data/gluster<br>Status: Connected<br>Number of entries: 0<br><br>Brick parrot-brick2.arriwebgate.com:/data/gluster<br>Status: Connected<br>Number of entries: 0<br><br>Brick parrot-arbiter.arriwebgate.com:/data/gluster<br>Status: Connected<br>Number of entries: 0<br><br>Brick kiwi-brick1.arriwebgate.com:/data/gluster<br>Status: Connected<br>Number of entries: 0<br><br>Brick kiwi-brick2.arriwebgate.com:/data/gluster<br>Status: Connected<br>Number of entries: 0<br><br>Brick kiwi-arbiter.arriwebgate.com:/data/gluster<br>Status: Connected<br>Number of entries: 0<br><br><br>gluster volume status webgate<br><br>Status of volume: 
webgate<br>Gluster process                                       TCP Port  RDMA Port  Online  Pid<br>------------------------------------------------------------------------------<br>Brick budgie-brick1.arriwebgate.com:/data/gluster     49152     0          Y       854<br>Brick budgie-brick2.arriwebgate.com:/data/gluster     49152     0          Y       888<br>Brick budgie-arbiter.arriwebgate.com:/data/gluster    49152     0          Y       857<br>Brick parrot-brick1.arriwebgate.com:/data/gluster     49152     0          Y       1889<br>Brick parrot-brick2.arriwebgate.com:/data/gluster     49152     0          Y       2505<br>Brick parrot-arbiter.arriwebgate.com:/data/gluster    49152     0          Y       1439<br>Brick kiwi-brick1.arriwebgate.com:/data/gluster       49152     0          Y       24941<br>Brick kiwi-brick2.arriwebgate.com:/data/gluster       49152     0          Y       31448<br>Brick kiwi-arbiter.arriwebgate.com:/data/gluster      49152     0          Y       5483<br>Self-heal Daemon on localhost                         N/A       N/A        Y       24700<br>Self-heal Daemon on budgie-brick1.arriwebgate.com     N/A       N/A        Y       960<br>Self-heal Daemon on budgie-brick2.arriwebgate.com     N/A       N/A        Y       974<br>Self-heal Daemon on parrot-brick1.arriwebgate.com     N/A       N/A        Y       1811<br>Self-heal Daemon on parrot-brick2.arriwebgate.com     N/A       N/A        Y       2543<br>Self-heal Daemon on kiwi-brick2.arriwebgate.com       N/A       N/A        Y       31207<br>Self-heal Daemon on kiwi-arbiter.arriwebgate.com      N/A       N/A        Y       5229<br>Self-heal Daemon on parrot-arbiter.arriwebgate.com    N/A       N/A        Y       1466<br>Self-heal Daemon on budgie-arbiter.arriwebgate.com    N/A       N/A        Y       984<br><br>Task Status of Volume webgate<br>------------------------------------------------------------------------------<br>Task                 : Remove brick<br>ID                   : 6f67e1d4-23f4-46ca-a97a-2adb152ef294<br>Removed bricks:<br>budgie-brick1.arriwebgate.com:/data/gluster<br>budgie-brick2.arriwebgate.com:/data/gluster<br>budgie-arbiter.arriwebgate.com:/data/gluster<br>Status               : completed<br></div>
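PPS: The per-folder workaround described above, sketched as a small shell script. This is only an illustration: the paths are made-up stand-ins (a temporary directory instead of the real FUSE mount), and GNU cp's -al is assumed as a shortcut for hardlinking the files one by one.

```shell
#!/bin/sh
# Sketch of the per-folder workaround. $base stands in for a directory
# on the gluster FUSE mount; a temp dir is used here so the sketch is
# self-contained and does not touch a real volume.
set -e
base=$(mktemp -d)
mkdir "$base/foo"                    # demo folder with one file in it
echo data > "$base/foo/file1"

cp -al "$base/foo" "$base/foo_new"   # hardlink all files from foo into foo_new
mv "$base/foo" "$base/foo_old"       # rename foo to foo_old
mv "$base/foo_new" "$base/foo"       # rename foo_new to foo
rm -rf "$base/foo_old"               # delete foo_old
```

Our (unverified) understanding of why this helps: the freshly created directory entries get layout information for the current set of subvolumes, so DHT finds the relinked files again without a full lookup.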