[Gluster-users] Recovering files "lost" during a rebalance on a Dispersed 3+1

Jeremy Davis-Turak jeremy at rosalind.bio
Tue Sep 12 15:53:25 UTC 2023


Hello,
We are running glusterfs 6.6 on Ubuntu.

We have a Gluster storage system that is a few years old. There are 4 VMs
running a Dispersed (NOT replicated) system - a 3 + 1 configuration.

Performance is generally well tuned for our needs, but a problem arose the
last time we added bricks: we attempted a rebalance, which was reported as
failed. From the mounted POSIX view of the file system, we now see many
files reported as 0 bytes in size, which they shouldn’t be.

We’ve attempted all kinds of heal and other operations, to no avail. I
finally figured out how to find the gfid of the files, and I found where
Gluster thought the shards were located. They were indeed 0 bytes; however,
I was able to find shards *with the same gfid* located on other bricks.
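
For reference, the per-brick lookup described above can be sketched roughly
as follows. This is a minimal sketch, not a supported tool: it assumes the
usual brick layout in which each gfid has an entry at
.glusterfs/<first two hex digits>/<next two hex digits>/<full gfid> under the
brick root. The function names and the brick paths in the usage note are
hypothetical, and it would need to run on each brick host (or against
mounted brick directories) with permission to read them.

```python
import os
import uuid


def gfid_backend_path(brick_root, gfid):
    """Return the expected .glusterfs path for a gfid on one brick."""
    gfid = gfid.strip()
    # Accept the hex form printed by `getfattr -e hex` as well as the dashed form.
    if gfid.lower().startswith("0x"):
        gfid = gfid[2:]
    # Normalise to the canonical lowercase dashed UUID string.
    gfid = str(uuid.UUID(gfid))
    return os.path.join(brick_root, ".glusterfs", gfid[:2], gfid[2:4], gfid)


def fragment_sizes(brick_roots, gfid):
    """Map each brick root to the size of its gfid entry, or None if absent."""
    sizes = {}
    for root in brick_roots:
        path = gfid_backend_path(root, gfid)
        try:
            sizes[root] = os.stat(path).st_size
        except FileNotFoundError:
            sizes[root] = None
    return sizes
```

Calling fragment_sizes(["/data/brick1", "/data/brick2"], "<gfid>") with the
real brick roots would show at a glance which bricks hold a 0-byte entry and
which hold one with content.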

So, I think that when the rebalance failed, the system somehow kept
thinking that the files should exist in the NEW brick location instead of
the one that actually has content. For one file I did try deleting the
0-byte shards, but the system still reports the file as 0 bytes, so it
evidently did not fall back to the other shards with the same gfid. Is it
possible to manually move shards from one brick to another? I'm clearly
tinkering with things that aren't meant to be tinkered with ... but I don't
fully understand how GlusterFS functions under the hood.
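
One check that may help narrow this down, offered as an assumption rather
than a diagnosis: the distribute (DHT) layer normally leaves "link" files
that are 0 bytes, have only the sticky bit set (shown as ---------T), and
carry a trusted.glusterfs.dht.linkto extended attribute naming the subvolume
that holds the data. The sketch below only inspects a brick-side file and
modifies nothing. The helper names are hypothetical, and reading trusted.*
attributes generally requires root on Linux.

```python
import os
import stat

# Extended attribute that DHT sets on its link files.
LINKTO_XATTR = "trusted.glusterfs.dht.linkto"


def is_linkto_candidate(size, mode):
    """True for a 0-byte regular file whose only mode bit is the sticky bit."""
    return (
        size == 0
        and stat.S_ISREG(mode)
        and stat.S_IMODE(mode) == stat.S_ISVTX
    )


def describe(path):
    """Return (size, octal mode string, linkto value or None) for a brick file."""
    st = os.lstat(path)
    target = None
    if is_linkto_candidate(st.st_size, st.st_mode):
        try:
            # Raw bytes naming the subvolume DHT believes holds the data.
            target = os.getxattr(path, LINKTO_XATTR, follow_symlinks=False)
        except OSError:
            # Attribute missing, or not permitted to read it.
            target = None
    return st.st_size, oct(stat.S_IMODE(st.st_mode)), target
```

If a 0-byte file on a brick does not match this pattern, it is probably not
a DHT link file, and this check says nothing further about it.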

We’re at a loss as to how to fix this, and I haven’t had any luck finding
anyone who can help. We have quite a few files that we would like to
recover, so it’s important that we figure out how to do so.

Thanks,

Jeremy