[Gluster-users] Recovering from remove-brick where shards did not rebalance
Xavi Hernandez
jahernan at redhat.com
Thu Sep 9 07:41:04 UTC 2021
Hi Anthony,
On Thu, Sep 9, 2021 at 8:27 AM Anthony Hoppe <anthony at vofr.net> wrote:
> Ok! I'm actually poking at this now, so great timing.
>
> The only mistake I made, I believe, was that I expanded the last shard to
> 64 MB. I forgot that bit. I'm going to try again leaving that one as
> is. Otherwise, here is my process so far. It may be a bit roundabout,
> but here it is:
>
> 1) copy main file + shards from each node to directories on recovery
> storage
> 2) separate empty and non-empty files
> 3) compare non-empty files (diff -q the directories) for discrepancies
>
> If everything seems to check out:
>
> 4) combine empty files into one directory overwriting dupes
> 5) combine non-empty files into one directory overwriting dupes
> 6) expand all files not already 64 MB to 64 MB, except the last shard.
> 7) create a numerically sorted list of files
> 8) spot-check the sorted list and prepend shard 0 to the top if necessary.
>
I guess this means identifying missing shards and creating them at 64
MiB. If so, that's fine.
Shard 0 is the main file and always needs to be present.
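As a rough sketch only (the GFID and shard count below are example values
taken from this thread, and the commands assume all recovered shards for one
file sit in the current directory), creating the missing shards could look
like:

    GFID=5da7d7b9-7ff3-48d2-8dcd-4939364bda1f   # example ID from this thread
    LAST=242                                    # example: highest shard number you have
    for i in $(seq 1 "$LAST"); do
        f="$GFID.$i"
        # any shard that is missing entirely becomes a sparse 64 MiB file
        [ -e "$f" ] || { touch "$f" && truncate -s 67108864 "$f"; }
    done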
> 9) cat everything together, reading from the sorted list.
>
> Does this sound more or less like I'm going down the right path?
>
Yes, it should work.
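If it helps, steps 6) to 9) could look roughly like this as a shell sketch;
the GFID, file names and shard count are placeholders, so adapt them before
running anything:

    GFID=5da7d7b9-7ff3-48d2-8dcd-4939364bda1f   # example shard base name
    MAIN=largefile.raw                          # shard 0, i.e. the main file
    LAST=242                                    # example last shard number
    OUT=recovered.raw

    # pad the main file and every shard except the last one to exactly 64 MiB
    truncate -s 67108864 "$MAIN"
    for i in $(seq 1 $((LAST - 1))); do
        truncate -s 67108864 "$GFID.$i"
    done

    # concatenate in order: shard 0 first, then shards 1..LAST
    cp "$MAIN" "$OUT"
    for i in $(seq 1 "$LAST"); do
        cat "$GFID.$i" >> "$OUT"
    done

If you know the original file size, comparing it with the size of the
concatenated result is a quick sanity check.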
Xavi
>
> Thanks!
>
> On 9/8/21 11:18 PM, Xavi Hernandez wrote:
> > Hi Anthony,
> >
> > On Wed, Sep 8, 2021 at 6:11 PM Anthony Hoppe <anthony at vofr.net> wrote:
> >
> > Hi Xavi,
> >
> > I am working with a distributed-replicated volume. What I've been
> > doing is copying the shards from each node to their own "recovery"
> > directory, discarding shards that are 0 bytes, then comparing the
> > remainder and combining unique shards into a common directory. Then
> > I'd build a numerically sorted list of the shards, add the "main file"
> > to the top of the list, and have cat run through the list. I had one
> > pair of shards that diff told me were not equal, but their byte sizes
> > were equivalent. In that case, I'm not sure which is the "correct"
> > shard, but I'd note that and just pick one, with the intention of
> > circling back if cat'ing things together didn't work out... which, so
> > far, it hasn't.
> >
> >
> > If there's a shard with different contents, it probably has a pending
> > heal. If it's a replica 3, most probably 2 of the files will match. In
> > that case the matching copies should be the "good" version. Otherwise you
> > will need to check the stat and extended attributes of the files from each
> > brick to see which one is the best.
> >
> >
> > How can I identify if a shard is not full size? I haven't checked
> > every single shard, but they seem to be 64 MB in size. Would that
> > mean I need to make sure all but the last shard are 64 MB? I suspect
> > this might be my issue.
> >
> >
> > If you are using the default shard size, they should be 64 MiB (i.e.
> > 67108864 bytes). Any file smaller than that (including the main file,
> > but not the last shard) must be expanded to this size (truncate -s
> > 67108864 <file>). All shards must exist (from 1 to the last number). If
> > one is missing, you need to create it (touch <file> && truncate -s
> > 67108864 <file>).
> >
> >
> > Also, is shard 0 what would appear as the actual file (so
> > largefile.raw or whatever)? It seems in my scenario these files are
> > ~48 MB. I assume that means I need to extend it to 64 MB?
> >
> >
> > Yes, shard 0 is the main file, and it also needs to be extended to 64
> > MiB.
> >
> > Regards,
> >
> > Xavi
> >
> >
> > This is all great information. Thanks!
> >
> > ~ Anthony
> >
> >
> >
> > ------------------------------------------------------------------------
> >
> > *From: *"Xavi Hernandez" <jahernan at redhat.com
> > <mailto:jahernan at redhat.com>>
> > *To: *"anthony" <anthony at vofr.net <mailto:anthony at vofr.net>>
> > *Cc: *"gluster-users" <gluster-users at gluster.org
> > <mailto:gluster-users at gluster.org>>
> > *Sent: *Wednesday, September 8, 2021 1:57:51 AM
> > *Subject: *Re: [Gluster-users] Recovering from remove-brick
> > where shards did not rebalance
> >
> > Hi Anthony,
> >
> > On Tue, Sep 7, 2021 at 8:20 PM Anthony Hoppe <anthony at vofr.net> wrote:
> >
> > I am currently playing with concatenating main file + shards
> > together. Is it safe to assume that a shard with the same
> > ID and sequence number
> > (5da7d7b9-7ff3-48d2-8dcd-4939364bda1f.242 for example) is
> > identical across bricks? That is, I can copy all the shards
> > into a single location overwriting and/or discarding
> > duplicates, then concatenate them together in order? Or is
> > it more complex than that?
> >
> >
> > Assuming it's a replicated volume, a given shard should appear
> > on all bricks of the same replicated subvolume. If there were no
> > pending heals, they should all have the same contents (you can
> > easily check that by running md5sum, or a similar tool, on each
> > file).
> >
> > On distributed-replicated volumes it's possible to have the same
> > shard on two different subvolumes. In this case one of the
> > subvolumes contains the real file, and the other a special
> > 0-byte file with mode '---------T'. You need to take the real
> > file and ignore the second one.
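A rough way to spot those link files on a brick is that they are zero bytes
with only the sticky bit set, and they carry a dht.linkto attribute; for
example (the brick path and shard name here are just examples):

    # zero length and permissions exactly ---------T (sticky bit only)
    find /data/brick1/vol/.shard -type f -size 0 -perm 1000 -ls

    # a link file also carries this xattr pointing at the other subvolume
    getfattr -n trusted.glusterfs.dht.linkto -e text \
        /data/brick1/vol/.shard/5da7d7b9-7ff3-48d2-8dcd-4939364bda1f.242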
> >
> > Shards may be smaller than the shard size. In this case you
> > should extend the shard to the shard size before concatenating
> > it with the rest of the shards (for example using "truncate
> > -s"). The last shard may be smaller. It doesn't need to be
> extended.
> >
> > Once you have all the shards, you can concatenate them. Note
> > that the first shard of a file (or shard 0) is not inside the
> > .shard directory. You must take it from the location where the
> > file is normally seen.
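As a side note, the shards that belong to a given file are named after that
file's GFID, so they can be matched up on a brick like this (the paths below
are examples only):

    # the hex value of trusted.gfid is the UUID used in the shard names
    getfattr -n trusted.gfid -e hex /data/brick1/vol/largefile.raw
    #   trusted.gfid=0x5da7d7b97ff348d28dcd4939364bda1f

    # so the shards belonging to that file are
    ls /data/brick1/vol/.shard/5da7d7b9-7ff3-48d2-8dcd-4939364bda1f.*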
> >
> > Regards,
> >
> > Xavi
> >
> >
> >
> > ------------------------------------------------------------------------
> >
> > *From: *"anthony" <anthony at vofr.net
> > <mailto:anthony at vofr.net>>
> > *To: *"gluster-users" <gluster-users at gluster.org
> > <mailto:gluster-users at gluster.org>>
> > *Sent: *Tuesday, September 7, 2021 10:18:07 AM
> > *Subject: *Re: [Gluster-users] Recovering from
> > remove-brick where shards did not rebalance
> >
> > I've been playing with re-adding the bricks and here is
> > some interesting behavior.
> >
> > When I try to force add the bricks to the volume while
> > it's running, I get complaints about one of the bricks
> > already being a member of a volume. If I stop the
> > volume, I can then force-add the bricks. However, the
> > volume won't start without force. Once the volume is
> > force started, all of the bricks remain offline.
> >
> > I feel like I'm close...but not quite there...
> >
> >
> > ------------------------------------------------------------------------
> >
> > *From: *"anthony" <anthony at vofr.net
> > <mailto:anthony at vofr.net>>
> > *To: *"Strahil Nikolov" <hunter86_bg at yahoo.com
> > <mailto:hunter86_bg at yahoo.com>>
> > *Cc: *"gluster-users" <gluster-users at gluster.org
> > <mailto:gluster-users at gluster.org>>
> > *Sent: *Tuesday, September 7, 2021 7:45:44 AM
> > *Subject: *Re: [Gluster-users] Recovering from
> > remove-brick where shards did not rebalance
> >
> > I was contemplating these options, actually, but not
> > finding anything in my research showing someone had
> > tried either before gave me pause.
> >
> > One thing I wasn't sure about when doing a force
> > add-brick was whether gluster would wipe the existing
> > data from the added bricks. Sounds like that may
> > not be the case?
> >
> > With regard to concatenating the main file +
> > shards, how would I go about identifying the shards
> > that pair with the main file? I see the shards have
> > sequence numbers, but I'm not sure how to match the
> > identifier to the main file.
> >
> > Thanks!!
> >
> >
> > ------------------------------------------------------------------------
> >
> > *From: *"Strahil Nikolov" <hunter86_bg at yahoo.com
> > <mailto:hunter86_bg at yahoo.com>>
> > *To: *"anthony" <anthony at vofr.net
> > <mailto:anthony at vofr.net>>, "gluster-users"
> > <gluster-users at gluster.org
> > <mailto:gluster-users at gluster.org>>
> > *Sent: *Tuesday, September 7, 2021 6:02:36 AM
> > *Subject: *Re: [Gluster-users] Recovering from
> > remove-brick where shards did not rebalance
> >
> > The data should be recoverable by concatenating
> > the main file with all shards. Then you can copy
> > the data back via the FUSE mount point.
> >
> > I think that some users reported that add-brick
> > with the force option allows you to 'undo' the
> > situation and 're-add' the data, but I have
> > never tried that and I cannot guarantee that it
> > will even work.
> >
> > The simplest way is to recover from a recent
> > backup, but sometimes this leads to data loss.
> >
> > Best Regards,
> > Strahil Nikolov
> >
> > On Tue, Sep 7, 2021 at 9:29, Anthony Hoppe
> > <anthony at vofr.net> wrote:
> > Hello,
> >
> > I did a bad thing and did a remove-brick on
> > a set of bricks in a distributed-replicate
> > volume where rebalancing did not
> > successfully rebalance all files. In
> > sleuthing around the various bricks on the
> > 3-node pool, it appears that a number of the
> > files within the volume may have been stored
> > as shards. With that, I'm unsure how to
> > proceed with recovery.
> >
> > Is it possible to re-add the removed bricks
> > somehow and then do a heal? Or is there a
> > way to recover the data from the shards?
> >
> > Thanks!