[Gluster-users] Recovering from remove-brick where shards did not rebalance
Anthony Hoppe
anthony at vofr.net
Thu Sep 9 06:27:03 UTC 2021
Ok! I'm actually poking at this now, so great timing.
The only mistake I made, I believe, was that I expanded the last shard to
64 MB as well. I forgot that bit. I'm going to try again leaving that one
as-is. Otherwise, here is my process so far. It may be a bit roundabout:
1) copy main file + shards from each node to directories on recovery storage
2) separate empty and non-empty files
3) compare non-empty files (diff -q the directories) for discrepancies
If everything seems to check out:
4) combine empty files into one directory overwriting dupes
5) combine non-empty files into one directory overwriting dupes
6) expand all files not already 64 MB to 64 MB, except the last shard
7) create a numerically sorted list of files
8) spot-check the sorted list and prepend shard 0 (the main file) if necessary
9) cat everything together, reading from the sorted list (see the sketch below)
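
Roughly, steps 6 through 9 look like this in shell form (the GFID, the
recovery directory, and the main file name below are placeholders for my
actual values):

    GFID=5da7d7b9-7ff3-48d2-8dcd-4939364bda1f   # placeholder: GFID / shard prefix of the file
    cd /recovery/combined                        # placeholder: directory with the combined shards

    # step 6: pad every shard except the last one to the full shard size (67108864 bytes)
    LAST=$(ls "$GFID".* | awk -F. '{print $NF}' | sort -n | tail -1)
    for f in "$GFID".*; do
        [ "${f##*.}" = "$LAST" ] && continue     # leave the last shard as-is
        truncate -s 67108864 "$f"
    done

    # steps 7-9: main file (shard 0, also padded) first, then shards in numeric order
    truncate -s 67108864 main_file               # placeholder name for the copied main file
    { echo main_file; ls "$GFID".* | sort -t. -k2,2n; } > filelist.txt
    xargs cat < filelist.txt > restored.img      # placeholder output name
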
Does this sound more or less like I'm going down the right path?
Thanks!
On 9/8/21 11:18 PM, Xavi Hernandez wrote:
> Hi Anthony,
>
> On Wed, Sep 8, 2021 at 6:11 PM Anthony Hoppe <anthony at vofr.net
> <mailto:anthony at vofr.net>> wrote:
>
> Hi Xavi,
>
> I am working with a distributed-replicated volume. What I've been
> doing is copying the shards from each node to their own "recovery"
> directory, discarding shards that are 0 bytes, then comparing the
> remainder and combining unique shards into a common directory. Then
> I'd build a sorted list so the shards are sorted numerically adding
> the "main file" to the top of the list and then have cat run through
> the list. I had one pair of shards that diff told me were not
> equal, even though their byte sizes matched. In that case, I'm not
> sure which is the "correct" shard, but I noted it and just picked
> one, with the intention of circling back if cat'ing things together
> didn't work out (which, so far, it hasn't).
>
>
> If there's a shard with different contents, it probably has a pending
> heal. If it's a replica 3 volume, most likely 2 of the copies will match;
> in that case, that version should be the "good" one. Otherwise you will
> need to check the stat output and extended attributes of the files from
> each brick to see which one is the best.
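>
> A rough sketch of that check (the brick path below is just a placeholder;
> the shard name is the one from your earlier example):
>
>     # run on each node against that node's brick copy of the shard
>     F=/bricks/brick1/vol/.shard/5da7d7b9-7ff3-48d2-8dcd-4939364bda1f.242
>     md5sum "$F"
>     stat "$F"
>     # non-zero trusted.afr.* values mean this copy has pending heals
>     # recorded against the other replicas
>     getfattr -d -m . -e hex "$F"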
>
>
> How can I identify whether a shard is not full size? I haven't checked
> every single shard, but they seem to be 64 MB in size. Would that
> mean I need to make sure all but the last shard are 64 MB? I suspect
> this might be my issue.
>
>
> If you are using the default shard size, they should be 64 MiB (i.e.
> 67108864 bytes). Any file smaller than that (including the main file,
> but not the last shard) must be expanded to this size (truncate -s
> 67108864 <file>). All shards must exist (from 1 to the last number). If one
> is missing you need to create it (touch <file> && truncate -s 67108864
> <file>).
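>
> For example, something like this would create any missing shard and pad
> everything up to (but not including) the last one, once the shards are
> gathered in a single directory (GFID and LAST are placeholders for the
> real values):
>
>     GFID=5da7d7b9-7ff3-48d2-8dcd-4939364bda1f   # placeholder: the file's GFID
>     LAST=242                                    # placeholder: highest shard number
>     for i in $(seq 1 $((LAST - 1))); do
>         touch "$GFID.$i"                        # creates the shard if it is missing
>         truncate -s 67108864 "$GFID.$i"         # pads to 64 MiB (no-op if already full size)
>     done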
>
>
> Also, is shard 0 what would appear as the actual file (so
> largefile.raw or whatever)? It seems in my scenario these files are
> ~48 MB. I assume that means I need to extend them to 64 MB?
>
>
> Yes, shard 0 is the main file, and it also needs to be extended to 64 MiB.
>
> Regards,
>
> Xavi
>
>
> This is all great information. Thanks!
>
> ~ Anthony
>
>
> ------------------------------------------------------------------------
>
> *From: *"Xavi Hernandez" <jahernan at redhat.com
> <mailto:jahernan at redhat.com>>
> *To: *"anthony" <anthony at vofr.net <mailto:anthony at vofr.net>>
> *Cc: *"gluster-users" <gluster-users at gluster.org
> <mailto:gluster-users at gluster.org>>
> *Sent: *Wednesday, September 8, 2021 1:57:51 AM
> *Subject: *Re: [Gluster-users] Recovering from remove-brick
> where shards did not rebalance
>
> Hi Anthony,
>
> On Tue, Sep 7, 2021 at 8:20 PM Anthony Hoppe <anthony at vofr.net
> <mailto:anthony at vofr.net>> wrote:
>
> I am currently playing with concatenating main file + shards
> together. Is it safe to assume that a shard with the same
> ID and sequence number
> (5da7d7b9-7ff3-48d2-8dcd-4939364bda1f.242 for example) is
> identical across bricks? That is, I can copy all the shards
> into a single location overwriting and/or discarding
> duplicates, then concatenate them together in order? Or is
> it more complex than that?
>
>
> Assuming it's a replicated volume, a given shard should appear
> on all bricks of the same replicated subvolume. If there were no
> pending heals, they should all have the same contents (you can
> easily check that by running md5sum, or similar, on each copy).
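>
> For instance (placeholder brick path and hostnames):
>
>     # on each node, checksum that node's copies of the shards for this file
>     cd /bricks/brick1/vol/.shard
>     md5sum 5da7d7b9-7ff3-48d2-8dcd-4939364bda1f.* | sort > /tmp/$(hostname).md5
>     # gather the per-node lists in one place and diff them; any line that
>     # differs is a shard whose replicas don't match and needs a closer look
>     diff /tmp/node1.md5 /tmp/node2.md5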
>
> On distributed-replicated volumes it's possible to have the same
> shard on two different subvolumes. In this case one of the
> subvolumes contains the real file, and the other a special
> 0-bytes file with mode '---------T'. You need to take the real
> file and ignore the second one.
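>
> One way to spot those link files is their zero size and sticky-bit-only
> mode, for example (placeholder brick path):
>
>     find /bricks/brick1/vol/.shard -type f -perm 1000 -size 0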
>
> Shards may be smaller than the shard size. In this case you
> should extend the shard to the shard size before concatenating
> it with the rest of the shards (for example using "truncate
> -s"). The last shard may be smaller. It doesn't need to be extended.
>
> Once you have all the shards, you can concatenate them. Note
> that the first shard of a file (or shard 0) is not inside the
> .shard directory. You must take it from the location where the
> file is normally seen.
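>
> As a hint, the shard prefix is the GFID of the main file, which you can
> read from any brick copy of it (placeholder path and file name below):
>
>     getfattr -n trusted.gfid -e hex /bricks/brick1/vol/images/largefile.raw
>     # -> trusted.gfid=0x5da7d7b97ff348d28dcd4939364bda1f
>     # re-insert the dashes (8-4-4-4-12) to get the shard prefix, then:
>     ls /bricks/brick1/vol/.shard/5da7d7b9-7ff3-48d2-8dcd-4939364bda1f.*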
>
> Regards,
>
> Xavi
>
>
> ------------------------------------------------------------------------
>
> *From: *"anthony" <anthony at vofr.net
> <mailto:anthony at vofr.net>>
> *To: *"gluster-users" <gluster-users at gluster.org
> <mailto:gluster-users at gluster.org>>
> *Sent: *Tuesday, September 7, 2021 10:18:07 AM
> *Subject: *Re: [Gluster-users] Recovering from
> remove-brick where shards did not rebalance
>
> I've been playing with re-adding the bricks and here is
> some interesting behavior.
>
> When I try to force add the bricks to the volume while
> it's running, I get complaints about one of the bricks
> already being a member of a volume. If I stop the
> volume, I can then force-add the bricks. However, the
> volume won't start without force. Once the volume is
> force started, all of the bricks remain offline.
>
> I feel like I'm close...but not quite there...
>
> ------------------------------------------------------------------------
>
> *From: *"anthony" <anthony at vofr.net
> <mailto:anthony at vofr.net>>
> *To: *"Strahil Nikolov" <hunter86_bg at yahoo.com
> <mailto:hunter86_bg at yahoo.com>>
> *Cc: *"gluster-users" <gluster-users at gluster.org
> <mailto:gluster-users at gluster.org>>
> *Sent: *Tuesday, September 7, 2021 7:45:44 AM
> *Subject: *Re: [Gluster-users] Recovering from
> remove-brick where shards did not rebalance
>
> I was contemplating these options, actually, but not
> finding anything in my research showing someone had
> tried either before gave me pause.
>
> One thing I wasn't sure about when doing a force
> add-brick was whether gluster would wipe the existing
> data from the added bricks. Sounds like that may
> not be the case?
>
> Regarding concatenating the main file +
> shards, how would I go about identifying the shards
> that pair with the main file? I see the shards have
> sequence numbers, but I'm not sure how to match the
> identifier to the main file.
>
> Thanks!!
>
> ------------------------------------------------------------------------
>
> *From: *"Strahil Nikolov" <hunter86_bg at yahoo.com
> <mailto:hunter86_bg at yahoo.com>>
> *To: *"anthony" <anthony at vofr.net
> <mailto:anthony at vofr.net>>, "gluster-users"
> <gluster-users at gluster.org
> <mailto:gluster-users at gluster.org>>
> *Sent: *Tuesday, September 7, 2021 6:02:36 AM
> *Subject: *Re: [Gluster-users] Recovering from
> remove-brick where shards did not rebalance
>
> The data should be recoverable by concatenating
> the main file with all shards. Then you can copy
> the data back via the FUSE mount point.
>
> I think that some users reported that add-brick
> with the force option allows you to 'undo' the
> situation and 're-add' the data, but I have
> never tried that and I cannot guarantee that it
> will even work.
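>
> If you do try it, the command would be along these lines (volume
> name and brick paths are placeholders):
>
>     gluster volume add-brick myvol \
>         node1:/bricks/brick1/vol node2:/bricks/brick1/vol \
>         node3:/bricks/brick1/vol force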
>
> The simplest way is to recover from a recent
> backup, but sometimes this leads to data loss.
>
> Best Regards,
> Strahil Nikolov
>
> On Tue, Sep 7, 2021 at 9:29, Anthony Hoppe
> <anthony at vofr.net <mailto:anthony at vofr.net>>
> wrote:
> Hello,
>
> I did a bad thing: I ran a remove-brick on
> a set of bricks in a distributed-replicate
> volume where the rebalance did not
> successfully migrate all files. In
> sleuthing around the various bricks on the
> 3-node pool, it appears that a number of the
> files within the volume may have been stored
> as shards. With that, I'm unsure how to
> proceed with recovery.
>
> Is it possible to re-add the removed bricks
> somehow and then do a heal? Or is there a
> way to recover data from shards somehow?
>
> Thanks!
>
>
> ________
>
>
>
> Community Meeting Calendar:
>
> Schedule -
> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
> Bridge: https://meet.google.com/cpu-eiue-hvk
> <https://meet.google.com/cpu-eiue-hvk>
> Gluster-users mailing list
> Gluster-users at gluster.org <mailto:Gluster-users at gluster.org>
> https://lists.gluster.org/mailman/listinfo/gluster-users
> <https://lists.gluster.org/mailman/listinfo/gluster-users>
>
>