<div dir="ltr"><div dir="ltr">Hi Anthony,</div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Sep 9, 2021 at 8:27 AM Anthony Hoppe <<a href="mailto:anthony@vofr.net">anthony@vofr.net</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Ok! I'm actually poking at this now, so great timing.<br>
<br>
The only mistake I made, I believe, was that I expanded the last shard to <br>
64 MB. I forgot that bit. I'm going to try again, leaving that one as <br>
is. Otherwise, here is what my process has been so far. It may be a bit <br>
roundabout, but here it is:<br>
<br>
1) copy main file + shards from each node to directories on recovery storage<br>
2) separate empty and non-empty files<br>
3) compare non-empty files (diff -q the directories) for discrepancies<br>
<br>
If everything seems to check out:<br>
<br>
4) combine empty files into one directory overwriting dupes<br>
5) combine non-empty files into one directory overwriting dupes<br>
6) expand all files not already 64 MB to 64 MB, except the last shard.<br>
7) create a numerically sorted list of files<br>
8) spot-check the sorted list and prepend shard 0 to the top of the list if necessary.<br></blockquote><div><br></div><div>I guess this means identifying missing shards and creating them at 64 MiB. If so, that's fine.</div><div><br></div><div>Shard 0 is the main file and always needs to be there.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
9) cat everything together, reading from the sorted list.<br>
<br>
Does this sound more or less like I'm going down the right path?<br></blockquote><div><br></div><div>Yes, it should work.</div><div><br></div><div>Xavi</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
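</blockquote><div><br></div><div>For reference, here is a minimal sketch of steps 6-9 in bash, assuming the de-duplicated shards for one file were collected into /recovery/combined as GFID.N and the main file was copied to /recovery/main/disk.raw (all names and paths here are hypothetical):</div><div><br></div><div>#!/bin/bash<br>GFID=5da7d7b9-7ff3-48d2-8dcd-4939364bda1f   # prefix taken from the shard file names<br>SRC=/recovery/combined<br>OUT=/recovery/disk.restored<br><br># highest shard number present<br>last=$(ls "$SRC/$GFID".* | awk -F. '{print $NF}' | sort -n | tail -1)<br><br># shard 0 is the main file; it goes first and is padded to 64 MiB as well<br>cp /recovery/main/disk.raw "$OUT"<br>truncate -s 67108864 "$OUT"<br><br>for n in $(seq 1 "$last"); do<br>    f="$SRC/$GFID.$n"<br>    [ -e "$f" ] || touch "$f"                           # create missing shards as empty files (padded below)<br>    [ "$n" -lt "$last" ] && truncate -s 67108864 "$f"   # pad everything except the last shard to 64 MiB<br>    cat "$f" >> "$OUT"<br>done</div><div><br></div><div>If nothing is missing or stale, the result should have the same size and checksum as the original image.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">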
<br>
Thanks!<br>
<br>
On 9/8/21 11:18 PM, Xavi Hernandez wrote:<br>
> Hi Anthony,<br>
> <br>
> On Wed, Sep 8, 2021 at 6:11 PM Anthony Hoppe <<a href="mailto:anthony@vofr.net" target="_blank">anthony@vofr.net</a> <br>
> <mailto:<a href="mailto:anthony@vofr.net" target="_blank">anthony@vofr.net</a>>> wrote:<br>
> <br>
> Hi Xavi,<br>
> <br>
> I am working with a distributed-replicated volume. What I've been<br>
> doing is copying the shards from each node to their own "recovery"<br>
> directory, discarding shards that are 0 bytes, then comparing the<br>
> remainder and combining unique shards into a common directory. Then<br>
> I'd build a sorted list so the shards are sorted numerically adding<br>
> the "main file" to the top of the list and then have cat run through<br>
> the list. I had one pair of shards that diff told me were not<br>
> equal, but their byte size was equivalent. In that case, I'm not<br>
> sure which is the "correct" shard, but I'd note that and just pick<br>
> one with the intention of circling back if cat'ing things together<br>
> didn't work out... and so far I haven't had any luck.<br>
> <br>
> <br>
> If there's a shard with different contents, it probably has a pending <br>
> heal. If it's a replica 3, most probably 2 of the files should match. In <br>
> that case this should be the "good" version. Otherwise you will need to <br>
> check the stat and extended attributes of the files from each brick to <br>
> see which one is the best.<br>
> <br>
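> For example, you could compare the copies directly on the bricks with<br>
> something like this (a rough sketch; the brick paths and shard name are<br>
> hypothetical):<br>
> <br>
> # size and mtime of each copy<br>
> stat /data/brick1/vol/.shard/GFID.242 /data/brick2/vol/.shard/GFID.242<br>
> # AFR metadata, including the pending-heal counters<br>
> getfattr -d -m . -e hex /data/brick1/vol/.shard/GFID.242<br>
> getfattr -d -m . -e hex /data/brick2/vol/.shard/GFID.242<br>
> # non-zero trusted.afr.* counters on a copy usually mean it has changes pending<br>
> # for the other brick, i.e. the other copy is likely the stale one<br>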
> <br>
> How can I identify if a shard is not full size? I haven't checked<br>
> every single shard, but they seem to be 64 MB in size. Would that<br>
> mean I need to make sure all but the last shard is 64 MB? I suspect<br>
> this might be my issue.<br>
> <br>
> <br>
> If you are using the default shard size, they should be 64 MiB (i.e. <br>
> 67108864 bytes). Any file smaller than that (including the main file, <br>
> but not the last shard) must be expanded to this size (truncate -s <br>
> 67108864 <file>). All shards must exist (from 1 to the last number). If one <br>
> is missing, you need to create it (touch <file> && truncate -s 67108864 <br>
> <file>).<br>
> <br>
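> For example, to list shards that are smaller than 64 MiB, and to spot gaps in<br>
> the sequence numbers (a rough sketch; the directory and GFID are hypothetical):<br>
> <br>
> find /recovery/combined -name 'GFID.*' -size -67108864c<br>
> ls /recovery/combined/GFID.* | awk -F. '{print $NF}' | sort -n | \<br>
>     awk 'NR > 1 && $1 != prev + 1 {print "missing after shard " prev} {prev = $1}'<br>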
> <br>
> Also, is shard 0 what would appear as the actual file (so<br>
> largefile.raw or whatever)? It seems in my scenario these files are<br>
> ~48 MB. I assume that means I need to extend them to 64 MB?<br>
> <br>
> <br>
> Yes, shard 0 is the main file, and it also needs to be extended to 64 MiB.<br>
> <br>
> Regards,<br>
> <br>
> Xavi<br>
> <br>
> <br>
> This is all great information. Thanks!<br>
> <br>
> ~ Anthony<br>
> <br>
> <br>
> ------------------------------------------------------------------------<br>
> <br>
> *From: *"Xavi Hernandez" <<a href="mailto:jahernan@redhat.com" target="_blank">jahernan@redhat.com</a><br>
> <mailto:<a href="mailto:jahernan@redhat.com" target="_blank">jahernan@redhat.com</a>>><br>
> *To: *"anthony" <<a href="mailto:anthony@vofr.net" target="_blank">anthony@vofr.net</a> <mailto:<a href="mailto:anthony@vofr.net" target="_blank">anthony@vofr.net</a>>><br>
> *Cc: *"gluster-users" <<a href="mailto:gluster-users@gluster.org" target="_blank">gluster-users@gluster.org</a><br>
> <mailto:<a href="mailto:gluster-users@gluster.org" target="_blank">gluster-users@gluster.org</a>>><br>
> *Sent: *Wednesday, September 8, 2021 1:57:51 AM<br>
> *Subject: *Re: [Gluster-users] Recovering from remove-brick<br>
> where shards did not rebalance<br>
> <br>
> Hi Anthony,<br>
> <br>
> On Tue, Sep 7, 2021 at 8:20 PM Anthony Hoppe <<a href="mailto:anthony@vofr.net" target="_blank">anthony@vofr.net</a><br>
> <mailto:<a href="mailto:anthony@vofr.net" target="_blank">anthony@vofr.net</a>>> wrote:<br>
> <br>
> I am currently playing with concatenating main file + shards<br>
> together. Is it safe to assume that a shard with the same<br>
> ID and sequence number<br>
> (5da7d7b9-7ff3-48d2-8dcd-4939364bda1f.242 for example) is<br>
> identical across bricks? That is, I can copy all the shards<br>
> into a single location overwriting and/or discarding<br>
> duplicates, then concatenate them together in order? Or is<br>
> it more complex?<br>
> <br>
> <br>
> Assuming it's a replicated volume, a given shard should appear<br>
> on all bricks of the same replicated subvolume. If there were no<br>
> pending heals, they should all have the same contents (however<br>
> you can easily check that by running an md5sum (or similar) on<br>
> each file).<br>
> <br>
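> For instance (brick paths hypothetical):<br>
> <br>
> md5sum /data/brick*/vol/.shard/GFID.242<br>
> # identical checksums on all replicas mean any one of the copies can be used<br>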
> On distributed-replicated volumes it's possible to have the same<br>
> shard on two different subvolumes. In this case one of the<br>
> subvolumes contains the real file, and the other a special<br>
> 0-byte file with mode '---------T'. You need to take the real<br>
> file and ignore the second one.<br>
> <br>
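> For example, the placeholder copies can be spotted like this (a rough sketch;<br>
> the brick path is hypothetical):<br>
> <br>
> # 0-byte files whose mode is exactly ---------T are DHT link files, not data<br>
> find /data/brick1/vol/.shard -type f -size 0 -perm 1000<br>
> # they also carry a linkto xattr pointing at the subvolume that holds the data<br>
> getfattr -n trusted.glusterfs.dht.linkto -e text /data/brick1/vol/.shard/GFID.242<br>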
> Shards may be smaller than the shard size. In this case you<br>
> should extend the shard to the shard size before concatenating<br>
> it with the rest of the shards (for example using "truncate<br>
> -s"). The last shard may be smaller. It doesn't need to be extended.<br>
> <br>
> Once you have all the shards, you can concatenate them. Note<br>
> that the first shard of a file (or shard 0) is not inside the<br>
> .shard directory. You must take it from the location where the<br>
> file is normally seen.<br>
> <br>
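> To match a main file with its shard set: the GFID stored on the brick copy of<br>
> the main file is the prefix used for the shard names, for example (path<br>
> hypothetical):<br>
> <br>
> getfattr -n trusted.gfid -e hex /data/brick1/vol/images/largefile.raw<br>
> # rewrite the hex value as a UUID (insert the dashes) and the shards are then<br>
> # .shard/GFID.1, .shard/GFID.2, ... on the bricks<br>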
> Regards,<br>
> <br>
> Xavi<br>
> <br>
> <br>
> ------------------------------------------------------------------------<br>
> <br>
> *From: *"anthony" <<a href="mailto:anthony@vofr.net" target="_blank">anthony@vofr.net</a><br>
> <mailto:<a href="mailto:anthony@vofr.net" target="_blank">anthony@vofr.net</a>>><br>
> *To: *"gluster-users" <<a href="mailto:gluster-users@gluster.org" target="_blank">gluster-users@gluster.org</a><br>
> <mailto:<a href="mailto:gluster-users@gluster.org" target="_blank">gluster-users@gluster.org</a>>><br>
> *Sent: *Tuesday, September 7, 2021 10:18:07 AM<br>
> *Subject: *Re: [Gluster-users] Recovering from<br>
> remove-brick where shards did not rebalance<br>
> <br>
> I've been playing with re-adding the bricks and here is<br>
> some interesting behavior.<br>
> <br>
> When I try to force add the bricks to the volume while<br>
> it's running, I get complaints about one of the bricks<br>
> already being a member of a volume. If I stop the<br>
> volume, I can then force-add the bricks. However, the<br>
> volume won't start without force. Once the volume is<br>
> force started, all of the bricks remain offline.<br>
> <br>
> I feel like I'm close...but not quite there...<br>
> <br>
> ------------------------------------------------------------------------<br>
> <br>
> *From: *"anthony" <<a href="mailto:anthony@vofr.net" target="_blank">anthony@vofr.net</a><br>
> <mailto:<a href="mailto:anthony@vofr.net" target="_blank">anthony@vofr.net</a>>><br>
> *To: *"Strahil Nikolov" <<a href="mailto:hunter86_bg@yahoo.com" target="_blank">hunter86_bg@yahoo.com</a><br>
> <mailto:<a href="mailto:hunter86_bg@yahoo.com" target="_blank">hunter86_bg@yahoo.com</a>>><br>
> *Cc: *"gluster-users" <<a href="mailto:gluster-users@gluster.org" target="_blank">gluster-users@gluster.org</a><br>
> <mailto:<a href="mailto:gluster-users@gluster.org" target="_blank">gluster-users@gluster.org</a>>><br>
> *Sent: *Tuesday, September 7, 2021 7:45:44 AM<br>
> *Subject: *Re: [Gluster-users] Recovering from<br>
> remove-brick where shards did not rebalance<br>
> <br>
> I was contemplating these options, actually, but not<br>
> finding anything in my research showing someone had<br>
> tried either before gave me pause.<br>
> <br>
> One thing I wasn't sure about when doing a force<br>
> add-brick was whether gluster would wipe the existing<br>
> data from the added bricks. Sounds like that may<br>
> not be the case?<br>
> <br>
> With regards to concatenating the main file +<br>
> shards, how would I go about identifying the shards<br>
> that pair with the main file? I see the shards have<br>
> sequence numbers, but I'm not sure how to match the<br>
> identifier to the main file.<br>
> <br>
> Thanks!!<br>
> <br>
> ------------------------------------------------------------------------<br>
> <br>
> *From: *"Strahil Nikolov" <<a href="mailto:hunter86_bg@yahoo.com" target="_blank">hunter86_bg@yahoo.com</a><br>
> <mailto:<a href="mailto:hunter86_bg@yahoo.com" target="_blank">hunter86_bg@yahoo.com</a>>><br>
> *To: *"anthony" <<a href="mailto:anthony@vofr.net" target="_blank">anthony@vofr.net</a><br>
> <mailto:<a href="mailto:anthony@vofr.net" target="_blank">anthony@vofr.net</a>>>, "gluster-users"<br>
> <<a href="mailto:gluster-users@gluster.org" target="_blank">gluster-users@gluster.org</a><br>
> <mailto:<a href="mailto:gluster-users@gluster.org" target="_blank">gluster-users@gluster.org</a>>><br>
> *Sent: *Tuesday, September 7, 2021 6:02:36 AM<br>
> *Subject: *Re: [Gluster-users] Recovering from<br>
> remove-brick where shards did not rebalance<br>
> <br>
> The data should be recoverable by concatenating<br>
> the main file with all shards. Then you can copy<br>
> the data back via the FUSE mount point.<br>
> <br>
> I think that some users reported that add-brick<br>
> with the force option allows one to 'undo' the<br>
> situation and 're-add' the data, but I have<br>
> never tried that and I cannot guarantee that it<br>
> will even work.<br>
> <br>
> The simplest way is to recover from a recent<br>
> backup, but sometimes this leads to data loss.<br>
> <br>
> Best Regards,<br>
> Strahil Nikolov<br>
> <br>
> On Tue, Sep 7, 2021 at 9:29, Anthony Hoppe<br>
> <<a href="mailto:anthony@vofr.net" target="_blank">anthony@vofr.net</a> <mailto:<a href="mailto:anthony@vofr.net" target="_blank">anthony@vofr.net</a>>><br>
> wrote:<br>
> Hello,<br>
> <br>
> I did a bad thing and did a remove-brick on<br>
> a set of bricks in a distributed-replicate<br>
> volume where rebalancing did not<br>
> successfully rebalance all files. In<br>
> sleuthing around the various bricks on the 3<br>
> node pool, it appears that a number of the<br>
> files within the volume may have been stored<br>
> as shards. With that, I'm unsure how to<br>
> proceed with recovery.<br>
> <br>
> Is it possible to re-add the removed bricks<br>
> somehow and then do a heal? Or is there a<br>
> way to recover data from shards somehow?<br>
> <br>
> Thanks!<br>
<br>
</blockquote></div></div>