[Gluster-users] Recovering from remove-brick where shards did not rebalance

Anthony Hoppe anthony at vofr.net
Thu Sep 9 06:27:03 UTC 2021


Ok!  I'm actually poking at this now, so great timing.

The only mistake I made, I believe, was expanding the last shard to 
64 MB.  I forgot that the last shard should stay at its original size.  
I'm going to try again leaving that one as is.  Otherwise, here is what 
my process has been so far (rough commands sketched below the list).  It 
may be a bit roundabout, but here it is:

1) copy main file + shards from each node to directories on recovery storage
2) separate empty and non-empty files
3) compare non-empty files (diff -q the directories) for discrepancies

If everything seems to check out:

4) combine empty files into one directory, overwriting dupes
5) combine non-empty files into one directory, overwriting dupes
6) expand all files not already 64 MB to exactly 64 MB (67108864 bytes), except the last shard
7) create a numerically sorted list of files
8) spot-check the sorted list and prepend shard 0 (the main file) to the top of the list if necessary
9) cat everything together, reading from the sorted list
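
For reference, here's roughly what steps 6 through 9 look like as commands 
on my end (all names and paths below are placeholders for illustration; 
GFID is the shard ID of the file being rebuilt):

    #!/bin/bash
    # Rough sketch only -- RECOVERY, GFID, MAIN and OUT are placeholders.
    RECOVERY=/recovery/combined          # directory holding main file + shards
    GFID=5da7d7b9-7ff3-48d2-8dcd-4939364bda1f
    MAIN=largefile.raw                   # the main file (shard 0)
    OUT=/recovery/largefile.restored

    cd "$RECOVERY" || exit 1

    # Step 6: pad every shard except the last one up to 64 MB (67108864 bytes).
    last=$(ls "$GFID".* | awk -F. '{print $NF}' | sort -n | tail -1)
    for f in "$GFID".*; do
        n=${f##*.}
        [ "$n" = "$last" ] && continue   # leave the last shard at its real size
        truncate -s 67108864 "$f"
    done

    # The main file (shard 0) also gets padded to 64 MB.
    truncate -s 67108864 "$MAIN"

    # Steps 7-9: build a numerically sorted list with the main file on top,
    # then cat everything together in that order.
    { echo "$MAIN"; ls "$GFID".* | sort -t. -k2 -n; } > shardlist.txt
    cat $(cat shardlist.txt) > "$OUT"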

Does this sound more or less like I'm going down the right path?

Thanks!

On 9/8/21 11:18 PM, Xavi Hernandez wrote:
> Hi Anthony,
> 
> On Wed, Sep 8, 2021 at 6:11 PM Anthony Hoppe <anthony at vofr.net> wrote:
> 
>     Hi Xavi,
> 
>     I am working with a distributed-replicated volume.  What I've been
>     doing is copying the shards from each node to their own "recovery"
>     directory, discarding shards that are 0 bytes, then comparing the
>     remainder and combining unique shards into a common directory.  Then
>     I'd build a sorted list so the shards are sorted numerically adding
>     the "main file" to the top of the list and then have cat run through
>     the list.  I had one pair of shards that diff told me were not
>     equal, even though their byte sizes matched.  In that case, I'm not
>     sure which is the "correct" shard, but I'd note it and just pick
>     one with the intention of circling back if cat'ing things together
>     didn't work out... which, so far, it hasn't.
> 
> 
> If a shard has different contents on different bricks, it probably has a 
> pending heal. If it's a replica 3, most likely 2 of the 3 copies will 
> match, and the matching pair should be the "good" version. Otherwise you 
> will need to check the stat output and extended attributes of the copies 
> on each brick to see which one is the best.
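> 
> For example, something along these lines (the brick path is just a 
> placeholder, adjust it for your layout) lets you compare the copies of 
> one shard across bricks:
> 
>     # run on each node against its own brick copy of the shard
>     SHARD=/data/brick1/vol/.shard/5da7d7b9-7ff3-48d2-8dcd-4939364bda1f.242
>     md5sum "$SHARD"
>     stat "$SHARD"
>     # non-zero trusted.afr.* values generally indicate a pending heal
>     getfattr -d -m . -e hex "$SHARD"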
> 
> 
>     How can I identify if a shard is not full size?  I haven't checked
>     every single shard, but they seem to be 64 MB in size.  Would that
>     mean I need to make sure all but the last shard are 64 MB?  I suspect
>     this might be my issue.
> 
> 
> If you are using the default shard size, they should be 64 MiB (i.e. 
> 67108864 bytes). Any file smaller than that (including the main file, 
> but not the last shard) must be expanded to this size (truncate -s 
> 67108864 <file>). All shards must exist (from 1 to last number). If one 
> is missing you need to create it (touch <file> && truncate -s 67108864 
> <file>).
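> 
> A quick check like the following (assuming all the shards for one file 
> are already gathered into a single directory and named <gfid>.<number>) 
> can flag missing or undersized shards before you concatenate:
> 
>     GFID=5da7d7b9-7ff3-48d2-8dcd-4939364bda1f   # example gfid from this thread
>     last=$(ls "$GFID".* | awk -F. '{print $NF}' | sort -n | tail -1)
>     for i in $(seq 1 "$last"); do
>         f="$GFID.$i"
>         if [ ! -e "$f" ]; then
>             echo "missing: $f"      # create: touch "$f" && truncate -s 67108864 "$f"
>         elif [ "$i" -ne "$last" ] && [ "$(stat -c %s "$f")" -ne 67108864 ]; then
>             echo "undersized: $f"   # pad: truncate -s 67108864 "$f"
>         fi
>     done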
> 
> 
>     Also, is shard 0 what would appear as the actual file (so
>     largefile.raw or whatever)?  It seems in my scenario these files are
>     ~48 MB.  I assume that means I need to extend it to 64 MB?
> 
> 
> Yes, shard 0 is the main file, and it also needs to be extended to 64 MiB.
> 
> Regards,
> 
> Xavi
> 
> 
>     This is all great information.  Thanks!
> 
>     ~ Anthony
> 
> 
>     ------------------------------------------------------------------------
> 
>         *From: *"Xavi Hernandez" <jahernan at redhat.com>
>         *To: *"anthony" <anthony at vofr.net>
>         *Cc: *"gluster-users" <gluster-users at gluster.org>
>         *Sent: *Wednesday, September 8, 2021 1:57:51 AM
>         *Subject: *Re: [Gluster-users] Recovering from remove-brick
>         where shards did not rebalance
> 
>         Hi Anthony,
> 
>         On Tue, Sep 7, 2021 at 8:20 PM Anthony Hoppe <anthony at vofr.net> wrote:
> 
>             I am currently playing with concatenating main file + shards
>             together.  Is it safe to assume that a shard with the same
>             ID and sequence number
>             (5da7d7b9-7ff3-48d2-8dcd-4939364bda1f.242 for example) is
>             identical across bricks?  That is, I can copy all the shards
>             into a single location overwriting and/or discarding
>             duplicates, then concatenate them together in order?  Or is
>             it more complex than that?
> 
> 
>         Assuming it's a replicated volume, a given shard should appear
>         on all bricks of the same replicated subvolume. If there were no
>         pending heals, they should all have the same contents; you can
>         easily check that by running md5sum (or similar) on each copy.
> 
>         On distributed-replicated volumes it's possible to have the same
>         shard on two different subvolumes. In this case one of the
>         subvolumes contains the real file, and the other a special
>         0-byte file with mode '---------T'. You need to take the real
>         file and ignore the second one.
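> 
>         If it helps, something like this (the brick path is only an
>         example) should list those special 0-byte files so you can
>         ignore them:
> 
>             # zero-length files whose mode is just the sticky bit
>             find /data/brick1/vol/.shard -type f -size 0 -perm 1000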
> 
>         Shards may be smaller than the shard size. In this case you
>         should extend the shard to the shard size before concatenating
>         it with the rest of the shards (for example using "truncate
>         -s"). The last shard may be smaller. It doesn't need to be extended.
> 
>         Once you have all the shards, you can concatenate them. Note
>         that the first shard of a file (or shard 0) is not inside the
>         .shard directory. You must take it from the location where the
>         file is normally seen.
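> 
>         As an illustration (paths here are made up), for a file normally
>         seen as /mnt/vol/images/vm1.raw the pieces to concatenate, in
>         order, would be:
> 
>             <brick>/images/vm1.raw        # shard 0 (the main file)
>             <brick>/.shard/<gfid>.1       # then shards 1..N, sorted
>             <brick>/.shard/<gfid>.2       # numerically
>             ...
> 
>         where <gfid> is the GFID of the main file (readable with
>         "getfattr -n trusted.gfid -e hex" on the brick copy).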
> 
>         Regards,
> 
>         Xavi
> 
> 
>             ------------------------------------------------------------------------
> 
>                 *From: *"anthony" <anthony at vofr.net>
>                 *To: *"gluster-users" <gluster-users at gluster.org>
>                 *Sent: *Tuesday, September 7, 2021 10:18:07 AM
>                 *Subject: *Re: [Gluster-users] Recovering from
>                 remove-brick where shards did not rebalance
> 
>                 I've been playing with re-adding the bricks and here is
>                 some interesting behavior.
> 
>                 When I try to force add the bricks to the volume while
>                 it's running, I get complaints about one of the bricks
>                 already being a member of a volume.  If I stop the
>                 volume, I can then force-add the bricks.  However, the
>                 volume won't start without force.  Once the volume is
>                 force started, all of the bricks remain offline.
> 
>                 I feel like I'm close...but not quite there...
> 
>                 ------------------------------------------------------------------------
> 
>                     *From: *"anthony" <anthony at vofr.net>
>                     *To: *"Strahil Nikolov" <hunter86_bg at yahoo.com>
>                     *Cc: *"gluster-users" <gluster-users at gluster.org>
>                     *Sent: *Tuesday, September 7, 2021 7:45:44 AM
>                     *Subject: *Re: [Gluster-users] Recovering from
>                     remove-brick where shards did not rebalance
> 
>                     I was contemplating these options, actually, but not
>                     finding anything in my research showing someone had
>                     tried either before gave me pause.
> 
>                     One thing I wasn't sure about when doing a force
>                     add-brick was if gluster would wipe the existing
>                     data from the added bricks.  Sounds like that may
>                     not be the case?
> 
>                     With regards to concatenating the main file +
>                     shards, how would I go about identifying the shards
>                     that pair with the main file?  I see the shards have
>                     sequence numbers, but I'm not sure how to match the
>                     identifier to the main file.
> 
>                     Thanks!!
> 
>                     ------------------------------------------------------------------------
> 
>                         *From: *"Strahil Nikolov" <hunter86_bg at yahoo.com>
>                         *To: *"anthony" <anthony at vofr.net>,
>                         "gluster-users" <gluster-users at gluster.org>
>                         *Sent: *Tuesday, September 7, 2021 6:02:36 AM
>                         *Subject: *Re: [Gluster-users] Recovering from
>                         remove-brick where shards did not rebalance
> 
>                         The data should be recoverable by concatenating
>                         the main file with all shards. Then you can copy
>                         the data back via the FUSE mount point.
> 
>                         I think that some users reported that add-brick
>                         with the force option allows you to 'undo' the
>                         situation and 're-add' the data, but I have
>                         never tried that and I cannot guarantee that it
>                         will even work.
> 
>                         The simplest way is to recover from a recent
>                         backup, but sometimes this means some data loss.
> 
>                         Best Regards,
>                         Strahil Nikolov
> 
>                             On Tue, Sep 7, 2021 at 9:29, Anthony Hoppe
>                             <anthony at vofr.net> wrote:
>                             wrote:
>                             Hello,
> 
>                             I did a bad thing and did a remove-brick on
>                             a set of bricks in a distributed-replicate
>                             volume where rebalancing did not
>                             successfully rebalance all files.  In
>                             sleuthing around the various bricks on the 3
>                             node pool, it appears that a number of the
>                             files within the volume may have been stored
>                             as shards.  With that, I'm unsure how to
>                             proceed with recovery.
> 
>                             Is it possible to re-add the removed bricks
>                             somehow and then do a heal?  Or is there a
>                             way to recover data from shards somehow?
> 
>                             Thanks!
> 
> 
> 
> 

