[Gluster-users] Recovering from remove-brick where shards did not rebalance
Xavi Hernandez
jahernan at redhat.com
Thu Sep 9 07:41:04 UTC 2021
Hi Anthony,
On Thu, Sep 9, 2021 at 8:27 AM Anthony Hoppe <anthony at vofr.net> wrote:
> Ok! I'm actually poking at this now, so great timing.
>
> The only mistake I made, I believe, was that I expanded the last shard to
> 64 MB. I forgot that bit. I'm going to try again leaving that one as
> is. Otherwise, here is my process so far. It may be a bit roundabout,
> but here it is:
>
> 1) copy main file + shards from each node to directories on recovery
> storage
> 2) separate empty and non-empty files
> 3) compare non-empty files (diff -q the directories) for discrepancies
>
> If everything seems to check out:
>
> 4) combine empty files into one directory overwriting dupes
> 5) combine non-empty files into one directory overwriting dupes
> 6) expand all files not already 64 MB to 64 MB, except the last shard.
> 7) create a numerically sorted list of files
> 8) spot-check the sorted list and prepend shard 0 to the top if necessary.
>
I guess this means identifying missing shards and creating them at 64
MiB. If so, that's fine.
Shard 0 is the main file and always needs to be present.
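As a rough sketch only (the GFID and shard count below are example values
taken from this thread, and the commands assume all recovered shards for one
file sit in the current directory), creating the missing shards could look
like:

    GFID=5da7d7b9-7ff3-48d2-8dcd-4939364bda1f   # example ID from this thread
    LAST=242                                    # example: highest shard number you have
    for i in $(seq 1 "$LAST"); do
        f="$GFID.$i"
        # any shard that is missing entirely becomes a sparse 64 MiB file
        [ -e "$f" ] || { touch "$f" && truncate -s 67108864 "$f"; }
    done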
> 9) cat everything together, reading from the sorted list.
>
> Does this sound more or less like I'm going down the right path?
>
Yes, it should work.
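If it helps, steps 6) to 9) could look roughly like this as a shell sketch;
the GFID, file names and shard count are placeholders, so adapt them before
running anything:

    GFID=5da7d7b9-7ff3-48d2-8dcd-4939364bda1f   # example shard base name
    MAIN=largefile.raw                          # shard 0, i.e. the main file
    LAST=242                                    # example last shard number
    OUT=recovered.raw

    # pad the main file and every shard except the last one to exactly 64 MiB
    truncate -s 67108864 "$MAIN"
    for i in $(seq 1 $((LAST - 1))); do
        truncate -s 67108864 "$GFID.$i"
    done

    # concatenate in order: shard 0 first, then shards 1..LAST
    cp "$MAIN" "$OUT"
    for i in $(seq 1 "$LAST"); do
        cat "$GFID.$i" >> "$OUT"
    done

If you know the original file size, comparing it with the size of the
concatenated result is a quick sanity check.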
Xavi
>
> Thanks!
>
> On 9/8/21 11:18 PM, Xavi Hernandez wrote:
> > Hi Anthony,
> >
> > On Wed, Sep 8, 2021 at 6:11 PM Anthony Hoppe <anthony at vofr.net> wrote:
> >
> > Hi Xavi,
> >
> > I am working with a distributed-replicated volume. What I've been
> > doing is copying the shards from each node to their own "recovery"
> > directory, discarding shards that are 0 bytes, then comparing the
> > remainder and combining unique shards into a common directory. Then
> > I'd build a numerically sorted list of the shards, add the "main file"
> > to the top of the list, and have cat run through the list. I had one
> > pair of shards that diff told me were not equal, but their byte sizes
> > were equivalent. In that case, I'm not sure which is the "correct"
> > shard, but I'd note that and just pick one, with the intention of
> > circling back if cat'ing things together didn't work out... which, so
> > far, it hasn't.
> >
> >
> > If there's a shard with different contents, it probably has a pending
> > heal. If it's a replica 3, most probably 2 of the files will match. In
> > that case the matching copies should be the "good" version. Otherwise you
> > will need to check the stat and extended attributes of the files from each
> > brick to see which one is the best.
> >
> >
> > How can I identify if a shard is not full size? I haven't checked
> > every single shard, but they seem to be 64 MB in size. Would that
> > mean I need to make sure all but the last shard are 64 MB? I suspect
> > this might be my issue.
> >
> >
> > If you are using the default shard size, they should be 64 MiB (i.e.
> > 67108864 bytes). Any file smaller than that (including the main file,
> > but not the last shard) must be expanded to this size (truncate -s
> > 67108864 <file>). All shards must exist (from 1 to the last number). If
> > one is missing, you need to create it (touch <file> && truncate -s
> > 67108864 <file>).
> >
> >
> > Also, is shard 0 what would appear as the actual file (so
> > largefile.raw or whatever)? It seems in my scenario these files are
> > ~48 MB. I assume that means I need to extend it to 64 MB?
> >
> >
> > Yes, shard 0 is the main file, and it also needs to be extended to 64
> > MiB.
> >
> > Regards,
> >
> > Xavi
> >
> >
> > This is all great information. Thanks!
> >
> > ~ Anthony
> >
> >
> >
> > ------------------------------------------------------------------------
> >
> > *From: *"Xavi Hernandez" <jahernan at redhat.com
> > <mailto:jahernan at redhat.com>>
> > *To: *"anthony" <anthony at vofr.net <mailto:anthony at vofr.net>>
> > *Cc: *"gluster-users" <gluster-users at gluster.org
> > <mailto:gluster-users at gluster.org>>
> > *Sent: *Wednesday, September 8, 2021 1:57:51 AM
> > *Subject: *Re: [Gluster-users] Recovering from remove-brick
> > where shards did not rebalance
> >
> > Hi Anthony,
> >
> > On Tue, Sep 7, 2021 at 8:20 PM Anthony Hoppe <anthony at vofr.net> wrote:
> >
> > I am currently playing with concatenating main file + shards
> > together. Is it safe to assume that a shard with the same
> > ID and sequence number
> > (5da7d7b9-7ff3-48d2-8dcd-4939364bda1f.242 for example) is
> > identical across bricks? That is, I can copy all the shards
> > into a single location overwriting and/or discarding
> > duplicates, then concatenate them together in order? Or is
> > it more complex than that?
> >
> >
> > Assuming it's a replicated volume, a given shard should appear
> > on all bricks of the same replicated subvolume. If there were no
> > pending heals, they should all have the same contents (you can
> > easily check that by running md5sum, or a similar tool, on each
> > file).
> >
> > On distributed-replicated volumes it's possible to have the same
> > shard on two different subvolumes. In this case one of the
> > subvolumes contains the real file, and the other a special
> > 0-byte file with mode '---------T'. You need to take the real
> > file and ignore the second one.
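A rough way to spot those link files on a brick is that they are zero bytes
with only the sticky bit set, and they carry a dht.linkto attribute; for
example (the brick path and shard name here are just examples):

    # zero length and permissions exactly ---------T (sticky bit only)
    find /data/brick1/vol/.shard -type f -size 0 -perm 1000 -ls

    # a link file also carries this xattr pointing at the other subvolume
    getfattr -n trusted.glusterfs.dht.linkto -e text \
        /data/brick1/vol/.shard/5da7d7b9-7ff3-48d2-8dcd-4939364bda1f.242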
> >
> > Shards may be smaller than the shard size. In this case you
> > should extend the shard to the shard size before concatenating
> > it with the rest of the shards (for example using "truncate
> > -s"). The last shard may be smaller. It doesn't need to be
> extended.
> >
> > Once you have all the shards, you can concatenate them. Note
> > that the first shard of a file (or shard 0) is not inside the
> > .shard directory. You must take it from the location where the
> > file is normally seen.
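As a side note, the shards that belong to a given file are named after that
file's GFID, so they can be matched up on a brick like this (the paths below
are examples only):

    # the hex value of trusted.gfid is the UUID used in the shard names
    getfattr -n trusted.gfid -e hex /data/brick1/vol/largefile.raw
    #   trusted.gfid=0x5da7d7b97ff348d28dcd4939364bda1f

    # so the shards belonging to that file are
    ls /data/brick1/vol/.shard/5da7d7b9-7ff3-48d2-8dcd-4939364bda1f.*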
> >
> > Regards,
> >
> > Xavi
> >
> >
> >
> > ------------------------------------------------------------------------
> >
> > *From: *"anthony" <anthony at vofr.net
> > <mailto:anthony at vofr.net>>
> > *To: *"gluster-users" <gluster-users at gluster.org
> > <mailto:gluster-users at gluster.org>>
> > *Sent: *Tuesday, September 7, 2021 10:18:07 AM
> > *Subject: *Re: [Gluster-users] Recovering from
> > remove-brick where shards did not rebalance
> >
> > I've been playing with re-adding the bricks and here is
> > some interesting behavior.
> >
> > When I try to force add the bricks to the volume while
> > it's running, I get complaints about one of the bricks
> > already being a member of a volume. If I stop the
> > volume, I can then force-add the bricks. However, the
> > volume won't start without force. Once the volume is
> > force started, all of the bricks remain offline.
> >
> > I feel like I'm close...but not quite there...
> >
> >
> > ------------------------------------------------------------------------
> >
> > *From: *"anthony" <anthony at vofr.net
> > <mailto:anthony at vofr.net>>
> > *To: *"Strahil Nikolov" <hunter86_bg at yahoo.com
> > <mailto:hunter86_bg at yahoo.com>>
> > *Cc: *"gluster-users" <gluster-users at gluster.org
> > <mailto:gluster-users at gluster.org>>
> > *Sent: *Tuesday, September 7, 2021 7:45:44 AM
> > *Subject: *Re: [Gluster-users] Recovering from
> > remove-brick where shards did not rebalance
> >
> > I was contemplating these options, actually, but not
> > finding anything in my research showing someone had
> > tried either before gave me pause.
> >
> > One thing I wasn't sure about when doing a force
> > add-brick was whether gluster would wipe the existing
> > data from the added bricks. Sounds like that may
> > not be the case?
> >
> > With regard to concatenating the main file +
> > shards, how would I go about identifying the shards
> > that pair with the main file? I see the shards have
> > sequence numbers, but I'm not sure how to match the
> > identifier to the main file.
> >
> > Thanks!!
> >
> >
> > ------------------------------------------------------------------------
> >
> > *From: *"Strahil Nikolov" <hunter86_bg at yahoo.com
> > <mailto:hunter86_bg at yahoo.com>>
> > *To: *"anthony" <anthony at vofr.net
> > <mailto:anthony at vofr.net>>, "gluster-users"
> > <gluster-users at gluster.org
> > <mailto:gluster-users at gluster.org>>
> > *Sent: *Tuesday, September 7, 2021 6:02:36 AM
> > *Subject: *Re: [Gluster-users] Recovering from
> > remove-brick where shards did not rebalance
> >
> > The data should be recoverable by concatenating
> > the main file with all shards. Then you can copy
> > the data back via the FUSE mount point.
> >
> > I think that some users reported that add-brick
> > with the force option allows you to 'undo' the
> > situation and 're-add' the data, but I have
> > never tried that and I cannot guarantee that it
> > will even work.
> >
> > The simplest way is to recover from a recent
> > backup, but sometimes this leads to data loss.
> >
> > Best Regards,
> > Strahil Nikolov
> >
> > On Tue, Sep 7, 2021 at 9:29, Anthony Hoppe
> > <anthony at vofr.net> wrote:
> > Hello,
> >
> > I did a bad thing and did a remove-brick on
> > a set of bricks in a distributed-replicate
> > volume where rebalancing did not
> > successfully rebalance all files. In
> > sleuthing around the various bricks on the
> > 3-node pool, it appears that a number of the
> > files within the volume may have been stored
> > as shards. With that, I'm unsure how to
> > proceed with recovery.
> >
> > Is it possible to re-add the removed bricks
> > somehow and then do a heal? Or is there a
> > way to recover the data from the shards?
> >
> > Thanks!