[Gluster-users] Rebuilding a failed cluster
Diego Zuccato
diego.zuccato at unibo.it
Wed Nov 29 21:58:51 UTC 2023
Much depends on the original volume layout. For replica volumes you'll
find multiple copies of the same file on different bricks, and
sometimes 0-byte files that are just placeholders for renamed files:
do not overwrite a good file with its empty version!
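A quick way to spot likely placeholders on a brick (the brick path is
just an example; these rename placeholders are normally zero-byte
files with the sticky bit set and a dht.linkto xattr):

   # list zero-byte files with the sticky bit set (typical placeholders)
   find /srv/brick1 -type f -size 0 -perm -1000
   # confirm a suspect really is a placeholder by checking its linkto xattr
   getfattr -n trusted.glusterfs.dht.linkto -e text /srv/brick1/some/file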
If the old volume is still online, it's better if you copy from its FUSE
mount point to the new one.
But since it's a temporary "backup", there's no need to use another
Gluster volume as the destination: just use a USB drive directly
connected to the old nodes (one at a time) or to a machine that can
still FUSE mount the old volume. Once you have a backup, write-protect
it and experiment freely :)
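Something along these lines (node name, volume name, device and paths
below are only examples, adjust to your setup):

   # FUSE-mount the old volume from one of the old nodes
   mount -t glusterfs oldnode1:/gv /mnt/oldgv
   # mount the USB drive and copy everything, preserving xattrs/ACLs/hardlinks
   mount /dev/sdb1 /mnt/usb
   rsync -aHAX --progress /mnt/oldgv/ /mnt/usb/gv-backup/
   # once done, remount the backup read-only before experimenting
   mount -o remount,ro /mnt/usb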
Diego
On 29/11/2023 19:17, Richard Betel wrote:
> Ok, it's been a while, but I'm getting back to this "project".
> I was unable to get gluster for the platform: the machines are
> ARM-based, and there are no ARM binaries on the gluster package repo. I
> tried building it instead, but the version of gluster I was running was
> quite old, and I couldn't get all the right package versions to do a
> successful build.
> As a result, it sounds like my best option is to follow your alternate
> suggestion:
> "The other option is to setup a new cluster and volume and then mount
> the volume via FUSE and copy the data from one of the bricks."
>
> I want to be sure I understand what you're saying, though. Here's my plan:
> create 3 VMs on amd64 processors(*)
> Give each a 100G brick
> set up the 3 bricks as disperse
> mount the new gluster volume on my workstation
> copy directories from one of the old bricks to the mounted new GFS volume
> Copy fully restored data from new GFS volume to workstation or whatever
> permanent setup I go with.
>
> Is that right? Or do I want the GFS system to be offline while I copy
> the contents of the old brick to the new brick?
>
> (*) I'm not planning to keep my GFS on VMs on cloud, I just want
> something temporary to work with so I don't blow up anything else.
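> Concretely, I picture those steps as roughly the following commands
> (hostnames, brick paths and the volume name are just placeholders):
>
>    # from one of the new VMs, after installing and starting glusterd on all three
>    gluster peer probe vm2
>    gluster peer probe vm3
>
>    # create and start a dispersed volume over the three 100G bricks
>    gluster volume create gv-new disperse 3 redundancy 1 \
>        vm1:/data/brick/gv vm2:/data/brick/gv vm3:/data/brick/gv
>    gluster volume start gv-new
>
>    # on my workstation: FUSE-mount the new volume, then copy from an old brick
>    # (skipping the brick's internal .glusterfs metadata directory)
>    mount -t glusterfs vm1:/gv-new /mnt/gv-new
>    rsync -aHAX --exclude=.glusterfs /path/to/old-brick/ /mnt/gv-new/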
>
>
>
>
> On Sat, 12 Aug 2023 at 09:20, Strahil Nikolov
> <hunter86_bg at yahoo.com> wrote:
>
> If you preserved the gluster structure in /etc/ and /var/lib, you
> should be able to run the cluster again.
> First install the same gluster version all nodes and then overwrite
> the structure in /etc and in /var/lib.
> Once you mount the bricks , start glusterd and check the situation.
>
> The other option is to setup a new cluster and volume and then mount
> the volume via FUSE and copy the data from one of the bricks.
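>
>     Roughly (the old-root path below is just an example of wherever
>     you keep the preserved root filesystems):
>
>        # after installing the same gluster version on each new node
>        rsync -a /mnt/old-root/etc/glusterfs/ /etc/glusterfs/
>        rsync -a /mnt/old-root/var/lib/glusterd/ /var/lib/glusterd/
>        # mount the old brick disks at their original paths, then
>        systemctl start glusterd
>        gluster peer status
>        gluster volume status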
>
> Best Regards,
> Strahil Nikolov
>
> On Saturday, August 12, 2023, 7:46 AM, Richard Betel
> <emteeoh at gmail.com> wrote:
>
> I had a small cluster with a disperse 3 volume. 2 nodes had
> hardware failures and no longer boot, and I don't have
> replacement hardware for them (it's an old board called a
> PC-duino). However, I do have their intact root filesystems and
> the disks the bricks are on.
>
> So I need to rebuild the cluster on all new host hardware. does
> anyone have any suggestions on how to go about doing this? I've
> built 3 vms to be a new test cluster, but if I copy over a file
> from the 3 nodes and try to read it, I can't and get errors in
> /var/log/glusterfs/foo.log:
> [2023-08-12 03:50:47.638134 +0000] W [MSGID: 114031]
> [client-rpc-fops_v2.c:2561:client4_0_lookup_cbk] 0-gv-client-0:
> remote operation failed. [{path=/helmetpart.scad},
> {gfid=00000000-0000-0000-0000-000000000000}, {errno=61},
> {error=No data available}]
> [2023-08-12 03:50:49.834859 +0000] E [MSGID: 122066]
> [ec-common.c:1301:ec_prepare_update_cbk] 0-gv-disperse-0: Unable
> to get config xattr. FOP : 'FXATTROP' failed on gfid
> 076a511d-3721-4231-ba3b-5c4cbdbd7f5d. Parent FOP: READ
> [No data available]
> [2023-08-12 03:50:49.834930 +0000] W
> [fuse-bridge.c:2994:fuse_readv_cbk] 0-glusterfs-fuse: 39: READ
> => -1 gfid=076a511d-3721-4231-ba3b-5c4cbdbd7f5d
> fd=0x7fbc9c001a98 (No data available)
>
> so obviously, I need to copy over more stuff from the original
> cluster. If I force the 3 nodes and the volume to have the same
> uuids, will that be enough?
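>
>             For reference, the identifiers I'd be forcing seem to
>             live in the usual places ("gv" is my volume name, the
>             brick path is a placeholder):
>
>                # per-node UUID
>                cat /var/lib/glusterd/glusterd.info
>                # volume definition, including the volume-id
>                cat /var/lib/glusterd/vols/gv/info
>                # the volume-id is also stored as an xattr on each brick root
>                getfattr -n trusted.glusterfs.volume-id -e hex /path/to/brick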
> ________
>
>
>
> Community Meeting Calendar:
>
> Schedule -
> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
> Bridge: https://meet.google.com/cpu-eiue-hvk
> Gluster-users mailing list
> Gluster-users at gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users
--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786