[Gluster-users] Recovering lost node in dispersed volume

Tony Schreiner anthony.schreiner at bc.edu
Thu Sep 22 16:01:45 UTC 2016


Thanks for that advice. It worked. Setting the UUID in glusterd.info was
the bit I missed.

It seemed to work without the setfattr step in my particular case.
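
For the archives, the recovery boils down to roughly the sketch below
(thalia is my reinstalled node and lemans a surviving peer; the UUID is the
one the surviving peers still had on record for thalia, and the service
commands are the systemd flavour, so adjust for your distro):

  # on a surviving peer: find the UUID recorded for the reinstalled node
  # (the peer file name is the UUID)
  grep -H thalia /var/lib/glusterd/peers/*

  # on thalia: stop glusterd and pin the old UUID in glusterd.info
  systemctl stop glusterd
  sed -i 's/^UUID=.*/UUID=843169fa-3937-42de-8fda-9819efc75fe8/' \
      /var/lib/glusterd/glusterd.info   # or edit the file by hand
  systemctl start glusterd

  # re-probe a surviving peer, check the pool, then restart glusterd once more
  gluster peer probe lemans
  gluster peer status
  systemctl restart glusterd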

On Thu, Sep 22, 2016 at 11:05 AM, Serkan Çoban <cobanserkan at gmail.com>
wrote:

> Here are the steps for replacing a failed node:
>
>
> 1- On one of the other servers, run "grep thalia
> /var/lib/glusterd/peers/* | cut -d: -f1 | cut -d/ -f6" and note the
> UUID.
> 2- Stop glusterd on the failed server, add "UUID=<uuid from the
> previous step>" to /var/lib/glusterd/glusterd.info, and start glusterd.
> 3- Run "gluster peer probe calliope".
> 4- Restart glusterd.
> 5- Now "gluster peer status" should show all the peers; if not, probe
> them manually as above.
> 6- For all the bricks, run the command "setfattr -n
> trusted.glusterfs.volume-id -v 0x$(grep volume-id
> /var/lib/glusterd/vols/vol_name/info | cut -d= -f2 | sed 's/-//g')
> brick_name" (expanded for your volume below).
> 7- Restart glusterd and everything should be fine.
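>
> For example, with your volume below (rvol, volume-id
> e8f15248-d9de-458e-9896-f1a5782dcf74) and the brick at /brick/p1, step 6
> should expand to something like:
>
> setfattr -n trusted.glusterfs.volume-id -v 0xe8f15248d9de458e9896f1a5782dcf74 /brick/p1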
>
> I think I read the steps from this link:
> https://support.rackspace.com/how-to/recover-from-a-failed-server-in-a-glusterfs-array/
> Look at the "keep the IP address" part.
>
>
> On Thu, Sep 22, 2016 at 5:16 PM, Tony Schreiner
> <anthony.schreiner at bc.edu> wrote:
> > I set up a dispersed volume with 1 x (3 + 1) nodes (I do know that 3+1 is
> > not optimal).
> > Originally created in version 3.7 but recently upgraded without issue to
> > 3.8.
> >
> > # gluster vol info
> > Volume Name: rvol
> > Type: Disperse
> > Volume ID: e8f15248-d9de-458e-9896-f1a5782dcf74
> > Status: Started
> > Snapshot Count: 0
> > Number of Bricks: 1 x (3 + 1) = 4
> > Transport-type: tcp
> > Bricks:
> > Brick1: calliope:/brick/p1
> > Brick2: euterpe:/brick/p1
> > Brick3: lemans:/brick/p1
> > Brick4: thalia:/brick/p1
> > Options Reconfigured:
> > performance.readdir-ahead: on
> > nfs.disable: off
> >
> > I inadvertently allowed one of the nodes (thalia) to be reinstalled, which
> > overwrote the system, but not the brick, and I need guidance in getting it
> > back into the volume.
> >
> > (on lemans)
> > gluster peer status
> > Number of Peers: 3
> >
> > Hostname: calliope
> > Uuid: 72373eb1-8047-405a-a094-891e559755da
> > State: Peer in Cluster (Connected)
> >
> > Hostname: euterpe
> > Uuid: 9fafa5c4-1541-4aa0-9ea2-923a756cadbb
> > State: Peer in Cluster (Connected)
> >
> > Hostname: thalia
> > Uuid: 843169fa-3937-42de-8fda-9819efc75fe8
> > State: Peer Rejected (Connected)
> >
> > The thalia peer is rejected. If I try to peer probe thalia, I am told it
> > is already part of the pool. If, from thalia, I try to peer probe one of
> > the others, I am told that they are already part of another pool.
> >
> > I have tried removing the thalia brick with
> > gluster vol remove-brick rvol thalia:/brick/p1 start
> > but get the error
> > volume remove-brick start: failed: Remove brick incorrect brick count of 1
> > for disperse 4
> >
> > I am not finding much guidance for this particular situation. I could use
> > a suggestion on how to recover. It's a lab situation so no biggie if I
> > lose it.
> > Cheers
> >
> > Tony Schreiner
> >
> > _______________________________________________
> > Gluster-users mailing list
> > Gluster-users at gluster.org
> > http://www.gluster.org/mailman/listinfo/gluster-users
>