[Gluster-users] Need to replace a brick on a failed first Gluster node

Greg Scott GregScott at Infrasupport.com
Sun Jan 22 23:58:55 UTC 2012


Looking on my good fw2 node in /etc/glusterd, I see:

[root@lme-fw2 glusterd]# pwd
/etc/glusterd
[root@lme-fw2 glusterd]# ls
geo-replication  glusterd.info  nfs  peers  vols
[root@lme-fw2 glusterd]# more glusterd.info
UUID=e7055b02-7885-4be9-945b-032b2cd9d5e0
[root@lme-fw2 glusterd]#
[root@lme-fw2 glusterd]#
[root@lme-fw2 glusterd]# ls peers
ce4dff10-54b6-4ef4-8760-8b31d1bf61e8
[root@lme-fw2 glusterd]# more peers/ce4dff10-54b6-4ef4-8760-8b31d1bf61e8
uuid=e13ce11c-62a9-4137-ba94-e8ebea7030d5
state=3
hostname1=192.168.253.1
[root@lme-fw2 glusterd]#
 
Looking on my rebuilt fw1 node, I see:

[root@lme-fw1 openswan-2.6.37]# cd /etc/glusterd
[root@lme-fw1 glusterd]# ls
geo-replication  glusterd.info  nfs  peers  vols
[root@lme-fw1 glusterd]# more glusterd.info
UUID=e13ce11c-62a9-4137-ba94-e8ebea7030d5
[root@lme-fw1 glusterd]# ls peers
[root@lme-fw1 glusterd]#

So the new fw1's UUID already matches what fw2 thinks its peer should be - I wonder how that happened? If I do a gluster peer probe from fw1, will everything sync back up again?
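
If so, I'm guessing the sequence from fw1 would look roughly like this - just a sketch I haven't run yet, and assuming the volume is mounted on fw1 at something like /mnt/glustervol (made-up path):

[root@lme-fw1 ~]# gluster peer probe 192.168.253.2
[root@lme-fw1 ~]# gluster peer status
[root@lme-fw1 ~]# gluster volume info
[root@lme-fw1 ~]# find /mnt/glustervol -print0 | xargs -0 stat > /dev/null

The last line walks the client mount to trigger self-heal so the rebuilt brick gets repopulated from fw2.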

- Greg



-----Original Message-----
From: Giovanni Toraldo [mailto:gt at libersoft.it] 
Sent: Sunday, January 22, 2012 5:35 AM
To: Greg Scott
Cc: gluster-users at gluster.org
Subject: Re: [Gluster-users] Need to replace a brick on a failed first Gluster node

Hi Greg,

2012/1/22 Greg Scott <GregScott at infrasupport.com>:
> My challenge is, the hard drive at 192.168.253.1 failed.  This was the first
> Gluster node when I set everything up.   I replaced its hard drive and am
> rebuilding it.  I have a good copy of everything I care about in the
> 192.168.253.2 brick.   My thought was, I could just remove the old
> 192.168.253.1 brick and replica, then gluster peer and add it all back
> again.

It's far simpler than that: if the new machine keeps the same hostname / IP address, you only need to make sure the new glusterd has the same UUID as the old dead one (it's stored in the glusterd.info file in /etc/glusterd). The configuration is then synced back automatically at the first contact with the other active nodes.
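
For example, on the rebuilt node (assuming a Red Hat-style init script; take the UUID from the peers/ entry for the dead node on the surviving box - just a sketch):

service glusterd stop
echo "UUID=<uuid-of-the-old-node>" > /etc/glusterd/glusterd.info
service glusterd start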

If instead you replace the failed node with a different machine that has a different hostname / IP, follow this procedure:
http://community.gluster.org/q/a-replica-node-has-failed-completely-and-must-be-replaced-with-new-empty-hardware-how-do-i-add-the-new-hardware-and-bricks-back-into-the-replica-pair-and-begin-the-healing-process/
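
Roughly, that procedure boils down to something like the following - I haven't checked the exact syntax against your GlusterFS release, so treat it as a sketch and follow the page above:

gluster peer probe <new-server>
gluster volume replace-brick <volname> <old-server>:/<brick-path> <new-server>:/<brick-path> commit force

then trigger self-heal from a client mount so the data gets copied onto the new brick.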

-- 
Giovanni Toraldo - LiberSoft
http://www.libersoft.it

