[Gluster-users] Can't replace dead peer/brick
Bryan Murphy
bmurphy1976 at gmail.com
Fri Sep 16 20:06:34 UTC 2011
I have a simple setup:
gluster> volume info
Volume Name: myvolume
Type: Distributed-Replicate
Status: Started
Number of Bricks: 3 x 2 = 6
Transport-type: tcp
Bricks:
Brick1: 10.2.218.188:/srv
Brick2: 10.116.245.136:/srv
Brick3: 10.206.38.103:/srv
Brick4: 10.114.41.53:/srv
Brick5: 10.68.73.41:/srv
Brick6: 10.204.129.91:/srv
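For context, a "3 x 2" Distributed-Replicate volume is three replica sets of two bricks each. The following minimal sketch shows which brick would hold the other copy of Brick4's data, assuming the usual Gluster convention that bricks are grouped into replica sets in the order they are listed. The helper names are illustrative and are not part of Gluster.

```python
# Sketch: group the brick list from `volume info` into replica sets.
# Assumption: consecutive bricks form one replica set (replica count 2).
bricks = [
    "10.2.218.188:/srv",    # Brick1
    "10.116.245.136:/srv",  # Brick2
    "10.206.38.103:/srv",   # Brick3
    "10.114.41.53:/srv",    # Brick4 (the one that was killed)
    "10.68.73.41:/srv",     # Brick5
    "10.204.129.91:/srv",   # Brick6
]


def replica_sets(brick_list, replica_count):
    """Split the ordered brick list into consecutive groups of replica_count."""
    if len(brick_list) % replica_count != 0:
        raise ValueError("brick count must be a multiple of the replica count")
    return [
        brick_list[i:i + replica_count]
        for i in range(0, len(brick_list), replica_count)
    ]


def replica_partners(brick_list, replica_count, brick):
    """Return the other bricks in the same replica set as `brick`."""
    for group in replica_sets(brick_list, replica_count):
        if brick in group:
            return [b for b in group if b != brick]
    raise ValueError("brick not found in volume: " + brick)


if __name__ == "__main__":
    for number, group in enumerate(replica_sets(bricks, 2), start=1):
        print("replica set", number, group)
    print("partner of Brick4:", replica_partners(bricks, 2, "10.114.41.53:/srv"))
```

Under that assumption, Brick3 (10.206.38.103:/srv) is the surviving replica of Brick4.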
I *killed* Brick4 (kill -9, then shut down the instance).
My intention is to simulate a catastrophic failure of Brick4 and replace it
with a new server.
I probed the new server and then started a replace-brick onto it:
gluster> peer probe 10.76.242.97
Probe successful
gluster> volume replace-brick myvolume 10.114.41.53:/srv 10.76.242.97:/srv start
replace-brick started successfully
I waited a little while, saw no traffic on the new server and then ran this:
gluster> volume replace-brick myvolume 10.114.41.53:/srv 10.76.242.97:/srv status
The status command never returned. Now my cluster is in an inconsistent
state: it is still serving files, and I still have a job copying files to
it, but I am unable to replace the dead peer with a new one. Every
replace-brick subcommand now fails:
root@ip-10-2-218-188:~# gluster volume replace-brick myvolume 10.114.41.53:/srv 10.76.242.97:/srv status
replace-brick status unknown
root@ip-10-2-218-188:~# gluster volume replace-brick myvolume 10.114.41.53:/srv 10.76.242.97:/srv abort
replace-brick abort failed
root@ip-10-2-218-188:~# gluster volume replace-brick myvolume 10.114.41.53:/srv 10.76.242.97:/srv start
replace-brick failed to start
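Since the interactive status call hung, one way to re-check the state without blocking a shell is to run the same command under a timeout. This is a minimal sketch, assuming the gluster CLI is on PATH. The function names and the 30-second default are hypothetical, and the accepted actions are limited to the three used in this post.

```python
import subprocess

# Only the subcommands that appear in this post; not a complete list.
ACTIONS_USED_HERE = ("start", "status", "abort")


def replace_brick_cmd(volume, src_brick, dst_brick, action):
    """Build the argv for the replace-brick invocation shown above."""
    if action not in ACTIONS_USED_HERE:
        raise ValueError("unexpected action: " + action)
    return ["gluster", "volume", "replace-brick",
            volume, src_brick, dst_brick, action]


def run_with_timeout(argv, timeout_s=30):
    """Run argv and return (returncode, combined output).

    Returns (None, "") if the command does not finish within timeout_s,
    so a hung CLI call cannot block the caller indefinitely.
    """
    try:
        proc = subprocess.run(argv, capture_output=True, text=True,
                              timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return None, ""
    return proc.returncode, proc.stdout + proc.stderr
```

For example, `run_with_timeout(replace_brick_cmd("myvolume", "10.114.41.53:/srv", "10.76.242.97:/srv", "status"))` returns `(None, "")` if the status call hangs again, rather than never returning.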
How can I get my cluster back into a clean working state?
Thanks,
Bryan