[Gluster-users] How do I temporarily take a brick out of service and then put it back later?
Greg Scott
GregScott at infrasupport.com
Tue Sep 16 19:58:00 UTC 2014
Trying this command to remove the brick on the failed node:
[root at lme-fw2 ~]# gluster volume remove-brick firewall-scripts 192.168.253.1:/firewall-scripts
Removing brick(s) can result in data loss. Do you want to Continue? (y/n) y
[root at lme-fw2 ~]#
But running top in another window, I see 97.8%wa with 0.0%id. This suggests the system is spending nearly all its time waiting for disk I/O to complete, even after I removed the brick associated with the dead node. Meanwhile, gluster volume info hung for several minutes. After 5+ minutes, it finally told me no volumes are present. So what happened to the volumes I set up?
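For anybody who wants to cross-check the %wa figure without top, here is a minimal sketch that derives it from two samples of the aggregate cpu line in /proc/stat. The helper name iowait_pct is hypothetical, and the sample numbers are made up to reproduce the 97.8% reading above; they are not measurements from this system.

```shell
#!/bin/sh
# Hypothetical helper: given two "cpu ..." lines from /proc/stat (an earlier
# and a later sample), print the share of elapsed CPU time spent in iowait.
# Field order after the "cpu" label: user nice system idle iowait irq softirq steal.
iowait_pct() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            total[NR] = 0
            for (i = 2; i <= 9; i++) total[NR] += $i   # sum user..steal
            iow[NR] = $6                               # iowait column
        }
        END {
            dt = total[2] - total[1]
            if (dt <= 0) { print "0.0"; exit }
            printf "%.1f\n", 100 * (iow[2] - iow[1]) / dt
        }'
}

# Illustrative samples only: 1000 ticks elapsed, 978 of them in iowait.
before='cpu 0 0 0 0 0 0 0 0'
after='cpu 10 0 12 0 978 0 0 0'
iowait_pct "$before" "$after"    # prints 97.8

# On a live system, something like this would take the two samples:
#   a=$(grep '^cpu ' /proc/stat); sleep 1; b=$(grep '^cpu ' /proc/stat)
#   iowait_pct "$a" "$b"
```

The counters in /proc/stat are cumulative since boot, which is why the sketch works on the difference between two samples rather than on a single reading.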
But check this out:
[root at lme-fw2 ~]# gluster volume info
No volumes present
[root at lme-fw2 ~]#
In another window, I cd /firewall-scripts and look at a file. This is my gluster volume. Then I do this again:
[root at lme-fw2 ~]#
[root at lme-fw2 ~]# gluster volume info
Volume Name: firewall-scripts
Type: Replicate
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: 192.168.253.1:/gluster-fw1
Brick2: 192.168.253.2:/gluster-fw2
Options Reconfigured:
network.ping-timeout: 5
[root at lme-fw2 ~]#
And now my volume shows up, with both bricks. What's up with that? I removed the old brick, but it is still listed. I also set network.ping-timeout to 5 seconds about an hour ago.
So trying to remove the brick again... At least it generates some output this time:
[root at lme-fw2 ~]# gluster volume remove-brick firewall-scripts 192.168.253.1:/firewall-scripts
Removing brick(s) can result in data loss. Do you want to Continue? (y/n) y
Incorrect brick 192.168.253.1:/firewall-scripts for volume firewall-scripts
[root at lme-fw2 ~]#
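The "Incorrect brick" message above is what comes back when the brick argument does not match an entry in the Bricks: list of gluster volume info. A minimal sketch of checking that before running remove-brick follows. The helper names brick_in_volume and check are hypothetical, not gluster commands, and the sample text is copied from the volume info output earlier in this post so the sketch runs without a live cluster.

```shell
#!/bin/sh
# Hypothetical helper (not part of the gluster CLI): reads "gluster volume info"
# output on stdin and succeeds only if the given brick (host:/path) is listed.
brick_in_volume() {
    awk -v want="$1" '
        # Brick lines look like "Brick1: host:/path"; compare the second field.
        /^Brick[0-9]+:/ && $2 == want { found = 1 }
        END { exit (found ? 0 : 1) }
    '
}

# Sample input copied from the volume info output shown earlier.
sample_info='Volume Name: firewall-scripts
Type: Replicate
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: 192.168.253.1:/gluster-fw1
Brick2: 192.168.253.2:/gluster-fw2'

# Report whether a brick name appears in the sample volume info.
check() {
    if printf '%s\n' "$sample_info" | brick_in_volume "$1"; then
        echo "$1: listed"
    else
        echo "$1: NOT listed"
    fi
}

check 192.168.253.1:/firewall-scripts   # the name used in the failed attempt
check 192.168.253.1:/gluster-fw1        # the name shown under Bricks:

# Against a live system the input would come from the CLI instead:
#   gluster volume info firewall-scripts | brick_in_volume 192.168.253.1:/gluster-fw1
```

With the sample text, the first check reports the name as not listed and the second reports it as listed.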
Ah - my brick name is wrong. Trying again with the correct brick name.... Uh-oh!
[root at lme-fw2 ~]#
[root at lme-fw2 ~]# gluster volume remove-brick firewall-scripts 192.168.253.1:/gluster-fw1
Removing brick(s) can result in data loss. Do you want to Continue? (y/n) y
Remove Brick successful
[root at lme-fw2 ~]#
[root at lme-fw2 ~]# gluster volume info
And we're hung.
When I tell the surviving node to take out the brick from the failed node, why does Gluster on the surviving node hang???
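While chasing a hang like this, it may help to keep the CLI from tying up the terminal. Here is a minimal sketch, assuming GNU coreutils timeout is installed. The wrapper name try_cmd is hypothetical and the 30-second limit in the usage comment is arbitrary. A process stuck in uninterruptible I/O wait may not respond to the signal, so this is a convenience for diagnosis, not a fix for the hang itself.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command under a time limit and say so on stderr
# if the limit was hit. GNU coreutils timeout exits with status 124 when the
# command is cut off; otherwise the command's own exit status is passed through.
try_cmd() {
    limit="$1"; shift
    timeout "$limit" "$@"
    rc=$?
    if [ "$rc" -eq 124 ]; then
        echo "timed out after ${limit}s: $*" >&2
    fi
    return "$rc"
}

# Intended use on the surviving node (not run here):
#   try_cmd 30 gluster volume info
```

A return status of 124 then distinguishes "the CLI never answered" from an ordinary error reported by gluster.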
[root at lme-fw2 firewall-scripts]# ls
ls: cannot open directory .: Transport endpoint is not connected
[root at lme-fw2 firewall-scripts]# pwd
/firewall-scripts
[root at lme-fw2 firewall-scripts]#
[root at lme-fw2 firewall-scripts]# ls
[root at lme-fw2 firewall-scripts]# ls
ls: cannot open directory .: Transport endpoint is not connected
[root at lme-fw2 firewall-scripts]# pwd
/firewall-scripts
[root at lme-fw2 firewall-scripts]#
[root at lme-fw2 firewall-scripts]# ls
ls: cannot access allow-all-with-nat: Transport endpoint is not connected
ls: cannot access rc.firewall: Transport endpoint is not connected
ls: cannot access rcfirewall.conf: Transport endpoint is not connected
ls: cannot access make-virgin.sh: Transport endpoint is not connected
ls: cannot access start-failover-monitor.sh: Transport endpoint is not connected
ls: cannot access failover-monitor.sh: Transport endpoint is not connected
ls: cannot access rcfirewall.conf-20120201: Transport endpoint is not connected
ls: cannot access rc.firewall-20120201: Transport endpoint is not connected
ls: cannot access fwdate.txt: Transport endpoint is not connected
ls: cannot access rcfirewall.conf-20120210: Transport endpoint is not connected
ls: cannot access rcfirewall.conf-20120302: Transport endpoint is not connected
ls: cannot access rc.firewall-20120302: Transport endpoint is not connected
ls: cannot access failover-monitor.sh-20120406: Transport endpoint is not connected
ls: cannot access rc.firewall-20120704: Transport endpoint is not connected
ls: cannot access rcfirewall.conf-20120704: Transport endpoint is not connected
ls: cannot access initial_rc.firewall-20120708: Transport endpoint is not connected
ls: cannot access =: Transport endpoint is not connected
ls: cannot access append.txt: Transport endpoint is not connected
ls: cannot access rc.firewall-20120708: Transport endpoint is not connected
ls: cannot access rcfirewall.conf-20120708: Transport endpoint is not connected
ls: reading directory .: Transport endpoint is not connected
= failover-monitor.sh-20120406 rc.firewall-20120302 rcfirewall.conf-20120302
allow-all fwdate.txt rc.firewall-20120704 rcfirewall.conf-20120704
allow-all-with-nat initial_rc.firewall-20120708 rc.firewall-20120708 rcfirewall.conf-20120708
append.txt make-virgin.sh rcfirewall.conf start-failover-monitor.sh
etc rc.firewall rcfirewall.conf-20120201
failover-monitor.sh rc.firewall-20120201 rcfirewall.conf-20120210
[root at lme-fw2 firewall-scripts]#