[Gluster-users] How do I temporarily take a brick out of service and then put it back later?

Greg Scott GregScott at infrasupport.com
Tue Sep 16 19:58:00 UTC 2014


Trying this command to remove the brick on the failed node:

[root@lme-fw2 ~]# gluster volume remove-brick firewall-scripts 192.168.253.1:/firewall-scripts
Removing brick(s) can result in data loss. Do you want to Continue? (y/n) y
[root@lme-fw2 ~]#

But running top in another window, I see 97.8%wa with 0.0%id. That suggests the system is spending all its time idle, waiting for disk I/O to complete, even after I removed the brick associated with the dead node. And gluster volume info has been hung for the past several minutes. After 5+ minutes, it finally tells me no volumes are present. So what happened to the volumes I set up?
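For reference, top's %wa and %id figures come from the aggregate "cpu" line in /proc/stat: each is the change in that counter divided by the change in total CPU time between two samples. A minimal sketch, assuming the Linux /proc/stat field order (user nice system idle iowait irq softirq steal ...); `cpu_pct` is a hypothetical helper name, not part of top or any Gluster tool:

```sh
# Hypothetical helper (not part of top or procps): compute %wa (iowait) and
# %id (idle) for the interval between two samples of the aggregate "cpu"
# line from /proc/stat, roughly the calculation top performs.
# Assumed Linux field order: cpu user nice system idle iowait irq softirq steal ...
cpu_pct() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        n = split(a, s1, " ")
        split(b, s2, " ")
        total = 0
        # Sum user..steal (fields 2-9); guest time is already included in user.
        for (i = 2; i <= 9 && i <= n; i++) total += s2[i] - s1[i]
        if (total <= 0) { print "wa=0.0 id=0.0"; exit }
        printf "wa=%.1f id=%.1f\n", 100 * (s2[6] - s1[6]) / total, 100 * (s2[5] - s1[5]) / total
    }'
}
```

On a live host you could feed it two samples taken a second apart, for example `a=$(head -n1 /proc/stat); sleep 1; cpu_pct "$a" "$(head -n1 /proc/stat)"`. A %wa near 100 with %id near 0 is the pattern top was showing here.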

But check this out:

[root@lme-fw2 ~]# gluster volume info
No volumes present
[root@lme-fw2 ~]#

In another window, I cd to /firewall-scripts, where my gluster volume is mounted, and look at a file. Then I do this again:

[root@lme-fw2 ~]#
[root@lme-fw2 ~]# gluster volume info

Volume Name: firewall-scripts
Type: Replicate
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: 192.168.253.1:/gluster-fw1
Brick2: 192.168.253.2:/gluster-fw2
Options Reconfigured:
network.ping-timeout: 5
[root@lme-fw2 ~]#

And now my volume shows up, with both bricks. What's up with that? I removed the old brick, but now it's back. I also set my ping-timeout to 5 seconds, something like an hour ago.
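The ping-timeout listed under "Options Reconfigured" is the kind of setting made with `gluster volume set <volume> <key> <value>`. A minimal sketch of a hypothetical helper, `set_and_check_option` (not part of GlusterFS), that sets an option and then confirms it appears in `gluster volume info`, assuming the "key: value" line format shown in the output above:

```sh
# Hypothetical helper (not part of GlusterFS): set a volume option, then
# confirm it is listed as "key: value" in `gluster volume info` output.
set_and_check_option() {
    vol=$1 key=$2 val=$3
    gluster volume set "$vol" "$key" "$val" || return 1
    # -x: whole-line match, -F: treat "key: value" as a fixed string.
    gluster volume info "$vol" | grep -qxF "$key: $val"
}

# Example: set_and_check_option firewall-scripts network.ping-timeout 5
```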

So I try to remove the brick again. At least it generates some output this time:

[root@lme-fw2 ~]# gluster volume remove-brick firewall-scripts 192.168.253.1:/firewall-scripts
Removing brick(s) can result in data loss. Do you want to Continue? (y/n) y
Incorrect brick 192.168.253.1:/firewall-scripts for volume firewall-scripts
[root@lme-fw2 ~]#
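As the error shows, remove-brick wants the brick name exactly as it appears in the "BrickN:" lines of `gluster volume info`, not the mount point. A minimal sketch with two hypothetical helpers, `list_bricks` and `brick_exists` (not part of GlusterFS), that pull those names out of the info output so a name can be checked before running remove-brick:

```sh
# Hypothetical helpers (not part of GlusterFS).
# list_bricks: read `gluster volume info` output on stdin and print each
# brick as host:/path, taken from the "BrickN:" lines.
list_bricks() {
    awk '/^Brick[0-9]+:/ { print $2 }'
}

# brick_exists NAME: succeed only if NAME exactly matches one listed brick.
brick_exists() {
    list_bricks | grep -qxF "$1"
}

# Example (on a live node):
#   gluster volume info firewall-scripts | brick_exists 192.168.253.1:/gluster-fw1 \
#       && echo "brick name OK"
```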

Ah, my brick name is wrong. Trying again with the correct brick name... Uh-oh!

[root@lme-fw2 ~]#
[root@lme-fw2 ~]# gluster volume remove-brick firewall-scripts 192.168.253.1:/gluster-fw1
Removing brick(s) can result in data loss. Do you want to Continue? (y/n) y
Remove Brick successful
[root@lme-fw2 ~]#
[root@lme-fw2 ~]# gluster volume info

And we're hung.

When I tell the surviving node to take out the brick from the failed node, why does Gluster on the surviving node hang???

[root@lme-fw2 firewall-scripts]# ls
ls: cannot open directory .: Transport endpoint is not connected
[root@lme-fw2 firewall-scripts]# pwd
/firewall-scripts
[root@lme-fw2 firewall-scripts]#
[root@lme-fw2 firewall-scripts]# ls

[root@lme-fw2 firewall-scripts]# ls
ls: cannot open directory .: Transport endpoint is not connected
[root@lme-fw2 firewall-scripts]# pwd
/firewall-scripts
[root@lme-fw2 firewall-scripts]#
[root@lme-fw2 firewall-scripts]# ls
ls: cannot access allow-all-with-nat: Transport endpoint is not connected
ls: cannot access rc.firewall: Transport endpoint is not connected
ls: cannot access rcfirewall.conf: Transport endpoint is not connected
ls: cannot access make-virgin.sh: Transport endpoint is not connected
ls: cannot access start-failover-monitor.sh: Transport endpoint is not connected
ls: cannot access failover-monitor.sh: Transport endpoint is not connected
ls: cannot access rcfirewall.conf-20120201: Transport endpoint is not connected
ls: cannot access rc.firewall-20120201: Transport endpoint is not connected
ls: cannot access fwdate.txt: Transport endpoint is not connected
ls: cannot access rcfirewall.conf-20120210: Transport endpoint is not connected
ls: cannot access rcfirewall.conf-20120302: Transport endpoint is not connected
ls: cannot access rc.firewall-20120302: Transport endpoint is not connected
ls: cannot access failover-monitor.sh-20120406: Transport endpoint is not connected
ls: cannot access rc.firewall-20120704: Transport endpoint is not connected
ls: cannot access rcfirewall.conf-20120704: Transport endpoint is not connected
ls: cannot access initial_rc.firewall-20120708: Transport endpoint is not connected
ls: cannot access =: Transport endpoint is not connected
ls: cannot access append.txt: Transport endpoint is not connected
ls: cannot access rc.firewall-20120708: Transport endpoint is not connected
ls: cannot access rcfirewall.conf-20120708: Transport endpoint is not connected
ls: reading directory .: Transport endpoint is not connected
=                    failover-monitor.sh-20120406  rc.firewall-20120302      rcfirewall.conf-20120302
allow-all            fwdate.txt                    rc.firewall-20120704      rcfirewall.conf-20120704
allow-all-with-nat   initial_rc.firewall-20120708  rc.firewall-20120708      rcfirewall.conf-20120708
append.txt           make-virgin.sh                rcfirewall.conf           start-failover-monitor.sh
etc                  rc.firewall                   rcfirewall.conf-20120201
failover-monitor.sh  rc.firewall-20120201          rcfirewall.conf-20120210
[root@lme-fw2 firewall-scripts]#
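"Transport endpoint is not connected" is the standard message for the ENOTCONN error; here it appears to be the GlusterFS mount at /firewall-scripts reporting that it has lost its connection to the volume's bricks. A minimal sketch of a hypothetical check, `mount_is_disconnected` (not part of GlusterFS), that lists a directory and looks for that message, forcing the C locale so the text is in English:

```sh
# Hypothetical helper: succeed if listing DIR fails with ENOTCONN
# ("Transport endpoint is not connected"), the error seen above.
mount_is_disconnected() {
    # Capture only stderr: send stderr to the substitution, stdout to /dev/null.
    err=$(LC_ALL=C ls "$1" 2>&1 >/dev/null)
    case $err in
        *"Transport endpoint is not connected"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example: mount_is_disconnected /firewall-scripts && echo "mount lost its bricks"
```

Note that ls itself can hang on a wedged mount, just as gluster volume info did above; running it as `timeout 10 ls ...` (GNU coreutils) would bound that wait.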