[Gluster-users] Kosher admin practices
peek at nimbios.org
Tue Jul 23 17:53:58 UTC 2013
I have a replicated test cluster (four machines, two drives in each)
that I've been beating on. I've just simulated one type of hardware
failure by remounting a drive read-only.
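
A minimal sketch of that kind of simulation, assuming the brick's
filesystem is mounted at /export/brick1 (a placeholder path, not the
real one). The RUN switch and the remount_ro_cmd helper are made up for
illustration. It prints the command by default and only remounts when
RUN=1 is set by root:

```shell
#!/bin/sh
# Hypothetical brick mount point; substitute the real one.
BRICK_MOUNT="${BRICK_MOUNT:-/export/brick1}"

# Build the remount command as a string so it can be inspected first.
remount_ro_cmd() {
    printf 'mount -o remount,ro %s\n' "$1"
}

# Dry run by default: print the command instead of executing it.
# Set RUN=1 (as root) to actually remount the filesystem read-only.
if [ "${RUN:-0}" = "1" ]; then
    mount -o remount,ro "$BRICK_MOUNT"
else
    remount_ro_cmd "$BRICK_MOUNT"
fi
```
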
The manual covers many useful things: adding/removing peers;
starting/stopping, creating, expanding, shrinking, and deleting volumes;
etc. But it doesn't cover how to replace a failed brick in a way that
minimizes frustration and the chance of data loss.
I can't unmount the brick because glusterfs still has open files on it.
If I stop the glusterfs-server service, that takes the other brick in
the machine out of commission too.
I have the same problem if I reboot the machine -- I take the other
brick out of service.
What's the correct way to deal with this? Is there a way to tell
gluster to take a brick out of commission for replacement without
interrupting access to other bricks in the same machine?
Thanks for your help,