[Gluster-users] Best Practices for different failure scenarios?

Wed Feb 19 20:15:37 UTC 2014

On Wed, Feb 19, 2014 at 3:07 PM, Michael Peek <peek at nimbios.org> wrote:
> Is there a best practices document somewhere for how to handle standard
> problems that crop up?

Short answer: it sounds like you'd benefit from playing with a test
cluster. Would I be correct in guessing that you haven't set up a
gluster pool yet?
You might want to look at:
https://ttboj.wordpress.com/2014/01/08/automatically-deploying-glusterfs-with-puppet-gluster-vagrant/
That way you can try these failure scenarios out easily.
As for some of those points, here is how I'd solve them:

> Sort of crib notes for things like:
>
> 1) What do you do if you see that a drive is about to fail?
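RAID6. The array tolerates up to two failed drives, so you can swap the
suspect drive and let the array rebuild.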

> 2) What do you do if a drive has already failed?
RAID6 again. Replace the failed drive and let the array rebuild.
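
To make the RAID6 answers above concrete, here is a minimal sketch of the
arithmetic: RAID6 spends two drives' worth of space on parity and survives
any two drive failures in the array. The function names and the 12 x 4 TB
example are made up for illustration; nothing here is specific to Gluster.

```python
def raid6_usable_tb(num_drives: int, drive_tb: float) -> float:
    """Usable capacity of one RAID6 array (two drives' worth goes to parity)."""
    if num_drives < 4:
        raise ValueError("RAID6 needs at least 4 drives")
    return (num_drives - 2) * drive_tb


def raid6_survives(failed_drives: int) -> bool:
    """RAID6 keeps data available with up to two failed drives."""
    return failed_drives <= 2


if __name__ == "__main__":
    # Hypothetical brick backed by twelve 4 TB drives.
    print(raid6_usable_tb(12, 4.0))
    print(raid6_survives(2), raid6_survives(3))
```

So a single failed (or failing) drive is handled below Gluster, at the RAID
layer, and the brick stays online while the array rebuilds.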

> 3) What do you do if a peer is about to fail?
Get a new peer ready.

> 4) What do you do if a peer has failed?
Replace it with a new peer.

> 5) What do you do to reinstall a peer from scratch (i.e. what
> configuration files/directories do you need to restore to get the host
> back up and talking to the rest of the cluster)?
Bring up a new peer and add it to the cluster, same as for a failed peer.
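
As a rough sketch of what "bring up a new peer and add it" can look like on
a replicated volume, the snippet below only builds the gluster CLI command
lines and prints them; it does not run anything. The volume name, hostnames
and brick paths are hypothetical, and the exact sequence is an assumption
that depends on your volume layout and Gluster version, so try it on a test
cluster first.

```python
def replace_peer_commands(volume: str, old_brick: str, new_brick: str) -> list:
    """Build (not run) the commands to swap a dead peer's brick for a new one.

    Bricks are given as "host:/path". Assumes a replicated volume.
    """
    new_host = new_brick.split(":", 1)[0]
    return [
        # Add the new host to the trusted pool.
        ["gluster", "peer", "probe", new_host],
        # Point the volume at the new brick in place of the failed one.
        ["gluster", "volume", "replace-brick", volume,
         old_brick, new_brick, "commit", "force"],
        # Ask self-heal to copy data onto the new brick.
        ["gluster", "volume", "heal", volume, "full"],
    ]


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    for cmd in replace_peer_commands("gv0", "old1:/data/brick", "new1:/data/brick"):
        print(" ".join(cmd))
```

Printing the commands rather than executing them keeps this safe to read
through and adapt before touching a real pool.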

> 6) What do you do with failed-heals?
> 7) What do you do with split-brains?
These are more complex issues, and a number of people have written about them.
E.g.: http://joejulian.name/blog/fixing-split-brain-with-glusterfs-33/
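
Before fixing anything, it helps to see which files are affected. A minimal
sketch, assuming a replicated volume with the hypothetical name "gv0": the
helper below just assembles the two heal-info commands and prints them,
leaving the actual repair to the write-ups mentioned above.

```python
def heal_check_commands(volume: str) -> list:
    """Build (not run) the commands that list pending heals and split-brain files."""
    return [
        # Entries that still need healing, per brick.
        ["gluster", "volume", "heal", volume, "info"],
        # Entries the self-heal daemon considers split-brain.
        ["gluster", "volume", "heal", volume, "info", "split-brain"],
    ]


if __name__ == "__main__":
    for cmd in heal_check_commands("gv0"):  # "gv0" is a made-up volume name
        print(" ".join(cmd))
```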

Cheers,
James

>
> Michael