[Gluster-users] Machine becomes its own peer

Scott Hazelhurst Scott.Hazelhurst at wits.ac.za
Fri Feb 17 05:49:56 UTC 2017

Dear all

Last week I posted a query about a machine that had failed while the underlying hard disk holding the gluster brick was still good. I’ve made some progress in restoring it, but my newly restored machine now becomes its own peer, which then breaks everything.

1. Gluster daemons are off on all peers, content of /var/lib/glusterd/peers looks good.
2. I start the gluster daemons on all peers. All looks good.
3. For about 2 minutes, there’s no obvious problem: if I do a gluster peer status on any machine it looks good, and if I do a gluster volume status A01 on any machine it looks good.
4. Then at some point, the /var/lib/glusterd/peers directory on the new, restored machine gets an entry for the machine itself and things start breaking. A typical error message is the understandable

: Unable to get lock for uuid: 4fb930f7-554e-462a-9204-4592591feeb8, lock held by: 4fb930f7-554e-462a-9204-4592591feeb8 

5. This is repeatable: if I stop the daemons, remove the offending entry in /var/lib/glusterd/peers, and restart, the same behavior occurs. All is good for a minute or two, and then something magically puts the entry back in /var/lib/glusterd/peers.
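
The self-entry described in steps 4 and 5 could be checked for with a short script rather than by eye. What follows is only a minimal sketch. It assumes that the local node's UUID is stored on a UUID= line in /var/lib/glusterd/glusterd.info, and that each file under /var/lib/glusterd/peers is named after a peer's UUID and carries a uuid= line. Those layout details, and the function names read_key and find_self_peer_entries, are assumptions for illustration and should be checked against the actual installation.

```python
from pathlib import Path


def read_key(path, key):
    """Return the value of the first `key=value` line in a file, or None."""
    for line in Path(path).read_text().splitlines():
        name, sep, value = line.partition("=")
        if sep and name.strip().lower() == key.lower():
            return value.strip()
    return None


def find_self_peer_entries(glusterd_dir="/var/lib/glusterd"):
    """List peer files that refer to the local node itself.

    Assumed layout (not confirmed by the post):
      <glusterd_dir>/glusterd.info  contains a line  UUID=<local uuid>
      <glusterd_dir>/peers/<uuid>   contains a line  uuid=<peer uuid>
    """
    base = Path(glusterd_dir)
    local_uuid = read_key(base / "glusterd.info", "UUID")
    if local_uuid is None:
        raise ValueError("no UUID line found in glusterd.info")
    local_uuid = local_uuid.lower()

    peers_dir = base / "peers"
    if not peers_dir.is_dir():
        return []

    offenders = []
    for entry in sorted(peers_dir.iterdir()):
        if not entry.is_file():
            continue
        # Match on either the file name or the uuid= line inside the file.
        peer_uuid = (read_key(entry, "uuid") or "").lower()
        if entry.name.lower() == local_uuid or peer_uuid == local_uuid:
            offenders.append(entry)
    return offenders
```

Calling find_self_peer_entries() with no arguments on the restored machine returns the paths of any peer files that point back at the machine itself; an empty list means no self-entry was found. The script only reads files and does not remove anything, so it can be run repeatedly while the daemons are up to see when the entry appears.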

In a previous step in restoring my machine, I had a different error of mismatching cksums, and what I did then may be the cause of the problem. In searching the list archives I found someone with a similar cksum problem, and the proposed solution was to copy /var/lib/glusterd/vols/ from another of the peers to the new machine. This may not be the issue, but it is the only thing I think I did that was unconventional.

I am running version 3.7.5-19 on Scientific Linux 6.8.

If anyone can suggest a way forward I would be grateful.

Many thanks


