[Gluster-users] "gluster peer status" messed up

Brian Candler B.Candler at pobox.com
Mon Dec 3 14:52:13 UTC 2012

On Mon, Dec 03, 2012 at 01:44:47PM +0000, Brian Candler wrote:
> So this all looks broken, and as I can't find any gluster documentation
> saying what these various states mean, I'm not sure how to proceed.  Any
> suggestions?

Update. On storage1 and storage3 I killed all glusterfs and glusterfsd processes, ran

    rm /var/lib/glusterd/peers/*
    rm -rf /var/lib/glusterd/vols/*

and restarted glusterd. Then I did "gluster peer probe storage2".

On the first attempt, I was getting

    State: Accepted peer request (Connected)

I couldn't work out why it didn't move on to being a fully connected peer. But
after detaching and probing again from storage3, I got

    State: Peer in Cluster (Connected)

which suggests it is OK.
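For anyone following along, the retry sequence was just this (a sketch using the hostnames from this thread; the run wrapper only echoes each command, so it is a safe dry run, and on a real node you would replace it with run() { "$@"; }):

```shell
# Sketch of the detach/re-probe retry, as run from storage3.
# run() only echoes the command here, so this is a safe dry run;
# on a real node, replace it with:  run() { "$@"; }
run() { echo "+ $*"; }

run gluster peer detach storage2   # drop the half-established peering
run gluster peer probe storage2    # probe again from scratch
run gluster peer status            # hoping for: Peer in Cluster (Connected)
```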

However, "gluster volume info" on both nodes shows that I have lost the volume
I had on storage3.

Trying to recreate it:

    # gluster volume create scratch3 storage3:/disk/scratch/scratch3
    /disk/scratch/scratch3 or a prefix of it is already part of a volume

Now, I do remember seeing something about a script to remove xattrs, but I
can't find it in the Ubuntu glusterfs-{server,common,client,examples} packages.

Back to the mailing list archives: I found the two setfattr commands there,
ran them, and was able to recreate the volume without loss of data.
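For the record, the commands in question (as commonly posted on the list, so treat this as a sketch rather than a quote from the thread; the path is the brick from my setup above, and you should only do this to a brick you intend to reuse):

```shell
# Clear the volume markers glusterfs leaves on a brick directory, so the
# path can be reused in a new "gluster volume create". Run as root on the
# server that holds the brick.
BRICK=/disk/scratch/scratch3    # brick path from my setup above

if [ -d "$BRICK" ]; then
    setfattr -x trusted.glusterfs.volume-id "$BRICK"
    setfattr -x trusted.gfid "$BRICK"
    rm -rf "$BRICK/.glusterfs"  # stale metadata dir; some versions need this too
fi
```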

storage1 was a bit more awkward:

  root at storage1:/var/lib/glusterd# gluster peer status
  No peers present
  root at storage1:/var/lib/glusterd# gluster peer probe storage2
  storage2 is already part of another cluster

<<Digs around source code>>

OK, because storage2 already has a peer, it looks like I have to probe
storage1 from storage2, not the other way round.  It works this time.
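In other words: probes must originate from a node that is already in the cluster. A sketch of the working order (hostnames from this thread; the run wrapper only echoes each command, so this is a safe dry run, and on a real node you would replace it with run() { "$@"; }):

```shell
# Probe direction matters: the node already in the cluster does the probing.
run() { echo "+ $*"; }   # dry-run wrapper; use  run() { "$@"; }  for real

# Failed when run on storage1: "storage2 is already part of another cluster"
#   gluster peer probe storage2
# Works when run on storage2, which is already in the cluster:
run gluster peer probe storage1
run gluster peer status    # both nodes should now list each other as peers
```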

So I think it's all working again now, but for someone who was not prepared
to experiment and get their hands dirty, it would have been a very hairy
experience.

I have to say that in my opinion, the two worst aspects of glusterfs by far are:
- lack of error reporting, other than grubbing through log files on both
  client and server
- lack of documentation (especially recovery procedures for things like
  failed bricks, replacing bricks, volume info out of sync, split-brain
  data out of sync)

Unfortunately, live systems are not where you want to be experimenting :-(
