[Gluster-users] Undeletable volume

Brian Candler B.Candler at pobox.com
Mon Apr 23 15:43:38 UTC 2012


[glusterfs 3.2.5, Ubuntu 11.10 x86_64]

One of the nodes was taken away for a while, and other volumes were created
in the meantime, so I now have a situation where the volume info on the
nodes has become unsynchronised.

There are three nodes. I have pasted 'gluster volume info' from each of them
below: storage1 and storage2 know about the volume called 'scratch', but
storage3 doesn't.
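
A quick way to compare what each node believes is a loop like the one below
- it assumes passwordless ssh between the nodes, but running 'gluster volume
info' on each node by hand shows the same thing:

    for h in storage1 storage2 storage3; do
        echo "== $h =="
        ssh $h "gluster volume info | grep '^Volume Name'"
    done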

Now when I try to delete the volume, I can't:

root at storage2:~# gluster volume delete scratch
Deleting volume will erase all information about the volume. Do you want to continue? (y/n) y
Volume scratch has been started.Volume needs to be stopped before deletion.
root at storage2:~# gluster volume stop scratch
Stopping volume will make its data inaccessible. Do you want to continue? (y/n) y
Volume scratch does not exist

Hmm. So try to synchronise the info:

root at storage2:~# gluster volume sync scratch
please delete all the volumes before full sync

But I can't delete this volume - that's the problem!
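
Incidentally, if I'm reading the CLI help right, 'volume sync' expects the
hostname of a peer to copy the definitions from, not a volume name, so
perhaps the intended invocation is something like the following, run on
storage3 - though I haven't verified that it gets past the "please delete
all the volumes" check:

    # on storage3: pull the 'scratch' definition from a peer that has it
    gluster volume sync storage1 scratch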

I could try 3.2.6, except that strangely there is no build for Ubuntu 11.10,
only for 11.04 / 10.10 / 10.04; see
http://download.gluster.com/pub/gluster/glusterfs/3.2/3.2.6/Ubuntu/

So for now I've hacked about with the files manually while glusterd was not running:

    cd /etc/glusterd
    rm -rf vols/scratch
    vi nfs/nfs-server.vol

Then I restarted glusterd. However, things still aren't right:

root at storage3:/etc/glusterd# gluster volume stop data
Stopping volume will make its data inaccessible. Do you want to continue? (y/n) y
Stopping volume data has been unsuccessful
root at storage3:/etc/glusterd# gluster volume stop scratch3
Stopping volume will make its data inaccessible. Do you want to continue? (y/n) y
Stopping volume scratch3 has been successful

So it looks like I'm going to have to just zap the glusterd config and start
again. That's OK since this system isn't yet in production, but it doesn't
really inspire confidence in how to get out of a problem like this in a
production environment :-(
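
What I have in mind is roughly the following, on every node. It's completely
untested, and the backup location, the init script name, and the idea of
keeping glusterd.info so that each node retains its UUID are all my own
guesses:

    # on every node, with the management daemon stopped first
    service glusterd stop                 # init script name may differ
    mkdir -p /root/glusterd-backup
    mv /etc/glusterd/vols /etc/glusterd/peers /root/glusterd-backup/
    # /etc/glusterd/glusterd.info (this node's UUID) is left in place
    service glusterd start

    # then, from one node only, rebuild the pool and recreate the volumes
    # (leaving out 'scratch', which is the one I was trying to get rid of);
    # glusterd should regenerate nfs/nfs-server.vol as the volumes come back
    gluster peer probe storage2
    gluster peer probe storage3
    gluster volume create data storage1:/disk/data1/data storage1:/disk/data2/data
    gluster volume create scratch3 storage3:/disk/scratch/scratch3
    gluster volume start data
    gluster volume start scratch3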

This may be some sort of split-brain in the volume metadata, but there's no
split-brain in the volume contents, since these are just distributed volumes
and no replication is taking place.

Anyway, I just thought I'd forward it on in case it's of any use in
improving gluster.

Regards,

Brian.

===================================================================================
root at storage1:~# gluster volume info

Volume Name: scratch
Type: Distribute
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: storage2:/disk/scratch
Brick2: storage3:/disk/scratch

Volume Name: scratch3
Type: Distribute
Status: Started
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: storage3:/disk/scratch/scratch3

Volume Name: data
Type: Distribute
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: storage1:/disk/data1/data
Brick2: storage1:/disk/data2/data
root at storage1:~# 



root at storage2:~# gluster volume info

Volume Name: scratch
Type: Distribute
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: storage2:/disk/scratch
Brick2: storage3:/disk/scratch

Volume Name: scratch3
Type: Distribute
Status: Started
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: storage3:/disk/scratch/scratch3

Volume Name: data
Type: Distribute
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: storage1:/disk/data1/data
Brick2: storage1:/disk/data2/data
root at storage2:~# 


root at storage3:~# gluster volume info

Volume Name: scratch3
Type: Distribute
Status: Started
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: storage3:/disk/scratch/scratch3

Volume Name: data
Type: Distribute
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: storage1:/disk/data1/data
Brick2: storage1:/disk/data2/data
root at storage3:~# 



