[Gluster-users] Big problem

Mario Kadastik mario.kadastik@cern.ch
Mon Jan 21 16:27:57 UTC 2013


Hi,

I had a 4 x 3 gluster volume distributed over 6 servers (2 bricks from each). I wanted to move to a 4 x 2 volume by removing two nodes. The initial config is here:

http://fpaste.org/txxs/

I asked on the gluster IRC channel how to do this, got the command, and proceeded to run it:

gluster volume remove-brick home0 replica 2 192.168.1.243:/d35 192.168.1.240:/d35 192.168.1.243:/d36 192.168.1.240:/d36

Having read the gluster help output, I ascertained that I should probably add "start" to the end so it would gracefully check everything (without it, the command did warn me about possible data loss). However, the result was that it started rebalancing and immediately reconfigured the volume into 6 x 2 replica sets, so now I have a HUGE mess:

http://fpaste.org/EpKG/
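
For reference, the sequence I had understood from the help output to be the graceful path was roughly the following (a sketch of my reading of it, with the same brick list as above; I have not verified that every stage accepts the replica argument):

# start data migration off the bricks to be removed
gluster volume remove-brick home0 replica 2 192.168.1.243:/d35 192.168.1.240:/d35 192.168.1.243:/d36 192.168.1.240:/d36 start

# poll until the migration reports completed
gluster volume remove-brick home0 replica 2 192.168.1.243:/d35 192.168.1.240:/d35 192.168.1.243:/d36 192.168.1.240:/d36 status

# only then commit the removal
gluster volume remove-brick home0 replica 2 192.168.1.243:/d35 192.168.1.240:/d35 192.168.1.243:/d36 192.168.1.240:/d36 commit

That is not what happened, as the paste above shows.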

Most processes failed and directory listings show entries in duplicate:

[root@wn-c-27 test]# ls
ls: cannot access hadoop-fuse-addon.tgz: No such file or directory
ls: cannot access hadoop-fuse-addon.tgz: No such file or directory
etc  hadoop-fuse-addon.tgz  hadoop-fuse-addon.tgz
[root@wn-c-27 test]# 

I urgently need help recovering from this state. It seems gluster has made a huge mess and it will be tough to get out of it. As soon as I noticed this I stopped the remove-brick with the stop command, but the mess remains. Should I force the remove-brick? Should I stop the volume and gluster and manually reconfigure it back to 4 x 3, or how else can I recover to a consistent filesystem? This is users' /home, so a huge mess is NOT a good thing, and because of the 3x replication there is no backup right now either...
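
For completeness, what I ran to halt it and inspect the current state was along these lines (again a sketch, not the exact invocations):

# halt the in-progress remove-brick / migration
gluster volume remove-brick home0 replica 2 192.168.1.243:/d35 192.168.1.240:/d35 192.168.1.243:/d36 192.168.1.240:/d36 stop

# inspect the resulting 6 x 2 layout
gluster volume info home0
gluster volume status home0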

Mario Kadastik, PhD
Researcher

---
  "Physics is like sex, sure it may have practical reasons, but that's not why we do it" 
     -- Richard P. Feynman



