[Gluster-users] Strange behaviour with add-brick followed by remove-brick

Wed Oct 30 10:13:46 UTC 2013

I am running Gluster 3.4.1 on four boxes with hostnames n9, n10, n11
and n12. I went through the following sequence of steps and ended up
losing data. What did I do wrong?

- Created a distributed volume with bricks on n9 and n10
- Started the volume
- NFS-mounted the volume and created 100 files on it. Found that n9
had 45 and n10 had 55
- Added a brick on n11 to this volume
- Removed the n10 brick from the volume with gluster volume
remove-brick <vol> <n10 brick name> start
- n9 now had 45 files, n10 had 55 files and n11 had 45 files (all the
same as on n9)
- Checked the status: it showed no rebalanced files, but n10 had
scanned 100 files and completed. All the others showed 0 scanned
- I then did a rebalance start force on the volume and found that n9
had 0 files, n10 had 55 files and n11 had 45 files. Weird: it looked
as if n9 had been removed, but I double-checked and it was indeed n10
that had been removed.
- Did a remove-brick commit. The file distribution stayed the same
after that. volume info now shows the volume with n9 and n11 as bricks.
- Did a rebalance start again on the volume. The rebalance status now
shows that n11 had 45 rebalanced files, and that all the brick nodes
had 45 files scanned and all show completed. The file layout after
this is: n9 has 45 files, n10 has 55 files and n11 has 0 files!
- An ls on the NFS mount now shows only 45 files. The other 55 are not
visible because they are on n10, which is no longer part of the volume!
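
The steps above correspond roughly to the commands below. This is only
a sketch for reproducing the report on a test cluster, not a
recommended procedure. The volume name (testvol) and the brick paths
(/export/brick1) are placeholders that do not appear in the report,
and the status check is assumed to be the remove-brick status. The
script only prints the commands unless RUN is set to an empty string.

```shell
#!/bin/sh
# Placeholder names: the report gives neither the volume name nor the brick paths.
VOL=testvol
B9=n9:/export/brick1
B10=n10:/export/brick1
B11=n11:/export/brick1

# Dry run by default: each command is echoed instead of executed.
# Run with RUN= (empty) to execute the commands for real.
RUN=${RUN-echo}

sequence() {
    $RUN gluster volume create "$VOL" "$B9" "$B10"
    $RUN gluster volume start "$VOL"
    # (NFS-mount the volume and create 100 files here)
    $RUN gluster volume add-brick "$VOL" "$B11"
    $RUN gluster volume remove-brick "$VOL" "$B10" start
    # Assumption: the status that was checked is the remove-brick status.
    $RUN gluster volume remove-brick "$VOL" "$B10" status
    $RUN gluster volume rebalance "$VOL" start force
    $RUN gluster volume remove-brick "$VOL" "$B10" commit
    $RUN gluster volume info "$VOL"
    $RUN gluster volume rebalance "$VOL" start
    $RUN gluster volume rebalance "$VOL" status
}

sequence
```

Note that in this ordering the rebalance is started while the
remove-brick operation begun two steps earlier has not yet been
committed.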

What have I done wrong in this sequence?