[Gluster-users] Strange behaviour with add-brick followed by remove-brick

Brian Cipriano bcipriano at zerovfx.com
Wed Oct 30 15:40:52 UTC 2013


I had the exact same experience recently with a 3.4 distributed cluster I set up. I spent some time on IRC but couldn’t track it down. It seems remove-brick is broken in 3.3 and 3.4. I guess folks don’t remove bricks very often :)

- brian




On Oct 30, 2013, at 11:21 AM, Lalatendu Mohanty <lmohanty at redhat.com> wrote:

> On 10/30/2013 08:40 PM, Lalatendu Mohanty wrote:
>> On 10/30/2013 03:43 PM, B.K.Raghuram wrote:
>>> I have gluster 3.4.1 on 4 boxes with hostnames n9, n10, n11, n12. I
>>> did the following sequence of steps and ended up losing data, so
>>> what did I do wrong?!
>>> 
>>> - Create a distributed volume with bricks on n9 and n10
>>> - Started the volume
>>> - NFS mounted the volume and created 100 files on it. Found that n9
>>> had 45, n10 had 55
>>> - Added a brick n11 to this volume
>>> - Removed a brick n10 from the volume with gluster volume remove-brick <vol>
>>> <n10 brick name> start
>>> - n9 now has 45 files, n10 has 55 files and n11 has 45 files (all the
>>> same as on n9)
>>> - Checked status; it shows no rebalanced files, but n10 had
>>> scanned 100 files and completed. 0 scanned for all the others
>>> - I then did a rebalance start force on the vol and found that n9 had
>>> 0 files, n10 had 55 files and n11 had 45 files - weird - looked like
>>> n9 had been removed but double checked again and found that n10 had
>>> indeed been removed.
>>> - did a remove-brick commit. The file distribution was the same after that.
>>> volume info now shows the volume to have n9 and n11 as bricks.
>>> - did a rebalance start again on the volume. The rebalance-status now
>>> shows n11 had 45 rebalanced files, all the brick nodes had 45 files
>>> scanned and all show complete. The file layout after this is n9 has 45
>>> files and n10 has 55 files. n11 has 0 files!
>>> - An ls on the nfs mount now shows only 45 files, so the other 55 are not
>>> visible because they are on n10, which is not part of the volume!
>>> 
>>> What have I done wrong in this sequence?
>>> _______________________________________________
>>> Gluster-users mailing list
>>> Gluster-users at gluster.org
>>> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>> 
>> I think running rebalance (force) in between "remove-brick start" and "remove-brick commit" is the issue. Can you please paste your commands in the order you ran them? That would make it clearer.
>> 
>> Below are the steps I use to replace a brick, and they work for me.
>> 
>> gluster volume add-brick VOLNAME NEW-BRICK
>> gluster volume remove-brick VOLNAME BRICK start
>> gluster volume remove-brick VOLNAME BRICK status
>> gluster volume remove-brick VOLNAME BRICK commit
> I would also suggest using distribute-replicate volumes, so that you always have a replica copy, which reduces the probability of losing data.
> 
> -Lala 
> 
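The four-command sequence quoted above can be sketched as a small shell function that waits for migration off the old brick to finish before committing, and does not start a rebalance in between. This is only a sketch under assumptions: the function names, the GLUSTER and POLL_SECONDS variables, and the matching on the words "completed", "in progress" and "failed" in the status output are illustrative and should be checked against what your gluster version actually prints. The CLI may also ask for confirmation on some of these steps.

```shell
#!/bin/sh
# Sketch: add the new brick first, drain the old one, commit only when done.
# GLUSTER can be pointed at a stub for a dry run; it defaults to the real CLI.
GLUSTER="${GLUSTER:-gluster}"
POLL_SECONDS="${POLL_SECONDS:-10}"

# Classify the text printed by "remove-brick ... status".
# Assumption: the output contains "failed", "in progress" or "completed".
migration_state() {
    case "$1" in
        *failed*)        echo failed ;;
        *"in progress"*) echo running ;;
        *completed*)     echo finished ;;
        *)               echo unknown ;;
    esac
}

# replace_brick VOLNAME OLD-BRICK NEW-BRICK
replace_brick() {
    vol="$1"; old="$2"; new="$3"
    "$GLUSTER" volume add-brick "$vol" "$new" || return 1
    "$GLUSTER" volume remove-brick "$vol" "$old" start || return 1
    # Poll the status; no rebalance is run between start and commit.
    while :; do
        out="$("$GLUSTER" volume remove-brick "$vol" "$old" status)" || return 1
        case "$(migration_state "$out")" in
            finished) break ;;
            failed)   echo "migration failed, not committing" >&2; return 1 ;;
            *)        sleep "$POLL_SECONDS" ;;
        esac
    done
    "$GLUSTER" volume remove-brick "$vol" "$old" commit
}
```

With hypothetical brick paths, a call would look like `replace_brick myvol n10:/export/brick n11:/export/brick`. Before committing, it is still worth checking on the bricks themselves that the files really have moved off the old brick.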
