[Gluster-users] remove-brick removed unexpected bricks

Cool coolbsd at hotmail.com
Tue Aug 13 12:51:11 UTC 2013


I'm pretty sure I ran "watch ... remove-brick ... status" until it 
reported everything was completed before triggering the commit; I 
should have made that clear in my previous mail.

Actually, if you read my mail again - in step #5, files on /sdc1 got 
migrated instead of /sdd1, even though my command targeted /sdd1 for 
removal. To me this is the root cause of the problem: data on /sdc1 
was migrated to /sdb1 and /sdd1, and the commit then simply removed 
/sdd1 from gfs_v0. It looks like the volume definition information in 
gluster got confused somewhere.
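One way to sanity-check whether the volume definition is confused (a hypothetical illustration, not something I captured at the time) is to compare the brick list gluster reports against what the migration is actually touching:

```shell
# List the bricks gluster believes belong to the volume, in order
gluster volume info gfs_v0

# While the remove-brick is running, the per-node status rows should
# correspond to the bricks named on the start command - in my case
# the scanned/rebalanced file counts moved on the /sdc1 pair instead
gluster volume remove-brick gfs_v0 gfs11:/sdd1 gfs12:/sdd1 status
```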

-C.B.

On 8/12/2013 9:51 PM, Ravishankar N wrote:
> On 08/13/2013 03:43 AM, Cool wrote:
>> remove-brick in 3.4.0 seems to be removing the wrong bricks; can 
>> someone help review the environment/steps to see if I did anything 
>> stupid?
>>
>> setup - Ubuntu 12.04LTS on gfs11 and gfs12, with the following 
>> packages from the PPA; both nodes have 3 xfs partitions sdb1, sdc1, sdd1:
>> ii  glusterfs-client 3.4.0final-ubuntu1~precise1 clustered 
>> file-system (client package)
>> ii  glusterfs-common 3.4.0final-ubuntu1~precise1 GlusterFS common 
>> libraries and translator modules
>> ii  glusterfs-server 3.4.0final-ubuntu1~precise1 clustered 
>> file-system (server package)
>>
>> step to reproduce the problem:
>> 1. create volume gfs_v0 in replica 2 with gfs11:/sdb1 and gfs12:/sdb1
>> 2. add-brick gfs11:/sdc1 and gfs12:/sdc1
>> 3. add-brick gfs11:/sdd1 and gfs12:/sdd1
>> 4. rebalance to distribute files across all three pairs of disks
>> 5. remove-brick gfs11:/sdd1 and gfs12:/sdd1 start, files on 
>> ***/sdc1*** are migrating out
>> 6. remove-brick commit led to data loss in gfs_v0
>>
>> If between steps 5 and 6 I initiate a remove-brick targeting /sdc1, 
>> then after the commit I would not lose anything, since all the data 
>> would be migrated back to /sdb1.
>>
>
> You should ensure that a 'remove-brick start' has completed, and 
> then commit it, before initiating the second one. The correct way to 
> do this would be:
> 5.   # gluster volume remove-brick gfs_v0 gfs11:/sdd1 gfs12:/sdd1 start
> 6. Check that the data migration has been completed using the status 
> command:
>       # gluster volume remove-brick gfs_v0 gfs11:/sdd1 gfs12:/sdd1 status
> 7.   # gluster volume remove-brick gfs_v0 gfs11:/sdd1 gfs12:/sdd1 commit
> 8.   # gluster volume remove-brick gfs_v0 gfs11:/sdc1 gfs12:/sdc1 start
> 9.   # gluster volume remove-brick gfs_v0 gfs11:/sdc1 gfs12:/sdc1 status
> 10. # gluster volume remove-brick gfs_v0 gfs11:/sdc1 gfs12:/sdc1 commit
>
> This would leave you with the original replica 2 volume you began 
> with. Hope this helps.
>
> Note:
> The latest version of glusterfs has a check that prevents a second 
> remove-brick operation until the first one has been committed.
> (You would receive a message like: "volume remove-brick start: 
> failed: An earlier remove-brick task exists for volume <volname>. 
> Either commit it or stop it before starting a new task.")
>
> -Ravi
>
>
>> -C.B.
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>
>
>



