[Gluster-users] remove-brick removed unexpected bricks
Ravishankar N
ravishankar at redhat.com
Tue Aug 13 13:49:18 UTC 2013
On 08/13/2013 06:21 PM, Cool wrote:
> I'm pretty sure I did "watch ... remove-brick ... status" until it
> reported that everything was completed before triggering the commit; I
> should have made that clear in my previous mail.
>
> Actually, if you read my mail again - in step #5, files on /sdc1 got
> migrated instead of /sdd1, even though my command was trying to
> remove-brick /sdd1.
Ah, my bad. Got it now. This is strange...
> This is the root cause of the problem (as I see it): data on /sdc1 was
> migrated to /sdb1 and /sdd1, and then the commit simply removed /sdd1
> from gfs_v0. It seems the volume definition information in gluster has
> some problem.
If you are able to reproduce the issue, does 'gluster volume info' show
the correct bricks before and after the start/status/commit operations
of removing sdd1? You could also check whether there are any error
messages in /var/log/glusterfs/<volname>-rebalance.log.
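For instance, a rough check along these lines (using the gfs_v0 name and
brick paths from your steps; the grep pattern is just a starting point)
should show whether the brick list gets confused:

    # gluster volume info gfs_v0      <- brick list before the removal
    # gluster volume remove-brick gfs_v0 gfs11:/sdd1 gfs12:/sdd1 start
    # gluster volume remove-brick gfs_v0 gfs11:/sdd1 gfs12:/sdd1 status
    # gluster volume info gfs_v0      <- same bricks until commit
    # grep -iE "error|failed" /var/log/glusterfs/gfs_v0-rebalance.log
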
-Ravi
>
> -C.B.
>
> On 8/12/2013 9:51 PM, Ravishankar N wrote:
>> On 08/13/2013 03:43 AM, Cool wrote:
>>> remove-brick in 3.4.0 seems to be removing the wrong bricks; can
>>> someone help review the environment/steps to see if I did anything
>>> stupid?
>>>
>>> setup - Ubuntu 12.04 LTS on gfs11 and gfs12, with the following
>>> packages from the ppa; both nodes have 3 xfs partitions sdb1, sdc1,
>>> sdd1:
>>> ii  glusterfs-client  3.4.0final-ubuntu1~precise1  clustered file-system (client package)
>>> ii  glusterfs-common  3.4.0final-ubuntu1~precise1  GlusterFS common libraries and translator modules
>>> ii  glusterfs-server  3.4.0final-ubuntu1~precise1  clustered file-system (server package)
>>>
>>> steps to reproduce the problem:
>>> 1. create volume gfs_v0 in replica 2 with gfs11:/sdb1 and gfs12:/sdb1
>>> 2. add-brick gfs11:/sdc1 and gfs12:/sdc1
>>> 3. add-brick gfs11:/sdd1 and gfs12:/sdd1
>>> 4. rebalance to distribute files across all three pairs of disks
>>> 5. remove-brick gfs11:/sdd1 and gfs12:/sdd1 start - files on
>>> ***/sdc1*** begin migrating out
>>> 6. remove-brick commit led to data loss in gfs_v0
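
For reference, the reproduction above maps to roughly this CLI sequence
(the exact commands used may have differed; a 'volume start' after the
create is assumed):

    # gluster volume create gfs_v0 replica 2 gfs11:/sdb1 gfs12:/sdb1
    # gluster volume start gfs_v0     <- assumed, not listed in the steps
    # gluster volume add-brick gfs_v0 gfs11:/sdc1 gfs12:/sdc1
    # gluster volume add-brick gfs_v0 gfs11:/sdd1 gfs12:/sdd1
    # gluster volume rebalance gfs_v0 start
    # gluster volume remove-brick gfs_v0 gfs11:/sdd1 gfs12:/sdd1 start
    # gluster volume remove-brick gfs_v0 gfs11:/sdd1 gfs12:/sdd1 commit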
>>>
>>> If between step 5 and step 6 I initiate a remove-brick targeting
>>> /sdc1, then after the commit I would not lose anything, since all
>>> the data would be migrated back to /sdb1.
>>>
>>
>> You should ensure that a 'remove-brick start' has completed, and then
>> commit it, before initiating the second one. The correct way to do
>> this would be:
>> 5. # gluster volume remove-brick gfs_v0 gfs11:/sdd1 gfs12:/sdd1 start
>> 6. Check that the data migration has been completed using the status
>> command:
>>    # gluster volume remove-brick gfs_v0 gfs11:/sdd1 gfs12:/sdd1 status
>> 7. # gluster volume remove-brick gfs_v0 gfs11:/sdd1 gfs12:/sdd1 commit
>> 8. # gluster volume remove-brick gfs_v0 gfs11:/sdc1 gfs12:/sdc1 start
>> 9. # gluster volume remove-brick gfs_v0 gfs11:/sdc1 gfs12:/sdc1 status
>> 10. # gluster volume remove-brick gfs_v0 gfs11:/sdc1 gfs12:/sdc1 commit
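
Your 'watch ... remove-brick ... status' approach is a reasonable way to
handle steps 6 and 9 - for example, something along the lines of

    # watch -n 10 'gluster volume remove-brick gfs_v0 gfs11:/sdd1 gfs12:/sdd1 status'

(interval arbitrary), as long as the commit only happens once the status
shows completed for both nodes.
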
>>
>> This would leave you with the original replica 2 volume that you had
>> begun with. Hope this helps.
>>
>> Note:
>> The latest version of glusterfs has a check that prevents a second
>> remove-brick operation from being started until the first one has
>> been committed. (You would receive a message like: "volume
>> remove-brick start: failed: An earlier remove-brick task exists for
>> volume <volname>. Either commit it or stop it before starting a new
>> task.")
>>
>> -Ravi
>>
>>
>>> -C.B.
>>> _______________________________________________
>>> Gluster-users mailing list
>>> Gluster-users at gluster.org
>>> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>>
>>
>>
>