[Gluster-users] remove-brick removed unexpected bricks

Thu Aug 15 05:43:31 UTC 2013

On 08/15/2013 11:11 AM, Cool wrote:
> I'm going to stop debugging this, as I still cannot figure out how to 
> reproduce the problem reliably for further debugging. I ran 4-5 rounds 
> of tests (all from scratch) yesterday and today and only hit the 
> problem once, on Monday afternoon; repeating the same steps did not 
> give me the same result. I also checked the logs: nothing looked wrong 
> except that the rebalance was happening on the wrong bricks.
>
> I will raise this again if I get any useful information.
>
> -C.B.
>
Sure C.B, thanks for your efforts.
> On 8/13/2013 7:00 AM, Cool wrote:
>> Thanks Ravi, I managed to reproduce the issue twice in the past 
>> several days, but without anything significant in the logs. 'volume 
>> info' before and after shows correct information (i.e. sdd1 got 
>> removed, though its data was not migrated out), while rebalance.log 
>> says it was migrating data out of sdc1, not sdd1.
>>
>> I'm making another attempt now with -L TRACE to see if I can get more 
>> log information. This will take some time; I will post here if I find 
>> anything helpful.
>>
>> -C.B.
>> On 8/13/2013 6:49 AM, Ravishankar N wrote:
>>> On 08/13/2013 06:21 PM, Cool wrote:
>>>> I'm pretty sure I ran "watch ... remove-brick ... status" until it 
>>>> reported that everything was completed before triggering the commit; 
>>>> I should have made that clear in my previous mail.
>>>>
>>>> Actually you can read my mail again - in step #5, files on /sdc1 
>>>> got migrated instead of /sdd1, even though my command was trying to 
>>>> remove-brick /sdd1, 
>>> Ah, my bad. Got it now. This is strange..
>>>> this is the root cause (to me) of the problem: data on /sdc1 was 
>>>> migrated to /sdb1 and /sdd1, and then commit simply removed /sdd1 
>>>> from gfs_v0. It seems something went wrong with the volume 
>>>> definition information in gluster.
>>> If you are able to reproduce the issue, does 'gluster volume info' 
>>> show the correct bricks before and after start-status-commit 
>>> operations of removing sdd1? You could also see if there are any 
>>> error messages in /var/log/glusterfs/<volname>-rebalance.log
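For anyone repeating this check, a minimal sketch of the two comparisons suggested above. It assumes that 'gluster volume info' prints bricks as "BrickN: host:/path" lines and that log lines carry a single-letter level (E or C for error/critical) after the timestamp. The helper names brick_list and log_errors are made up for illustration and are not part of gluster.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, output formats are assumptions.

# Print the bricks of a volume, one per line, from "gluster volume info"
# text supplied on stdin (assumes lines of the form "Brick1: host:/path").
brick_list() {
    grep -E '^Brick[0-9]+:' | sed -E 's/^Brick[0-9]+:[[:space:]]*//'
}

# Print only error (E) and critical (C) lines from a GlusterFS log on stdin
# (assumes lines of the form "[timestamp] E [file:line:func] ...").
log_errors() {
    grep -E '^\[[^]]*\] [EC] \['
}

# Intended use around a remove-brick run (not executed here):
#   gluster volume info gfs_v0 | brick_list > /tmp/bricks.before
#   ... remove-brick start / status / commit ...
#   gluster volume info gfs_v0 | brick_list > /tmp/bricks.after
#   diff /tmp/bricks.before /tmp/bricks.after
#   log_errors < /var/log/glusterfs/gfs_v0-rebalance.log
```

After committing the removal of sdd1, the diff should show only the two sdd1 bricks disappearing; anything else points at the volume definition.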
>>>
>>> -Ravi
>>>>
>>>> -C.B.
>>>>
>>>> On 8/12/2013 9:51 PM, Ravishankar N wrote:
>>>>> On 08/13/2013 03:43 AM, Cool wrote:
>>>>>> remove-brick in 3.4.0 seems to be removing the wrong bricks. Can 
>>>>>> someone help review the environment/steps to see if I did 
>>>>>> anything stupid?
>>>>>>
>>>>>> Setup: Ubuntu 12.04 LTS on gfs11 and gfs12, with the following 
>>>>>> packages from the PPA; both nodes have 3 XFS partitions: sdb1, 
>>>>>> sdc1, sdd1.
>>>>>> ii  glusterfs-client  3.4.0final-ubuntu1~precise1  clustered file-system (client package)
>>>>>> ii  glusterfs-common  3.4.0final-ubuntu1~precise1  GlusterFS common libraries and translator modules
>>>>>> ii  glusterfs-server  3.4.0final-ubuntu1~precise1  clustered file-system (server package)
>>>>>>
>>>>>> Steps to reproduce the problem:
>>>>>> 1. create volume gfs_v0 in replica 2 with gfs11:/sdb1 and 
>>>>>>    gfs12:/sdb1
>>>>>> 2. add-brick gfs11:/sdc1 and gfs12:/sdc1
>>>>>> 3. add-brick gfs11:/sdd1 and gfs12:/sdd1
>>>>>> 4. rebalance so that files are distributed across all three pairs 
>>>>>>    of disks
>>>>>> 5. remove-brick gfs11:/sdd1 and gfs12:/sdd1 start; files on 
>>>>>>    ***/sdc1*** are migrated out
>>>>>> 6. remove-brick commit leads to data loss in gfs_v0
>>>>>>
>>>>>> If, between steps 5 and 6, I initiate a remove-brick targeting 
>>>>>> /sdc1, then after commit I do not lose anything, since all data 
>>>>>> is migrated back to /sdb1.
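The steps above can be sketched as a script, which may make repeated reproduction attempts easier. This is only a sketch under assumptions: the 'volume start' call and the point at which test files get written are not listed in the steps, and the run, reproduce_setup and count_files helpers are made-up names. Skipping the .glusterfs directory when counting assumes the brick layout used since 3.3.

```shell
#!/bin/sh
# Sketch only: hypothetical helpers wrapping the steps from the report.

# With DRY_RUN=1 the command is printed instead of executed.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Steps 1-5 of the report, in order.
reproduce_setup() {
    vol=gfs_v0
    run gluster volume create "$vol" replica 2 gfs11:/sdb1 gfs12:/sdb1
    run gluster volume start "$vol"   # assumption: not listed in the steps
    run gluster volume add-brick "$vol" gfs11:/sdc1 gfs12:/sdc1
    run gluster volume add-brick "$vol" gfs11:/sdd1 gfs12:/sdd1
    # Assumption: test files have been written through a client mount by now.
    run gluster volume rebalance "$vol" start
    run gluster volume remove-brick "$vol" gfs11:/sdd1 gfs12:/sdd1 start
}

# Count regular files on a brick, skipping gluster's internal .glusterfs
# directory, to see which brick is actually being drained.
count_files() {
    find "$1" -path "$1/.glusterfs" -prune -o -type f -print | wc -l
}
```

Running count_files on /sdc1 and /sdd1 on each node before and during step 5 would show which brick the migration is really emptying. DRY_RUN=1 reproduce_setup only prints the command sequence.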
>>>>>>
>>>>>
>>>>> You should ensure that a 'remove-brick start' has completed and 
>>>>> then commit it before initiating the second one. The correct way 
>>>>> to do this would be:
>>>>> 5.  # gluster volume remove-brick gfs_v0 gfs11:/sdd1 gfs12:/sdd1 start
>>>>> 6.  Check that the data migration has been completed using the 
>>>>>     status command:
>>>>>     # gluster volume remove-brick gfs_v0 gfs11:/sdd1 gfs12:/sdd1 status
>>>>> 7.  # gluster volume remove-brick gfs_v0 gfs11:/sdd1 gfs12:/sdd1 commit
>>>>> 8.  # gluster volume remove-brick gfs_v0 gfs11:/sdc1 gfs12:/sdc1 start
>>>>> 9.  # gluster volume remove-brick gfs_v0 gfs11:/sdc1 gfs12:/sdc1 status
>>>>> 10. # gluster volume remove-brick gfs_v0 gfs11:/sdc1 gfs12:/sdc1 commit
>>>>>
>>>>> This would leave you with the original replica 2 volume that you 
>>>>> had begun with. Hope this helps.
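A minimal sketch of the start/status/commit sequence above as a script that refuses to commit until migration has finished. Assumptions: the status table reports each node as "in progress", "completed" or "failed", and 'gluster --mode=script' is used to skip the interactive confirmation on commit. The function names migration_state and remove_brick_safely and the POLL_SECONDS variable are hypothetical.

```shell
#!/bin/sh
# Sketch only: commit a remove-brick after every node has stopped migrating.

# Classify remove-brick status text read from stdin.
# Returns 0 = finished, 1 = still running or unknown, 2 = a node failed.
migration_state() {
    status_text=$(cat)
    if printf '%s\n' "$status_text" | grep -qi 'failed'; then
        return 2
    fi
    if printf '%s\n' "$status_text" | grep -qi 'in progress'; then
        return 1
    fi
    if printf '%s\n' "$status_text" | grep -qi 'completed'; then
        return 0
    fi
    return 1
}

# Usage: remove_brick_safely VOLNAME BRICK [BRICK...]
remove_brick_safely() {
    vol=$1
    shift
    gluster volume remove-brick "$vol" "$@" start || return 1
    while :; do
        gluster volume remove-brick "$vol" "$@" status | migration_state
        case $? in
            0) break ;;
            2) echo "migration failed; not committing" >&2
               return 1 ;;
        esac
        sleep "${POLL_SECONDS:-10}"
    done
    # --mode=script suppresses the confirmation prompt (assumption).
    gluster --mode=script volume remove-brick "$vol" "$@" commit
}
```

For the case in this thread the call would be: remove_brick_safely gfs_v0 gfs11:/sdd1 gfs12:/sdd1. It does not check which brick the data is leaving, so it would not have caught the sdc1/sdd1 mix-up by itself.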
>>>>>
>>>>> Note:
>>>>> The latest version of glusterfs has the check that prevents a 
>>>>> second remove-brick operation until the first one has been committed.
>>>>> (You would receive a message thus : "volume remove-brick start: 
>>>>> failed: An earlier remove-brick task exists for volume <volname>. 
>>>>> Either commit it or stop it before starting a new task." )
>>>>>
>>>>> -Ravi
>>>>>
>>>>>
>>>>>> -C.B.
>>>>>> _______________________________________________
>>>>>> Gluster-users mailing list
>>>>>> Gluster-users at gluster.org
>>>>>> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>>
>>
>>
>>
>