[Gluster-users] remove-brick removed unexpected bricks
Ravishankar N
ravishankar at redhat.com
Tue Aug 13 04:51:34 UTC 2013
On 08/13/2013 03:43 AM, Cool wrote:
> remove-brick in 3.4.0 seems to be removing the wrong bricks; can someone
> help review the environment/steps to see if I did anything stupid?
>
> setup - Ubuntu 12.04 LTS on gfs11 and gfs12, with the following packages
> from the ppa; both nodes have three xfs partitions: sdb1, sdc1, sdd1:
> ii  glusterfs-client  3.4.0final-ubuntu1~precise1  clustered file-system (client package)
> ii  glusterfs-common  3.4.0final-ubuntu1~precise1  GlusterFS common libraries and translator modules
> ii  glusterfs-server  3.4.0final-ubuntu1~precise1  clustered file-system (server package)
>
> step to reproduce the problem:
> 1. create volume gfs_v0 in replica 2 with gfs11:/sdb1 and gfs12:/sdb1
> 2. add-brick gfs11:/sdc1 and gfs12:/sdc1
> 3. add-brick gfs11:/sdd1 and gfs12:/sdd1
> 4. rebalance to make files distributed across all three pairs of disks
> 5. remove-brick gfs11:/sdd1 and gfs12:/sdd1 start, files on
> ***/sdc1*** are migrating out
> 6. remove-brick commit led to data loss in gfs_v0
>
> If, between steps 5 and 6, I initiate a remove-brick targeting /sdc1,
> then after commit I would not lose anything, since all the data would
> be migrated back to /sdb1.
>
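For reference, the reproduction above corresponds roughly to the following
command sequence (the /sdb1-style paths stand in for the actual brick mount
points on each node):
   # gluster volume create gfs_v0 replica 2 gfs11:/sdb1 gfs12:/sdb1
   # gluster volume start gfs_v0
   # gluster volume add-brick gfs_v0 gfs11:/sdc1 gfs12:/sdc1
   # gluster volume add-brick gfs_v0 gfs11:/sdd1 gfs12:/sdd1
   # gluster volume rebalance gfs_v0 start
   # gluster volume remove-brick gfs_v0 gfs11:/sdd1 gfs12:/sdd1 start
   # gluster volume remove-brick gfs_v0 gfs11:/sdd1 gfs12:/sdd1 commit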
You should ensure that a 'remove-brick start' has completed, and then
commit it, before initiating the second one. The correct way to do this
would be:
5. # gluster volume remove-brick gfs_v0 gfs11:/sdd1 gfs12:/sdd1 start
6. Check that the data migration has completed, using the status
command (a simple way to poll for this is sketched after the list):
   # gluster volume remove-brick gfs_v0 gfs11:/sdd1 gfs12:/sdd1 status
7. # gluster volume remove-brick gfs_v0 gfs11:/sdd1 gfs12:/sdd1 commit
8. # gluster volume remove-brick gfs_v0 gfs11:/sdc1 gfs12:/sdc1 start
9. # gluster volume remove-brick gfs_v0 gfs11:/sdc1 gfs12:/sdc1 status
10. # gluster volume remove-brick gfs_v0 gfs11:/sdc1 gfs12:/sdc1 commit
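As a convenience, something like the following loop can be used to wait for
the status in steps 6 and 9 to finish before committing. This is only a
rough sketch: it assumes the status output reports "in progress" for nodes
that are still migrating data, so adjust the match to the actual output on
your version:
   # wait until no node reports "in progress" for the remove-brick migration
   while gluster volume remove-brick gfs_v0 gfs11:/sdd1 gfs12:/sdd1 status \
         | grep -q "in progress"
   do
       sleep 30
   done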
This would leave you with the replica 2 volume that you started with.
Hope this helps.
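To double-check before and after each commit, 'gluster volume info' lists
the bricks the volume is currently made of; after step 10 it should show
only the original pair (gfs11:/sdb1 and gfs12:/sdb1 in your setup):
   # gluster volume info gfs_v0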
Note:
The latest version of glusterfs has a check that prevents a second
remove-brick operation until the first one has been committed.
(You would receive a message like: "volume remove-brick start: failed:
An earlier remove-brick task exists for volume <volname>. Either commit
it or stop it before starting a new task.")
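Also, if a remove-brick start was issued on the wrong bricks, it can be
aborted before the commit with the stop sub-command (this is the "stop"
referred to in the message above; I have not verified the exact behaviour
on 3.4.0):
   # gluster volume remove-brick gfs_v0 gfs11:/sdd1 gfs12:/sdd1 stop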
-Ravi
> -C.B.