[Bugs] [Bug 1310972] New: After GlusterD restart, Remove-brick commit happening even though data migration not completed.
bugzilla at redhat.com
Tue Feb 23 05:42:57 UTC 2016
https://bugzilla.redhat.com/show_bug.cgi?id=1310972
Bug ID: 1310972
Summary: After GlusterD restart, Remove-brick commit happening
even though data migration not completed.
Product: GlusterFS
Version: 3.7.8
Component: glusterd
Keywords: Triaged
Severity: high
Assignee: bugs at gluster.org
Reporter: amukherj at redhat.com
CC: bsrirama at redhat.com, bugs at gluster.org,
storage-qa-internal at redhat.com
Depends On: 1303028, 1303125, 1303269
+++ This bug was initially created as a clone of Bug #1303269 +++
+++ This bug was initially created as a clone of Bug #1303125 +++
Description of problem:
=======================
On a two-node cluster with a Distributed-Replicate volume mounted over FUSE
and populated with enough data, removal of a replica brick set was started,
which triggered a rebalance. While the rebalance was in progress, glusterd
was restarted on the node from which data was being migrated. A subsequent
remove-brick commit then succeeded even though data migration had not
completed.
Version-Release number of selected component (if applicable):
=============================================================
glusterfs-3.7.5-17
How reproducible:
=================
Every time
Steps to Reproduce:
====================
1. Have a two-node cluster with a Distributed-Replicate volume (2 x 2).
2. Mount the volume over FUSE and write enough data.
3. Start removing a replica brick set // this triggers the data migration.
4. Using remove-brick status, identify the node from which data migration is
happening.
5. While the rebalance is still in progress, restart glusterd on the node
identified in step 4.
6. Try to commit the remove-brick // the commit succeeds when it should not
(a consolidated repro sketch follows this list).
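
For convenience, a consolidated repro sketch of the steps above in plain bash
(the volume name and brick endpoints mirror the transcript below; adjust them
for your own cluster):

#!/bin/bash
# Repro sketch, assuming a 2x2 Distributed-Replicate volume named Dis-Rep
# that already holds enough data (steps 1-2 above).
VOLNAME=Dis-Rep
# Left unquoted below so the two bricks split into separate arguments.
BRICKS="10.70.42.84:/bricks/brick1/smp2 10.70.43.6:/bricks/brick1/smp3"

# Step 3: start the remove-brick, which kicks off data migration.
gluster volume remove-brick $VOLNAME replica 2 $BRICKS start

# Step 4: check which node is actually migrating data.
gluster volume remove-brick $VOLNAME replica 2 $BRICKS status

# Step 5: run this on the migrating node while rebalance is in progress.
systemctl restart glusterd

# Step 6: this commit should be refused while migration is in progress,
# but on the affected build it succeeds ('echo y' answers the y/n prompt).
echo y | gluster volume remove-brick $VOLNAME replica 2 $BRICKS commit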
Actual results:
===============
remove-brick commit succeeds even though the rebalance has not completed.
Expected results:
=================
remove-brick commit should be rejected while the rebalance is in progress.
Additional info:
--- Additional comment from Byreddy on 2016-01-29 10:55:45 EST ---
[root at dhcp42-84 ~]# gluster volume status
Status of volume: Dis-Rep
Gluster process                           TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.42.84:/bricks/brick0/smp0     49157     0          Y       18500
Brick 10.70.43.6:/bricks/brick0/smp1      49162     0          Y       19368
Brick 10.70.42.84:/bricks/brick1/smp2     49158     0          Y       18519
Brick 10.70.43.6:/bricks/brick1/smp3      49163     0          Y       19387
NFS Server on localhost                   2049      0          Y       18541
Self-heal Daemon on localhost             N/A       N/A        Y       18546
NFS Server on 10.70.43.6                  2049      0          Y       19409
Self-heal Daemon on 10.70.43.6            N/A       N/A        Y       19414
Task Status of Volume Dis-Rep
------------------------------------------------------------------------------
There are no active volume tasks
[root at dhcp42-84 ~]#
[root at dhcp42-84 ~]#
[root at dhcp42-84 ~]#
[root at dhcp42-84 ~]# gluster peer status
Number of Peers: 1
Hostname: 10.70.43.6
Uuid: 2f8a267c-7e7c-488f-98b9-f816062aae58
State: Peer in Cluster (Connected)
[root at dhcp42-84 ~]#
[root at dhcp42-84 ~]# gluster volume remove-brick Dis-Rep replica 2
10.70.42.84:/bricks/brick1/smp2 10.70.43.6:/bricks/brick1/smp3 start
volume remove-brick start: success
ID: fd0164f8-2cba-4b25-b881-bbeb7b323695
[root at dhcp42-84 ~]#
[root at dhcp42-84 ~]#
[root at dhcp42-84 ~]# gluster volume remove-brick Dis-Rep replica 2
10.70.42.84:/bricks/brick1/smp2 10.70.43.6:/bricks/brick1/smp3 status
Node        Rebalanced-files      size  scanned  failures  skipped  status        run time in secs
----------  ----------------  --------  -------  --------  -------  ------------  ----------------
localhost                 59   351.4KB      417         0        0  in progress               7.00
10.70.43.6                 0    0Bytes        0         0        0  in progress               7.00
[root at dhcp42-84 ~]#
[root at dhcp42-84 ~]# gluster volume remove-brick Dis-Rep replica 2
10.70.42.84:/bricks/brick1/smp2 10.70.43.6:/bricks/brick1/smp3 status
Node        Rebalanced-files      size  scanned  failures  skipped  status        run time in secs
----------  ----------------  --------  -------  --------  -------  ------------  ----------------
localhost                 93   511.0KB      627         0        0  in progress              11.00
10.70.43.6                 0    0Bytes        0         0        0  in progress              11.00
[root at dhcp42-84 ~]# gluster volume remove-brick Dis-Rep replica 2
10.70.42.84:/bricks/brick1/smp2 10.70.43.6:/bricks/brick1/smp3 status
Node        Rebalanced-files      size  scanned  failures  skipped  status        run time in secs
----------  ----------------  --------  -------  --------  -------  ------------  ----------------
localhost                113   569.2KB      710         0        0  in progress              13.00
10.70.43.6                 0    0Bytes        0         0        0  completed                12.00
[root at dhcp42-84 ~]#
[root at dhcp42-84 ~]#
[root at dhcp42-84 ~]# gluster volume remove-brick Dis-Rep replica 2
10.70.42.84:/bricks/brick1/smp2 10.70.43.6:/bricks/brick1/smp3 commit
Removing brick(s) can result in data loss. Do you want to Continue? (y/n) y
volume remove-brick commit: failed: use 'force' option as migration is in
progress
[root at dhcp42-84 ~]#
[root at dhcp42-84 ~]#
[root at dhcp42-84 ~]# systemctl restart glusterd
[root at dhcp42-84 ~]#
[root at dhcp42-84 ~]# gluster volume remove-brick Dis-Rep replica 2
10.70.42.84:/bricks/brick1/smp2 10.70.43.6:/bricks/brick1/smp3 status
Node        Rebalanced-files      size  scanned  failures  skipped  status        run time in secs
----------  ----------------  --------  -------  --------  -------  ------------  ----------------
localhost                  0    0Bytes        0         0        0  in progress               0.00
10.70.43.6                 0    0Bytes        0         0        0  completed                12.00
[root at dhcp42-84 ~]#
[root at dhcp42-84 ~]# gluster volume remove-brick Dis-Rep replica 2
10.70.42.84:/bricks/brick1/smp2 10.70.43.6:/bricks/brick1/smp3 commit
Removing brick(s) can result in data loss. Do you want to Continue? (y/n) y
volume remove-brick commit: success
Check the removed bricks to ensure all files are migrated.
If files with data are found on the brick path, copy them via a gluster mount
point before re-purposing the removed brick.
[root at dhcp42-84 ~]#
[root at dhcp42-84 ~]#
[root at dhcp42-84 ~]# gluster volume status
Status of volume: Dis-Rep
Gluster process                           TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.42.84:/bricks/brick0/smp0     49157     0          Y       18500
Brick 10.70.43.6:/bricks/brick0/smp1      49162     0          Y       19368
NFS Server on localhost                   2049      0          Y       19014
Self-heal Daemon on localhost             N/A       N/A        Y       19022
NFS Server on 10.70.43.6                  2049      0          Y       19582
Self-heal Daemon on 10.70.43.6            N/A       N/A        Y       19590
Task Status of Volume Dis-Rep
------------------------------------------------------------------------------
There are no active volume tasks
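
The successful commit above warns about possible data loss on the removed
bricks; a hedged clean-up sketch for that advice (the brick path is from the
transcript, the mount point /mnt/dis-rep and the leftover path are
placeholders):

# List user files left behind on the removed brick; .glusterfs holds
# internal metadata, not user data, so prune it from the search.
find /bricks/brick1/smp2 -path '*/.glusterfs' -prune -o -type f -print

# Copy any leftovers back in through a gluster mount so the volume's
# layout and replication are honoured before re-purposing the brick.
mkdir -p /mnt/dis-rep
mount -t glusterfs 10.70.42.84:/Dis-Rep /mnt/dis-rep
cp -a /bricks/brick1/smp2/path/to/leftover /mnt/dis-rep/path/to/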
--- Additional comment from Vijay Bellur on 2016-01-29 22:21:00 EST ---
REVIEW: http://review.gluster.org/13323 (glusterd: set
decommission_is_in_progress flag for inprogress remove-brick op on glusterd
restart) posted (#1) for review on master by Atin Mukherjee
(amukherj at redhat.com)
--- Additional comment from Vijay Bellur on 2016-01-30 01:50:12 EST ---
REVIEW: http://review.gluster.org/13323 (glusterd: set
decommission_is_in_progress flag for inprogress remove-brick op on glusterd
restart) posted (#2) for review on master by Atin Mukherjee
(amukherj at redhat.com)
--- Additional comment from Vijay Bellur on 2016-02-01 23:34:07 EST ---
REVIEW: http://review.gluster.org/13323 (glusterd: set
decommission_is_in_progress flag for inprogress remove-brick op on glusterd
restart) posted (#3) for review on master by Atin Mukherjee
(amukherj at redhat.com)
--- Additional comment from Vijay Bellur on 2016-02-23 00:42:34 EST ---
COMMIT: http://review.gluster.org/13323 committed in master by Atin Mukherjee
(amukherj at redhat.com)
------
commit 3ca140f011faa9d92a4b3889607fefa33ae6de76
Author: Atin Mukherjee <amukherj at redhat.com>
Date: Sat Jan 30 08:47:35 2016 +0530
glusterd: set decommission_is_in_progress flag for inprogress remove-brick
op on glusterd restart

While remove brick is in progress, if glusterd is restarted, since the
decommission flag is not persisted in the store the same value is not
retained back, resulting in glusterd not blocking remove brick commit when
rebalance is already in progress.
Change-Id: Ibbf12f3792d65ab1293fad1e368568be141a1cd6
BUG: 1303269
Signed-off-by: Atin Mukherjee <amukherj at redhat.com>
Reviewed-on: http://review.gluster.org/13323
Smoke: Gluster Build System <jenkins at build.gluster.com>
NetBSD-regression: NetBSD Build System <jenkins at build.gluster.org>
CentOS-regression: Gluster Build System <jenkins at build.gluster.com>
Reviewed-by: Gaurav Kumar Garg <ggarg at redhat.com>
Reviewed-by: mohammed rafi kc <rkavunga at redhat.com>
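
With the flag persisted and restored on restart, the pre-restart behaviour
should survive a glusterd restart; a minimal post-fix check in plain bash
(same placeholder volume and bricks as the repro sketch earlier):

# With the patch applied, restart glusterd while the rebalance started by
# the remove-brick ... start is still running:
systemctl restart glusterd

# A plain commit must now fail again, as it did before the restart:
#   volume remove-brick commit: failed: use 'force' option as migration
#   is in progress
echo y | gluster volume remove-brick Dis-Rep replica 2 \
    10.70.42.84:/bricks/brick1/smp2 10.70.43.6:/bricks/brick1/smp3 commit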
Referenced Bugs:
https://bugzilla.redhat.com/show_bug.cgi?id=1303028
[Bug 1303028] Tiering status and rebalance status stops getting updated
https://bugzilla.redhat.com/show_bug.cgi?id=1303125
[Bug 1303125] After GlusterD restart, Remove-brick commit happening even
though data migration not completed.
https://bugzilla.redhat.com/show_bug.cgi?id=1303269
[Bug 1303269] After GlusterD restart, Remove-brick commit happening even
though data migration not completed.
--
You are receiving this mail because:
You are on the CC list for the bug.
You are the assignee for the bug.