[Bugs] [Bug 1271725] New: Data Tiering: Disallow attach tier on a volume where any rebalance process is in progress to avoid deadlock(like remove brick commit pending etc)
bugzilla at redhat.com
Wed Oct 14 14:37:20 UTC 2015
https://bugzilla.redhat.com/show_bug.cgi?id=1271725
Bug ID: 1271725
Summary: Data Tiering: Disallow attach tier on a volume where
any rebalance process is in progress to avoid
deadlock(like remove brick commit pending etc)
Product: Red Hat Gluster Storage
Version: 3.1
Component: glusterfs
Sub Component: tiering
Keywords: Triaged
Severity: urgent
Priority: urgent
Assignee: rhs-bugs at redhat.com
Reporter: rkavunga at redhat.com
QA Contact: nchilaka at redhat.com
CC: bugs at gluster.org, dlambrig at redhat.com,
nchilaka at redhat.com, rkavunga at redhat.com,
vagarwal at redhat.com
Depends On: 1258833
Blocks: 1260923, 1261819
+++ This bug was initially created as a clone of Bug #1258833 +++
Description of problem:
=====================
When attaching a tier, a check should be made to see whether any rebalance
operations are pending.
For example, I had a remove-brick operation that had completed, but the commit
was not yet done. I was still able to attach a tier.
This creates a deadlock: the tier daemon does not start by itself on
attach-tier because the remove-brick is not committed, and the remove-brick
cannot be committed because the volume is now a tier volume.
So, make sure a check is added before going ahead with attach-tier.
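Until glusterd enforces such a check, an admin-side pre-flight guard can at
least avoid the trap. The sketch below is only an illustration, not part of
any fix: the volume name, hot-tier bricks, and the "Remove brick" grep
pattern are assumptions taken from the CLI output later in this report.
#!/bin/bash
# Hedged sketch: refuse attach-tier while a remove-brick task is still
# listed for the volume. VOL and HOT_BRICKS are placeholders.
VOL=rebal
HOT_BRICKS="10.70.46.84:/rhs/brick4/rebalhot 10.70.46.36:/rhs/brick4/rebalhot"
if gluster volume status "$VOL" | grep -q "Remove brick"; then
    echo "error: a remove-brick task is still listed on $VOL;" \
         "commit or stop it before attaching a tier" >&2
    exit 1
fi
# --mode=script suppresses the interactive (y/n) confirmation.
gluster --mode=script volume attach-tier "$VOL" $HOT_BRICKS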
Version-Release number of selected component (if applicable):
=============================================================
[root at nag-manual-node1 glusterfs]# gluster --version
glusterfs 3.7.3 built on Aug 27 2015 01:23:05
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU General
Public License.
[root at nag-manual-node1 glusterfs]# rpm -qa|grep gluster
glusterfs-libs-3.7.3-0.82.git6c4096f.el6.x86_64
glusterfs-fuse-3.7.3-0.82.git6c4096f.el6.x86_64
glusterfs-server-3.7.3-0.82.git6c4096f.el6.x86_64
glusterfs-3.7.3-0.82.git6c4096f.el6.x86_64
glusterfs-api-3.7.3-0.82.git6c4096f.el6.x86_64
glusterfs-cli-3.7.3-0.82.git6c4096f.el6.x86_64
python-gluster-3.7.3-0.82.git6c4096f.el6.noarch
glusterfs-client-xlators-3.7.3-0.82.git6c4096f.el6.x86_64
How reproducible:
====================
Very easily.
Steps to Reproduce:
===================
1. Create a distribute volume with, say, 4 bricks.
2. Issue a remove-brick and wait for it to complete.
3. Without committing the remove-brick, go ahead and attach a tier.
4. The tier daemon does not trigger because the commit is pending, and the
remove-brick can no longer be committed because the volume is now a tier
volume. Hence, deadlock. (A consolidated reproducer sketch follows these
steps.)
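The steps above condense into the following script. This is a hedged sketch
using the hostnames and brick paths from the CLI log below, not a verified
test case.
#!/bin/bash
# Reproducer sketch; hostnames and brick paths come from the CLI log below.
gluster volume create rebal 10.70.46.84:/rhs/brick1/rebal \
    10.70.46.36:/rhs/brick1/rebal 10.70.46.36:/rhs/brick2/rebal
gluster volume start rebal
# Start a remove-brick and give the (tiny) migration time to complete.
gluster volume remove-brick rebal 10.70.46.36:/rhs/brick2/rebal start
sleep 10
gluster volume remove-brick rebal 10.70.46.36:/rhs/brick2/rebal status
# Attach a tier WITHOUT committing the remove-brick: the attach succeeds,
# but the tier daemon cannot start, and the remove-brick commit below now
# fails -- the deadlock described above.
gluster --mode=script volume attach-tier rebal \
    10.70.46.84:/rhs/brick4/rebalhot 10.70.46.36:/rhs/brick4/rebalhot
gluster --mode=script volume remove-brick rebal 10.70.46.36:/rhs/brick2/rebal commit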
Expected results:
===================
Disallow attach-tier if any rebalance operations (for example, an uncommitted
remove-brick) are pending.
CLI LOG:
=======
[root at nag-manual-node1 glusterfs]# gluster v create rebal
10.70.46.84:/rhs/brick1/rebal 10.70.46.36:/rhs/brick1/rebal
10.70.46.36:/rhs/brick2/rebal
volume create: rebal: success: please start the volume to access data
[root at nag-manual-node1 glusterfs]# gluster v start rebal
volume start: rebal: success
[root at nag-manual-node1 glusterfs]# gluster v info rebal
Volume Name: rebal
Type: Distribute
Volume ID: 3e272970-b319-4a35-a8cd-6845190761ee
Status: Started
Number of Bricks: 3
Transport-type: tcp
Bricks:
Brick1: 10.70.46.84:/rhs/brick1/rebal
Brick2: 10.70.46.36:/rhs/brick1/rebal
Brick3: 10.70.46.36:/rhs/brick2/rebal
Options Reconfigured:
performance.readdir-ahead: on
[root at nag-manual-node1 glusterfs]# gluster v remove-brick rebal
10.70.46.36:/rhs/brick2/rebal start
volume remove-brick start: success
ID: 464ee968-e3a4-41f0-89f7-6d6ec4ea1a62
[root at nag-manual-node1 glusterfs]# gluster v remove-brick rebal
10.70.46.36:/rhs/brick2/rebal status
Node          Rebalanced-files   size     scanned   failures   skipped   status      run time in secs
-----------   ----------------   ------   -------   --------   -------   ---------   ----------------
10.70.46.36   0                  0Bytes   0         0          0         completed   0.00
[root at nag-manual-node1 glusterfs]# gluster v status rebal
Status of volume: rebal
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick 10.70.46.84:/rhs/brick1/rebal 49187 0 Y 7849
Brick 10.70.46.36:/rhs/brick1/rebal 49186 0 Y 32414
Brick 10.70.46.36:/rhs/brick2/rebal 49187 0 Y 32432
NFS Server on localhost 2049 0 Y 7972
NFS Server on 10.70.46.36 2049 0 Y 32452
Task Status of Volume rebal
------------------------------------------------------------------------------
Task : Remove brick
ID : 464ee968-e3a4-41f0-89f7-6d6ec4ea1a62
Removed bricks:
10.70.46.36:/rhs/brick2/rebal
Status : completed
[root at nag-manual-node1 glusterfs]# gluster v info rebal
Volume Name: rebal
Type: Distribute
Volume ID: 3e272970-b319-4a35-a8cd-6845190761ee
Status: Started
Number of Bricks: 3
Transport-type: tcp
Bricks:
Brick1: 10.70.46.84:/rhs/brick1/rebal
Brick2: 10.70.46.36:/rhs/brick1/rebal
Brick3: 10.70.46.36:/rhs/brick2/rebal
Options Reconfigured:
performance.readdir-ahead: on
[root at nag-manual-node1 glusterfs]# gluster v attach-tier rebal
10.70.46.84:/rhs/brick4/rebalhot 10.70.46.36:/rhs/brick4/rebalhot
Attach tier is recommended only for testing purposes in this release. Do you
want to continue? (y/n) y
volume attach-tier: success
volume rebalance: rebal: failed: A remove-brick task on volume rebal is not yet
committed. Either commit or stop the remove-brick task.
Failed to run tier start. Please execute tier start command explictly
Usage : gluster volume rebalance <volname> tier start
[root at nag-manual-node1 glusterfs]# gluster v info rebal
Volume Name: rebal
Type: Tier
Volume ID: 3e272970-b319-4a35-a8cd-6845190761ee
Status: Started
Number of Bricks: 5
Transport-type: tcp
Hot Tier :
Hot Tier Type : Distribute
Number of Bricks: 2
Brick1: 10.70.46.36:/rhs/brick4/rebalhot
Brick2: 10.70.46.84:/rhs/brick4/rebalhot
Cold Tier:
Cold Tier Type : Distribute
Number of Bricks: 3
Brick3: 10.70.46.84:/rhs/brick1/rebal
Brick4: 10.70.46.36:/rhs/brick1/rebal
Brick5: 10.70.46.36:/rhs/brick2/rebal
Options Reconfigured:
performance.readdir-ahead: on
[root at nag-manual-node1 glusterfs]# gluster v status rebal
Status of volume: rebal
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Hot Bricks:
Brick 10.70.46.36:/rhs/brick4/rebalhot 49188 0 Y 32571
Brick 10.70.46.84:/rhs/brick4/rebalhot 49188 0 Y 8027
Cold Bricks:
Brick 10.70.46.84:/rhs/brick1/rebal 49187 0 Y 7849
Brick 10.70.46.36:/rhs/brick1/rebal 49186 0 Y 32414
Brick 10.70.46.36:/rhs/brick2/rebal 49187 0 Y 32432
NFS Server on localhost 2049 0 Y 8047
NFS Server on 10.70.46.36 2049 0 Y 32590
Task Status of Volume rebal
------------------------------------------------------------------------------
Task : Remove brick
ID : 464ee968-e3a4-41f0-89f7-6d6ec4ea1a62
Removed bricks:
10.70.46.36:/rhs/brick2/rebal
Status : completed
[root at nag-manual-node1 glusterfs]# gluster v rebal rebal status
Node          Rebalanced-files   size     scanned   failures   skipped   status      run time in secs
-----------   ----------------   ------   -------   --------   -------   ---------   ----------------
10.70.46.36   0                  0Bytes   0         0          0         completed   0.00
volume rebalance: rebal: success:
[root at nag-manual-node1 glusterfs]# gluster v rebal rebal tier status
Node          Promoted files   Demoted files   Status
-----------   --------------   -------------   -----------
localhost     0                0               not started
10.70.46.36   0                0               completed
[root at nag-manual-node1 glusterfs]# gluster v rebalance rebal tier start
volume rebalance: rebal: failed: A remove-brick task on volume rebal is not yet
committed. Either commit or stop the remove-brick task.
[root at nag-manual-node1 glusterfs]# gluster v rebalance rebal tier status
[root at nag-manual-node1 glusterfs]# gluster v remove-brick rebal
10.70.46.36:/rhs/brick2/rebal commit
Removing brick(s) can result in data loss. Do you want to Continue? (y/n) y
volume remove-brick commit: failed: Removing brick from a Tier volume is not
allowed
--- Additional comment from nchilaka on 2015-09-01 08:07:20 EDT ---
Workaround:
==========
> Do a detach-tier commit forcefully.
> Do a remove-brick commit forcefully (though the remove-brick operation no
longer shows up in the volume status or rebalance status).
> Reattach the tier.
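Consolidated as plain commands (volume name and bricks are taken from the
session below; substitute your own; --mode=script only suppresses the (y/n)
prompts), the workaround amounts to:
# Force the detach so the stale tier state is dropped; the pending
# remove-brick can then be committed and the tier re-attached.
gluster volume detach-tier rebal commit force
gluster --mode=script volume remove-brick rebal 10.70.46.36:/rhs/brick2/rebal commit
gluster --mode=script volume attach-tier rebal 10.70.46.84:/rhs/brick4/rebalhot 10.70.46.36:/rhs/brick4/rebalhot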
[root at nag-manual-node1 glusterfs]# gluster v detach-tier rebal commit
volume detach-tier commit: failed: Brick 10.70.46.84:/rhs/brick4/rebalhot is
not decommissioned. Use start or force option
[root at nag-manual-node1 glusterfs]# gluster v detach-tier rebal commit force
volume detach-tier commit force: success
[root at nag-manual-node1 glusterfs]# gluster v info rebal
Volume Name: rebal
Type: Distribute
Volume ID: 3e272970-b319-4a35-a8cd-6845190761ee
Status: Started
Number of Bricks: 3
Transport-type: tcp
Bricks:
Brick1: 10.70.46.84:/rhs/brick1/rebal
Brick2: 10.70.46.36:/rhs/brick1/rebal
Brick3: 10.70.46.36:/rhs/brick2/rebal
Options Reconfigured:
performance.readdir-ahead: on
[root at nag-manual-node1 glusterfs]# gluster v status rebal
Status of volume: rebal
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick 10.70.46.84:/rhs/brick1/rebal 49187 0 Y 7849
Brick 10.70.46.36:/rhs/brick1/rebal 49186 0 Y 32414
Brick 10.70.46.36:/rhs/brick2/rebal 49187 0 Y 32432
NFS Server on localhost 2049 0 Y 8455
NFS Server on 10.70.46.36 2049 0 Y 402
Task Status of Volume rebal
------------------------------------------------------------------------------
There are no active volume tasks
[root at nag-manual-node1 glusterfs]# gluster v rebal rebal status
Node          Rebalanced-files   size     scanned   failures   skipped   status      run time in secs
-----------   ----------------   ------   -------   --------   -------   ---------   ----------------
volume rebalance: rebal: success:
[root at nag-manual-node1 glusterfs]# gluster v remove-brick rebal
10.70.46.36:/rhs/brick2/rebal commit
Removing brick(s) can result in data loss. Do you want to Continue? (y/n) y
volume remove-brick commit: success
Check the removed bricks to ensure all files are migrated.
If files with data are found on the brick path, copy them via a gluster mount
point before re-purposing the removed brick.
--- Additional comment from Mohammed Rafi KC on 2015-09-10 04:50:15 EDT ---
Nag,
Thanks for catching this bug. Good work
Referenced Bugs:
https://bugzilla.redhat.com/show_bug.cgi?id=1258833
[Bug 1258833] Data Tiering: Disallow attach tier on a volume where any
rebalance process is in progress to avoid deadlock(like remove brick commit
pending etc)
https://bugzilla.redhat.com/show_bug.cgi?id=1260923
[Bug 1260923] Tracker for tiering in 3.1.2
https://bugzilla.redhat.com/show_bug.cgi?id=1261819
[Bug 1261819] Data Tiering: Disallow attach tier on a volume where any
rebalance process is in progress to avoid deadlock(like remove brick commit
pending etc)