[Bugs] [Bug 1293300] New: [Tiering] + [DHT] - Detach tier fails to migrate the files when there are corrupted objects in hot tier.
bugzilla at redhat.com
Mon Dec 21 11:20:52 UTC 2015
https://bugzilla.redhat.com/show_bug.cgi?id=1293300
Bug ID: 1293300
Summary: [Tiering] + [DHT] - Detach tier fails to migrate the
files when there are corrupted objects in hot tier.
Product: GlusterFS
Version: 3.7.6
Component: replicate
Severity: high
Assignee: bugs at gluster.org
Reporter: ravishankar at redhat.com
CC: asrivast at redhat.com, bugs at gluster.org,
dlambrig at redhat.com, gluster-bugs at redhat.com,
josferna at redhat.com, knarra at redhat.com,
nbalacha at redhat.com, nchilaka at redhat.com,
vagarwal at redhat.com
Depends On: 1289228, 1290965
+++ This bug was initially created as a clone of Bug #1290965 +++
+++ This bug was initially created as a clone of Bug #1289228 +++
Description of problem:
When there are corrupted objects in the hot tier, running detach tier on the
volume fails to migrate the files. Detach tier should display a message saying
that there are corrupted files and that they must be recovered before
performing detach tier.
When there is a corrupted file in one subvolume of a replica pair in the hot
tier and the other subvolume has a good copy, detach tier fails to migrate the
good files to the cold tier. Detach tier should migrate the files since there
is a good copy of the file.
Version-Release number of selected component (if applicable):
glusterfs-3.7.5-9.el7rhgs.x86_64
How reproducible:
Always
Steps to Reproduce:
1. Create a tiered volume with both hot and cold tiers as distribute-replicate.
2. Mount the volume using NFS and create some data.
3. Edit the files directly on the backend bricks and wait for the scrubber to
mark them as corrupted.
4. Run the command 'gluster volume detach-tier <volname> start' to detach the
hot tier.
Actual results:
Detach tier neither demotes the files to the cold tier nor complains about the
corrupted files. This will lead to data loss if the user removes the tier by
committing the detach.
Expected results:
Detach tier should either complain about the corrupted files or migrate them,
since a good copy is available in the other subvolume of the replica pair.
Additional info:
--- Additional comment from RamaKasturi on 2015-12-07 12:11:11 EST ---
gluster vol info output:
===========================
[root at rhs-client2 ~]# gluster vol info vol1
Volume Name: vol1
Type: Tier
Volume ID: 385fdb1e-1034-40ca-9a14-e892e68b500b
Status: Started
Number of Bricks: 8
Transport-type: tcp
Hot Tier :
Hot Tier Type : Distributed-Replicate
Number of Bricks: 2 x 2 = 4
Brick1: rhs-client38:/bricks/brick3/h4
Brick2: rhs-client2:/bricks/brick3/h3
Brick3: rhs-client38:/bricks/brick2/h2
Brick4: rhs-client2:/bricks/brick2/h1
Cold Tier:
Cold Tier Type : Distributed-Replicate
Number of Bricks: 2 x 2 = 4
Brick5: rhs-client2:/bricks/brick0/c1
Brick6: rhs-client38:/bricks/brick0/c2
Brick7: rhs-client2:/bricks/brick1/c3
Brick8: rhs-client38:/bricks/brick1/c4
Options Reconfigured:
cluster.tier-mode: cache
features.ctr-enabled: on
cluster.watermark-hi: 2
cluster.watermark-low: 1
features.scrub-freq: hourly
features.scrub: Active
features.bitrot: on
performance.readdir-ahead: on
Files inside the bricks before detach tier:
=================================================
[root at rhs-client2 ~]# ls -l /bricks/brick*/h*
/bricks/brick2/h1:
total 4
-rw-r--r--. 2 root root 75 Dec 7 12:17 ff1
/bricks/brick3/h3:
total 1361028
-rw-r--r--. 2 root root 76 Dec 7 12:17 ff2
-rw-r--r--. 2 root root 1393688576 Dec 7 12:18 rhgsc-appliance005
[root at rhs-client2 ~]# getfattr -d -m . -e hex /bricks/brick2/h1/ff1
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick2/h1/ff1
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.bad-file=0x3100
trusted.bit-rot.signature=0x0102000000000000006fde5302afacc9901c00de8ffc8c7aeaa4ea094d7edfe0e216b094d3877660da
trusted.bit-rot.version=0x02000000000000005665763d00014924
trusted.gfid=0xc00e4e43eb5849618f0a0f37501f7613
[root at rhs-client2 ~]# getfattr -d -m . -e hex /bricks/brick3/h3/ff2
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick3/h3/ff2
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.bad-file=0x3100
trusted.bit-rot.signature=0x0102000000000000000afb51faf10aa5d634c55290d8a3b579d1935c80c3c1a3f7f92fd812239c5ef8
trusted.bit-rot.version=0x02000000000000005665763d0001f921
trusted.gfid=0x810ff1ae3dd84d1a8422afa1283d2a78
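The hex-encoded values in the getfattr output above can be decoded to see what
they hold. The following is a minimal Python sketch, assuming the values are the
NUL-terminated byte strings they appear to be. The helper name decode_xattr is
hypothetical and not part of any Gluster tooling.

```python
def decode_xattr(hex_value: str) -> bytes:
    """Decode a getfattr '-e hex' value such as '0x3100' into raw bytes."""
    if not hex_value.startswith("0x"):
        raise ValueError("expected a value starting with 0x")
    return bytes.fromhex(hex_value[2:])


# Values copied from the getfattr output for ff1 and ff2.
bad_file = decode_xattr("0x3100")
selinux = decode_xattr(
    "0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000"
)

print(bad_file)                                  # b'1\x00'
print(selinux.rstrip(b"\x00").decode("ascii"))   # system_u:object_r:glusterd_brick_t:s0
```

Both ff1 and ff2 carry trusted.bit-rot.bad-file with the value "1", which is the
marker that identifies them as corrupted objects in this report.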
[root at rhs-client2 ~]# ls -l /bricks/brick*/c*
/bricks/brick0/c1:
total 2722064
---------T. 2 root root 0 Dec 7 12:14 ff2
-rw-r--r--. 2 root root 21 Dec 4 12:21 file2_hot
-rw-r--r--. 2 root root 8 Dec 4 07:42 file3
-rw-r--r--. 2 root root 8 Dec 4 07:42 file4
-rw-r--r--. 2 root root 67 Dec 7 11:40 file_demote1
-rw-r--r--. 2 root root 1393688576 Dec 7 09:36 rhgsc-appliance004
---------T. 2 root root 0 Dec 7 12:14 rhgsc-appliance005
-rw-r--r--. 2 root root 1393688576 Dec 7 09:28 rhgsc-appliance-03
/bricks/brick1/c3:
total 4083100
---------T. 2 root root 0 Dec 7 12:14 ff1
-rw-r--r--. 2 root root 237 Dec 7 09:18 file1
-rw-r--r--. 2 root root 201 Dec 7 09:12 file2
-rw-r--r--. 2 root root 51 Dec 4 12:22 file2_hot1
-rw-r--r--. 2 root root 8 Dec 4 07:42 file5
-rw-r--r--. 2 root root 8 Dec 4 07:42 file6
-rw-r--r--. 2 root root 77 Dec 7 11:40 file_demote
-rw-r--r--. 2 root root 19 Dec 7 10:58 file_demote2
-rw-r--r--. 2 root root 1393688576 Dec 7 09:31 rhgsc-appliance00
-rw-r--r--. 2 root root 1393688576 Dec 7 09:26 rhgsc-appliance-02
-rw-r--r--. 2 root root 1393688576 Dec 4 12:06 rhgsc-appliance03
gluster volume detach-tier status after starting it:
==========================================================
[root at rhs-client2 ~]# gluster volume detach-tier vol1 status
Node          Rebalanced-files   size     scanned   failures   skipped   status      run time in secs
-----------   ----------------   ------   -------   --------   -------   ---------   ----------------
localhost     0                  0Bytes   19        0          0         completed   0.00
10.70.36.62   3                  1.3GB    3         0          0         completed   42.00
Even after the status says migration is completed, files are still seen in hot tier:
===============================================================================
[root at rhs-client2 ~]# ls -l /bricks/brick*/c*
/bricks/brick0/c1:
total 4083088
---------T. 2 root root 0 Dec 7 16:53 ff2
-rw-r--r--. 2 root root 21 Dec 4 12:21 file2_hot
-rw-r--r--. 2 root root 8 Dec 4 07:42 file3
-rw-r--r--. 2 root root 8 Dec 4 07:42 file4
-rw-r--r--. 2 root root 67 Dec 7 11:40 file_demote1
-rw-r--r--. 2 root root 1393688576 Dec 7 09:36 rhgsc-appliance004
-rw-r--r--. 2 root root 1393688576 Dec 7 12:17 rhgsc-appliance005
-rw-r--r--. 2 root root 1393688576 Dec 7 09:28 rhgsc-appliance-03
/bricks/brick1/c3:
total 4083100
---------T. 2 root root 0 Dec 7 16:53 ff1
-rw-r--r--. 2 root root 237 Dec 7 09:18 file1
-rw-r--r--. 2 root root 201 Dec 7 09:12 file2
-rw-r--r--. 2 root root 51 Dec 4 12:22 file2_hot1
-rw-r--r--. 2 root root 8 Dec 4 07:42 file5
-rw-r--r--. 2 root root 8 Dec 4 07:42 file6
-rw-r--r--. 2 root root 77 Dec 7 11:40 file_demote
-rw-r--r--. 2 root root 19 Dec 7 10:58 file_demote2
-rw-r--r--. 2 root root 1393688576 Dec 7 09:31 rhgsc-appliance00
-rw-r--r--. 2 root root 1393688576 Dec 7 09:26 rhgsc-appliance-02
-rw-r--r--. 2 root root 1393688576 Dec 4 12:06 rhgsc-appliance03
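In the listings above, ff1 and ff2 remain zero-byte '---------T' entries on the
cold bricks after the detach, whereas rhgsc-appliance005 changed from such an
entry to a full-size file. In DHT, entries of that shape are normally link
files that point at the subvolume holding the data. A small Python sketch for
spotting them in 'ls -l' output follows; it is a heuristic under that
assumption, the function name find_linkto_candidates is hypothetical, and a
definitive check would read the trusted.glusterfs.dht.linkto xattr on the brick.

```python
def find_linkto_candidates(ls_lines):
    """Return names of entries that look like DHT link files.

    Heuristic only: mode '---------T' (sticky bit, no permissions) and size 0.
    """
    names = []
    for line in ls_lines:
        fields = line.split()
        # Expected 'ls -l' layout: mode links owner group size month day time name
        if len(fields) < 9:
            continue  # skips 'total N' lines and directory headers
        mode, size, name = fields[0], fields[4], fields[8]
        if mode.startswith("---------T") and size == "0":
            names.append(name)
    return names


# A few lines copied from the cold-tier listing taken after detach-tier completed.
after_detach = [
    "/bricks/brick0/c1:",
    "total 4083088",
    "---------T. 2 root root 0 Dec 7 16:53 ff2",
    "-rw-r--r--. 2 root root 1393688576 Dec 7 12:17 rhgsc-appliance005",
    "---------T. 2 root root 0 Dec 7 16:53 ff1",
    "-rw-r--r--. 2 root root 237 Dec 7 09:18 file1",
]

print(find_linkto_candidates(after_detach))  # ['ff2', 'ff1']
```

Any name this returns after a completed detach is a file that was probably not
migrated to the cold tier.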
--- Additional comment from RamaKasturi on 2015-12-07 12:28:13 EST ---
--- Additional comment from Vijay Bellur on 2015-12-12 01:37:57 EST ---
REVIEW: http://review.gluster.org/12955 (afr: handle bad objects during lookup)
posted (#1) for review on master by Ravishankar N (ravishankar at redhat.com)
--- Additional comment from Vijay Bellur on 2015-12-15 11:52:30 EST ---
REVIEW: http://review.gluster.org/12955 (afr: handle bad objects during lookup)
posted (#2) for review on master by Ravishankar N (ravishankar at redhat.com)
--- Additional comment from Vijay Bellur on 2015-12-15 11:59:36 EST ---
REVIEW: http://review.gluster.org/12955 (afr: handle bad objects during
lookup/inode_refresh) posted (#3) for review on master by Ravishankar N
(ravishankar at redhat.com)
--- Additional comment from Vijay Bellur on 2015-12-16 11:12:12 EST ---
REVIEW: http://review.gluster.org/12955 (afr: handle bad objects during
lookup/inode_refresh) posted (#4) for review on master by Ravishankar N
(ravishankar at redhat.com)
--- Additional comment from Vijay Bellur on 2015-12-18 01:14:51 EST ---
REVIEW: http://review.gluster.org/12955 (afr: handle bad objects during
lookup/inode_refresh) posted (#5) for review on master by Ravishankar N
(ravishankar at redhat.com)
--- Additional comment from Vijay Bellur on 2015-12-18 04:41:54 EST ---
REVIEW: http://review.gluster.org/12955 (afr: handle bad objects during
lookup/inode_refresh) posted (#6) for review on master by Ravishankar N
(ravishankar at redhat.com)
--- Additional comment from Vijay Bellur on 2015-12-20 23:05:27 EST ---
REVIEW: http://review.gluster.org/12955 (afr: handle bad objects during
lookup/inode_refresh) posted (#7) for review on master by Pranith Kumar
Karampuri (pkarampu at redhat.com)
--- Additional comment from Vijay Bellur on 2015-12-21 00:29:02 EST ---
COMMIT: http://review.gluster.org/12955 committed in master by Pranith Kumar
Karampuri (pkarampu at redhat.com)
------
commit 2b7226f9d3470d8fe4c98c1fddb06e7f641e364d
Author: Ravishankar N <ravishankar at redhat.com>
Date: Sat Dec 12 11:49:20 2015 +0530
afr: handle bad objects during lookup/inode_refresh
If an object (file) is marked bad by bitrot, do not consider the brick
on which the object is present as a potential read subvolume for AFR
irrespective of the pending xattr values.
Also do not consider the brick containing the bad object while
performing afr_accuse_smallfiles(). Otherwise if the bad object's size
is bigger, we may end up considering that as the source.
Change-Id: I4abc68e51e5c43c5adfa56e1c00b46db22c88cf7
BUG: 1290965
Signed-off-by: Ravishankar N <ravishankar at redhat.com>
Reviewed-on: http://review.gluster.org/12955
Reviewed-by: Pranith Kumar Karampuri <pkarampu at redhat.com>
Tested-by: Pranith Kumar Karampuri <pkarampu at redhat.com>
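The behaviour described in the commit message can be modelled roughly as
follows. This is an illustrative Python sketch, not the actual AFR code (which
is C): the Replica type, its fields, and both functions are hypothetical
simplifications, and pending_blame is only a stand-in for the pending xattr
evaluation.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    brick: str
    size: int
    bad_object: bool             # copy was marked bad by bitrot
    pending_blame: bool = False  # simplified stand-in for AFR pending xattrs


def readable_bricks(replicas):
    """Bricks eligible as read subvolume; a bad copy is never eligible."""
    return [r.brick for r in replicas
            if not r.bad_object and not r.pending_blame]


def accuse_smallfiles(replicas):
    """Bricks accused for holding a smaller copy, ignoring bad copies."""
    good = [r for r in replicas if not r.bad_object]
    if not good:
        return []
    largest = max(r.size for r in good)
    return [r.brick for r in good if r.size < largest]


# Hypothetical replica pair: the corrupted copy happens to be the bigger one.
pair = [
    Replica(brick="brick-a", size=80, bad_object=True),
    Replica(brick="brick-b", size=75, bad_object=False),
]

print(readable_bricks(pair))    # ['brick-b']
print(accuse_smallfiles(pair))  # []
```

If the bad copy were included in the size comparison, brick-b would be accused
for being smaller and the corrupted copy would be treated as the source, which
is the situation the commit message warns about.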
Referenced Bugs:
https://bugzilla.redhat.com/show_bug.cgi?id=1289228
[Bug 1289228] [Tiering] + [DHT] - Detach tier fails to migrate the files
when there are corrupted objects in hot tier.
https://bugzilla.redhat.com/show_bug.cgi?id=1290965
[Bug 1290965] [Tiering] + [DHT] - Detach tier fails to migrate the files
when there are corrupted objects in hot tier.