[Bugs] [Bug 1266880] New: Tiering: unlink failed with error "Invaid argument"
bugzilla at redhat.com
Mon Sep 28 11:19:30 UTC 2015
https://bugzilla.redhat.com/show_bug.cgi?id=1266880
Bug ID: 1266880
Summary: Tiering: unlink failed with error "Invaid argument"
Product: GlusterFS
Version: 3.7.5
Component: tiering
Keywords: Triaged
Severity: urgent
Priority: urgent
Assignee: bugs at gluster.org
Reporter: rkavunga at redhat.com
QA Contact: bugs at gluster.org
CC: bugs at gluster.org, josferna at redhat.com,
rgowdapp at redhat.com
Depends On: 1236032
Blocks: 1260923
+++ This bug was initially created as a clone of Bug #1236032 +++
Description of problem:
The unlink operation fails after attaching a tier to a volume that already
contains some files/directories.
Version-Release number of selected component (if applicable):
master
How reproducible:
100%
Steps to Reproduce:
1. Create a distributed volume
2. Start the volume and create some files
3. Attach a tier (enable CTR, etc.)
4. Remove some files from the mount point
Actual results:
The unlink fails with "Invalid argument"
Expected results:
The unlink should succeed
Additional info:
The unlink failed because the dht translator does not have a cached subvolume
stored in the inode ctx for that particular inode.
--- Additional comment from Mohammed Rafi KC on 2015-06-30 05:39:51 EDT ---
Changing the steps to reproduce:
Steps to Reproduce:
1. Create a distributed volume
2. Start the volume and create some files
3. Attach a tier (enable CTR, etc.)
4. Do ls on the mount point
5. Remove some files from the mount point
--- Additional comment from Joseph Elwin Fernandes on 2015-07-06 03:35:47 EDT ---
This issue is due to a NULL cached_subvol in the hot-dht xlator under the
tiering translator. I discussed this with Dan; he said he has a fix for it, as
he has dealt with the same issue for other FOPs. The issue also happens for
getxattr "trusted.distribute.linkinfo".
Backtrace with a breakpoint at dht_unlink_linkfile_cbk:
(gdb) bt
#0 dht_unlink_linkfile_cbk (frame=0x7fddb40086dc, cookie=0x7fddb400796c,
this=0x7fddbc0162d0, op_ret=-1,
op_errno=22, preparent=0x0, postparent=0x0, xdata=0x0) at dht-common.c:2403
#1 0x00007fddc9598b5a in dht_unlink (frame=0x7fddb400796c,
this=0x7fddbc015510, loc=0x7fddb400625c, xflag=0,
xdata=0x0) at dht-common.c:5208
#2 0x00007fddc9598798 in dht_unlink (frame=0x7fddb40086dc,
this=0x7fddbc0162d0, loc=0x7fddb400625c, xflag=0,
xdata=0x0) at dht-common.c:5196
#3 0x00007fddd6af96c9 in default_unlink (frame=0x7fddb40086dc,
this=0x7fddbc017b70, loc=0x7fddb400625c, xflag=0,
xdata=0x0) at defaults.c:1910
#4 0x00007fddd6af96c9 in default_unlink (frame=0x7fddb40086dc,
this=0x7fddbc0189b0, loc=0x7fddb400625c, xflag=0,
xdata=0x0) at defaults.c:1910
#5 0x00007fddd6af96c9 in default_unlink (frame=0x7fddb40086dc,
this=0x7fddbc019720, loc=0x7fddb400625c, xflag=0,
xdata=0x0) at defaults.c:1910
#6 0x00007fddd6af5364 in default_unlink_resume (frame=0x7fddb40058ec,
this=0x7fddbc01a550, loc=0x7fddb400625c,
xflag=0, xdata=0x0) at defaults.c:1469
#7 0x00007fddd6b16817 in call_resume_wind (stub=0x7fddb400621c) at
call-stub.c:2083
#8 0x00007fddd6b1ef1e in call_resume (stub=0x7fddb400621c) at call-stub.c:2571
#9 0x00007fddc8d21a58 in open_and_resume (this=0x7fddbc01a550, fd=0x0,
stub=0x7fddb400621c) at open-behind.c:242
#10 0x00007fddc8d2468f in ob_unlink (frame=0x7fddb40058ec, this=0x7fddbc01a550,
loc=0x7fddb40016e0, xflags=0,
xdata=0x0) at open-behind.c:768
#11 0x00007fddc8b12c6b in mdc_unlink (frame=0x7fddb40056fc,
this=0x7fddbc01b310, loc=0x7fddb40016e0, xflag=0,
xdata=0x0) at md-cache.c:1205
#12 0x00007fddc88fda67 in io_stats_unlink (frame=0x7fddb40055fc,
this=0x7fddbc01c0d0, loc=0x7fddb40016e0, xflag=0,
xdata=0x0) at io-stats.c:2002
#13 0x00007fddd6af96c9 in default_unlink (frame=0x7fddb40055fc,
this=0x7fddbc01d170, loc=0x7fddb40016e0, xflag=0,
xdata=0x0) at defaults.c:1910
#14 0x00007fddcdef230b in fuse_unlink_resume (state=0x7fddb40016c0) at
fuse-bridge.c:1568
#15 0x00007fddcdeebe47 in fuse_fop_resume (state=0x7fddb40016c0) at
fuse-bridge.c:536
#16 0x00007fddcdee9b49 in fuse_resolve_done (state=0x7fddb40016c0) at
fuse-resolve.c:637
#17 0x00007fddcdee9c1f in fuse_resolve_all (state=0x7fddb40016c0) at
fuse-resolve.c:664
#18 0x00007fddcdee9b2a in fuse_resolve (state=0x7fddb40016c0) at
fuse-resolve.c:628
#19 0x00007fddcdee9bf6 in fuse_resolve_all (state=0x7fddb40016c0) at
fuse-resolve.c:660
#20 0x00007fddcdee9c7d in fuse_resolve_continue (state=0x7fddb40016c0) at
fuse-resolve.c:680
#21 0x00007fddcdee9041 in fuse_resolve_parent (state=0x7fddb40016c0) at
fuse-resolve.c:290
#22 0x00007fddcdee9afa in fuse_resolve (state=0x7fddb40016c0) at
fuse-resolve.c:621
#23 0x00007fddcdee9ba1 in fuse_resolve_all (state=0x7fddb40016c0) at
fuse-resolve.c:653
#24 0x00007fddcdee9cbb in fuse_resolve_and_resume (state=0x7fddb40016c0,
fn=0x7fddcdef1e94 <fuse_unlink_resume>)
at fuse-resolve.c:692
#25 0x00007fddcdef240c in fuse_unlink (this=0x20a0be0, finh=0x7fddb4008d30,
msg=0x7fddb4008d58)
at fuse-bridge.c:1582
#26 0x00007fddcdf02087 in fuse_thread_proc (data=0x20a0be0) at
fuse-bridge.c:4879
#27 0x00007fddd593652a in start_thread () from /lib64/libpthread.so.0
#28 0x00007fddd528522d in clone () from /lib64/libc.so.6
Note that cached_subvol is NULL here, and hence op_errno is set to EINVAL (22):
#1 0x00007fddc9598b5a in dht_unlink (frame=0x7fddb400796c,
this=0x7fddbc015510, loc=0x7fddb400625c, xflag=0,
xdata=0x0) at dht-common.c:5208
5208 DHT_STACK_UNWIND (unlink, frame, -1, op_errno, NULL, NULL,
NULL);
(gdb) p hashed_subvol
$8 = (xlator_t *) 0x7fddbc012f70
(gdb) p cached_subvol
$3 = (xlator_t *) 0x0
(gdb) p op_errno
$4 = 22
(gdb) p local
$6 = (dht_local_t *) 0x7fddb4008e5c
(gdb) p local->cached_subvol
$7 = (xlator_t *) 0x0
(gdb)
(gdb) p this->name
$9 = 0x7fddbc00c6a0 "test-hot-dht"
(gdb)
dht_local_init (frame, loc, NULL, GF_FOP_UNLINK), called at line 5170 of
dht-common.c, has failed to populate cached_subvol. dht_subvol_get_cached()
seems to be broken for the hot-dht xlator.
Stepping into dht_subvol_get_cached:
dht_subvol_get_cached (this=0x7fddbc015510, inode=0x7fddbc03b70c) at
dht-helper.c:626
626 dht_layout_t *layout = NULL;
(gdb) bt
#0 dht_subvol_get_cached (this=0x7fddbc015510, inode=0x7fddbc03b70c) at
dht-helper.c:626
#1 0x00007fddc955ae7d in dht_local_init (frame=0x7fddb400751c,
loc=0x7fddb400654c, fd=0x0, fop=GF_FOP_UNLINK)
at dht-helper.c:498
#2 0x00007fddc95984b3 in dht_unlink (frame=0x7fddb400751c,
this=0x7fddbc015510, loc=0x7fddb400654c, xflag=0,
xdata=0x0) at dht-common.c:5170
#3 0x00007fddc9598798 in dht_unlink (frame=0x7fddb400741c,
this=0x7fddbc0162d0, loc=0x7fddb400654c, xflag=0,
xdata=0x0) at dht-common.c:5196
#4 0x00007fddd6af96c9 in default_unlink (frame=0x7fddb400741c,
this=0x7fddbc017b70, loc=0x7fddb400654c, xflag=0,
xdata=0x0) at defaults.c:1910
#5 0x00007fddd6af96c9 in default_unlink (frame=0x7fddb400741c,
this=0x7fddbc0189b0, loc=0x7fddb400654c, xflag=0,
xdata=0x0) at defaults.c:1910
#6 0x00007fddd6af96c9 in default_unlink (frame=0x7fddb400741c,
this=0x7fddbc019720, loc=0x7fddb400654c, xflag=0,
xdata=0x0) at defaults.c:1910
#7 0x00007fddd6af5364 in default_unlink_resume (frame=0x7fddb400640c,
this=0x7fddbc01a550, loc=0x7fddb400654c
627 xlator_t *subvol = NULL;
(gdb)
629 GF_VALIDATE_OR_GOTO (this->name, this, out);
(gdb)
630 GF_VALIDATE_OR_GOTO (this->name, inode, out);
(gdb)
632 layout = dht_layout_get (this, inode);
(gdb) n
634 if (!layout) {
(gdb) p layout
$18 = (dht_layout_t *) 0x0
(gdb) p this->name
$19 = 0x7fddbc00c6a0 "test-hot-dht"
(gdb)
dht_layout_get returns NULL. As a result, dht_subvol_get_cached also returns
NULL. Looking deeper, we see that the dht_inode_ctx_t is NULL!
Breakpoint 1, dht_subvol_get_cached (this=0x7fddbc015510, inode=0x7fddbc03b70c)
at dht-helper.c:626
626 dht_layout_t *layout = NULL;
(gdb) n
627 xlator_t *subvol = NULL;
(gdb) n
629 GF_VALIDATE_OR_GOTO (this->name, this, out);
(gdb) n
630 GF_VALIDATE_OR_GOTO (this->name, inode, out);
(gdb) n
632 layout = dht_layout_get (this, inode);
(gdb) s
dht_layout_get (this=0x7fddbc015510, inode=0x7fddbc03b70c) at dht-layout.c:65
65 dht_conf_t *conf = NULL;
(gdb) n
66 dht_layout_t *layout = NULL;
(gdb) n
67 int ret = 0;
(gdb) n
69 conf = this->private;
(gdb) n
70 if (!conf)
(gdb) n
73 LOCK (&conf->layout_lock);
(gdb) n
75 ret = dht_inode_ctx_layout_get (inode, this, &layout);
(gdb) s
dht_inode_ctx_layout_get (inode=0x7fddbc03b70c, this=0x7fddbc015510,
layout=0x7fddc37fd678) at dht-common.c:6981
6981 dht_inode_ctx_t *ctx = NULL;
(gdb) n
6982 int ret = -1;
(gdb) n
6984 ret = dht_inode_ctx_get (inode, this, &ctx);
(gdb) n
6986 if (!ret && ctx) {
(gdb) p ctx
$10 = (dht_inode_ctx_t *) 0x0
(gdb)
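The chain traced above can be summarized in a small model. This is only a
sketch under stated assumptions, not the GlusterFS code: the types and the
model_* function names are hypothetical stand-ins for dht_inode_ctx_t,
dht_layout_get, dht_subvol_get_cached and the guard in dht_unlink. It shows
how a missing inode ctx turns into a NULL layout, then a NULL cached
subvolume, and finally EINVAL (22 on Linux, matching op_errno above).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the real GlusterFS types. */
typedef struct subvol    { const char *name; } subvol_t;
typedef struct layout    { subvol_t *cached; } layout_t;
typedef struct inode_ctx { layout_t *layout; } inode_ctx_t;

/* ctx is this xlator's context for the inode; NULL if it was never set. */
typedef struct inode     { inode_ctx_t *ctx; } inode_t;

/* Models dht_layout_get: without an inode ctx there is no layout. */
static layout_t *model_layout_get(const inode_t *inode)
{
    assert(inode != NULL);
    return inode->ctx ? inode->ctx->layout : NULL;
}

/* Models dht_subvol_get_cached: without a layout there is no cached subvol. */
static subvol_t *model_subvol_get_cached(const inode_t *inode)
{
    layout_t *layout = model_layout_get(inode);
    return layout ? layout->cached : NULL;
}

/* Models the guard in dht_unlink: returns 0, or -1 with *op_errno = EINVAL
 * when no cached subvolume is known for the inode. */
static int model_unlink(const inode_t *inode, int *op_errno)
{
    if (model_subvol_get_cached(inode) == NULL) {
        *op_errno = EINVAL;
        return -1;
    }
    *op_errno = 0;
    return 0;
}
```

Calling model_unlink on an inode whose ctx is NULL gives -1 with EINVAL, which
is the situation hot-dht is in here; with a populated ctx the guard passes.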
--- Additional comment from Joseph Elwin Fernandes on 2015-07-06 03:42:27 EDT ---
This issue happens only in the pure distribute case, not on a
distributed-replicate (Dis-rep) or distributed-disperse (Dis-EC) volume.
--- Additional comment from Mohammed Rafi KC on 2015-07-10 05:48:16 EDT ---
RCA:
After attach-tier, all fops hash to the hot tier (unless the "rule" option is
set explicitly). A lookup sent on a directory eventually lists the directory
using readdirp, and each dht xlator populates inode_ctx for the inodes in the
entries it returns. For files that were already present in the volume before
the tier was attached, readdirp therefore populates inode_ctx only in cold-dht,
because those entries come only from the cold tier.
When an unlink comes on such an inode, the lookup associated with the unlink is
sent as a revalidate request to the cold tier only, since a lookup has already
been performed on the inode, and that lookup succeeds. In dht_unlink of the
tier translator, the file hashes to the hot tier but the cached_subvol is cold.
Because the hashed and cached subvolumes differ, it chooses the hashed
subvolume and sends the fop to hot-dht, where the fop fails with EINVAL because
hot-dht has no inode_ctx stored for that inode (no lookup was ever performed
from hot-dht).
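The RCA can be illustrated with a toy model. This is a sketch of the described
behaviour only, not the tier or dht implementation: the has_ctx array, the
COLD/HOT indices and the model_* functions are all hypothetical names. It
assumes, as stated in the RCA, that readdirp fills in only the cold-dht context
for pre-existing files and that the unlink is routed to the hashed (hot)
subvolume.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: each dht instance keeps its own per-inode context. */
enum { COLD = 0, HOT = 1, NUM_TIERS = 2 };

typedef struct model_inode {
    /* true once that tier's dht has stored a context for this inode */
    bool has_ctx[NUM_TIERS];
} model_inode_t;

/* readdirp for a file created before attach-tier: the entry comes from the
 * cold tier only, so only cold-dht populates its context. */
static void model_readdirp(model_inode_t *inode)
{
    assert(inode != NULL);
    inode->has_ctx[COLD] = true;
}

/* Unlink as routed in the RCA: the file hashes to HOT while the cached
 * subvolume is COLD, and on a mismatch the fop goes to the hashed subvolume.
 * Returns 0, or -1 with *op_errno = EINVAL if that dht has no context. */
static int model_tier_unlink(const model_inode_t *inode, int *op_errno)
{
    const int target = HOT; /* hashed subvolume after attach-tier */

    if (!inode->has_ctx[target]) {
        *op_errno = EINVAL;
        return -1;
    }
    *op_errno = 0;
    return 0;
}
```

After model_readdirp, only has_ctx[COLD] is set, so model_tier_unlink returns
-1 with EINVAL, which mirrors the failure seen from test-hot-dht.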
--- Additional comment from Mohammed Rafi KC on 2015-07-10 07:22:43 EDT ---
The same problem could exist for the following FOPs too:
1) dht_link,
2) getxattr "trusted.distribute.linkinfo"
3) f/setxattr
4) f/removexattr
5) unlink of a link file
--- Additional comment from Anand Avati on 2015-07-21 08:53:46 EDT ---
REVIEW: http://review.gluster.org/11675 (tier/dht: unlink fails after lookup in
a directory) posted (#2) for review on master by mohammed rafi kc
(rkavunga at redhat.com)
--- Additional comment from Anand Avati on 2015-08-06 02:48:17 EDT ---
REVIEW: http://review.gluster.org/11675 (tier/dht: unlink fails after lookup in
a directory) posted (#3) for review on master by mohammed rafi kc
(rkavunga at redhat.com)
--- Additional comment from Anand Avati on 2015-08-13 07:33:06 EDT ---
REVIEW: http://review.gluster.org/11675 (tier/dht: unlink fails after lookup in
a directory) posted (#5) for review on master by mohammed rafi kc
(rkavunga at redhat.com)
--- Additional comment from Anand Avati on 2015-08-13 16:04:40 EDT ---
REVIEW: http://review.gluster.org/11675 (tier/dht: unlink fails after lookup in
a directory) posted (#6) for review on master by Dan Lambright
(dlambrig at redhat.com)
--- Additional comment from Anand Avati on 2015-08-13 16:44:45 EDT ---
REVIEW: http://review.gluster.org/11675 (tier/dht: unlink fails after lookup in
a directory) posted (#7) for review on master by Dan Lambright
(dlambrig at redhat.com)
--- Additional comment from Anand Avati on 2015-08-14 02:35:40 EDT ---
REVIEW: http://review.gluster.org/11892 (fuse:sent at least one lookup before
actual fop) posted (#2) for review on master by mohammed rafi kc
(rkavunga at redhat.com)
--- Additional comment from Anand Avati on 2015-08-14 10:55:00 EDT ---
REVIEW: http://review.gluster.org/11675 (tier/dht: unlink fails after lookup in
a directory) posted (#8) for review on master by Dan Lambright
(dlambrig at redhat.com)
--- Additional comment from Anand Avati on 2015-08-19 01:43:03 EDT ---
REVIEW: http://review.gluster.org/11675 (tier/dht: unlink fails after lookup in
a directory) posted (#9) for review on master by Dan Lambright
(dlambrig at redhat.com)
--- Additional comment from Anand Avati on 2015-08-21 06:55:57 EDT ---
REVIEW: http://review.gluster.org/11675 (tier/dht: unlink fails after lookup in
a directory) posted (#10) for review on master by Dan Lambright
(dlambrig at redhat.com)
--- Additional comment from Anand Avati on 2015-08-21 11:51:12 EDT ---
REVIEW: http://review.gluster.org/11675 (tier/dht: unlink fails after lookup in
a directory) posted (#11) for review on master by Joseph Fernandes
--- Additional comment from Anand Avati on 2015-08-27 13:14:18 EDT ---
REVIEW: http://review.gluster.org/11675 (tier/dht: unlink fails after lookup in
a directory) posted (#12) for review on master by Dan Lambright
(dlambrig at redhat.com)
--- Additional comment from Vijay Bellur on 2015-09-03 05:37:12 EDT ---
REVIEW: http://review.gluster.org/11675 (tier/dht: unlink fails after lookup in
a directory) posted (#13) for review on master by Joseph Fernandes
--- Additional comment from Vijay Bellur on 2015-09-04 05:09:25 EDT ---
REVIEW: http://review.gluster.org/11675 (tier/dht: unlink fails after lookup in
a directory) posted (#14) for review on master by mohammed rafi kc
(rkavunga at redhat.com)
--- Additional comment from Vijay Bellur on 2015-09-09 05:13:55 EDT ---
REVIEW: http://review.gluster.org/11675 (tier/dht: unlink fails after lookup in
a directory) posted (#16) for review on master by mohammed rafi kc
(rkavunga at redhat.com)
--- Additional comment from Vijay Bellur on 2015-09-12 10:23:15 EDT ---
REVIEW: http://review.gluster.org/11892 (fuse:sent at least one lookup before
actual fop) posted (#3) for review on master by Dan Lambright
(dlambrig at redhat.com)
--- Additional comment from Vijay Bellur on 2015-09-14 04:53:43 EDT ---
REVIEW: http://review.gluster.org/11675 (tier/dht: unlink fails after lookup in
a directory) posted (#17) for review on master by mohammed rafi kc
(rkavunga at redhat.com)
--- Additional comment from Vijay Bellur on 2015-09-15 01:39:21 EDT ---
REVIEW: http://review.gluster.org/11675 (tier/dht: unlink fails after lookup in
a directory) posted (#18) for review on master by mohammed rafi kc
(rkavunga at redhat.com)
Referenced Bugs:
https://bugzilla.redhat.com/show_bug.cgi?id=1236032
[Bug 1236032] Tiering: unlink failed with error "Invaid argument"
https://bugzilla.redhat.com/show_bug.cgi?id=1260923
[Bug 1260923] Tracker for tiering in 3.1.2