[Bugs] [Bug 1271731] New: Tiering: unlink failed with error "Invaid argument"
bugzilla at redhat.com
Wed Oct 14 14:43:43 UTC 2015
https://bugzilla.redhat.com/show_bug.cgi?id=1271731
Bug ID: 1271731
Summary: Tiering: unlink failed with error "Invaid argument"
Product: Red Hat Gluster Storage
Version: 3.1
Component: glusterfs
Sub Component: tiering
Keywords: Triaged, ZStream
Severity: urgent
Priority: urgent
Assignee: rhs-bugs at redhat.com
Reporter: rkavunga at redhat.com
QA Contact: nchilaka at redhat.com
CC: bugs at gluster.org, dlambrig at redhat.com,
josferna at redhat.com, rgowdapp at redhat.com
Depends On: 1236032, 1266880
Blocks: 1260923
+++ This bug was initially created as a clone of Bug #1266880 +++
+++ This bug was initially created as a clone of Bug #1236032 +++
Description of problem:
The unlink operation fails after a tier is attached to a volume that already
contains files or directories.
Version-Release number of selected component (if applicable):
master
How reproducible:
100%
Steps to Reproduce:
1. Create a distributed volume.
2. Start the volume and create some files.
3. Attach a tier (enable CTR, etc.).
4. Remove some files from the mount point.
Actual results:
unlink fails with "Invalid argument".
Expected results:
unlink should succeed.
Additional info:
unlink fails because the dht translator has no cached subvolume for that
particular inode in its inode ctx.
--- Additional comment from Mohammed Rafi KC on 2015-06-30 05:39:51 EDT ---
Changing the steps to reproduce:
Steps to Reproduce:
1. Create a distributed volume.
2. Start the volume and create some files.
3. Attach a tier (enable CTR, etc.).
4. Run ls on the mount point.
5. Remove some files from the mount point.
--- Additional comment from Joseph Elwin Fernandes on 2015-07-06 03:35:47 EDT ---
This issue is caused by a NULL cached_subvol in the hot-dht xlator after the
tiering translator. I discussed this with Dan; he said he has a fix, since he
has dealt with the same issue for other FOPs. The issue also occurs for
getxattr of "trusted.distribute.linkinfo".
Backtrace with a breakpoint at dht_unlink_linkfile_cbk:
(gdb) bt
#0 dht_unlink_linkfile_cbk (frame=0x7fddb40086dc, cookie=0x7fddb400796c,
this=0x7fddbc0162d0, op_ret=-1,
op_errno=22, preparent=0x0, postparent=0x0, xdata=0x0) at dht-common.c:2403
#1 0x00007fddc9598b5a in dht_unlink (frame=0x7fddb400796c,
this=0x7fddbc015510, loc=0x7fddb400625c, xflag=0,
xdata=0x0) at dht-common.c:5208
#2 0x00007fddc9598798 in dht_unlink (frame=0x7fddb40086dc,
this=0x7fddbc0162d0, loc=0x7fddb400625c, xflag=0,
xdata=0x0) at dht-common.c:5196
#3 0x00007fddd6af96c9 in default_unlink (frame=0x7fddb40086dc,
this=0x7fddbc017b70, loc=0x7fddb400625c, xflag=0,
xdata=0x0) at defaults.c:1910
#4 0x00007fddd6af96c9 in default_unlink (frame=0x7fddb40086dc,
this=0x7fddbc0189b0, loc=0x7fddb400625c, xflag=0,
xdata=0x0) at defaults.c:1910
#5 0x00007fddd6af96c9 in default_unlink (frame=0x7fddb40086dc,
this=0x7fddbc019720, loc=0x7fddb400625c, xflag=0,
xdata=0x0) at defaults.c:1910
#6 0x00007fddd6af5364 in default_unlink_resume (frame=0x7fddb40058ec,
this=0x7fddbc01a550, loc=0x7fddb400625c,
xflag=0, xdata=0x0) at defaults.c:1469
#7 0x00007fddd6b16817 in call_resume_wind (stub=0x7fddb400621c) at
call-stub.c:2083
#8 0x00007fddd6b1ef1e in call_resume (stub=0x7fddb400621c) at call-stub.c:2571
#9 0x00007fddc8d21a58 in open_and_resume (this=0x7fddbc01a550, fd=0x0,
stub=0x7fddb400621c) at open-behind.c:242
#10 0x00007fddc8d2468f in ob_unlink (frame=0x7fddb40058ec, this=0x7fddbc01a550,
loc=0x7fddb40016e0, xflags=0,
xdata=0x0) at open-behind.c:768
#11 0x00007fddc8b12c6b in mdc_unlink (frame=0x7fddb40056fc,
this=0x7fddbc01b310, loc=0x7fddb40016e0, xflag=0,
xdata=0x0) at md-cache.c:1205
#12 0x00007fddc88fda67 in io_stats_unlink (frame=0x7fddb40055fc,
this=0x7fddbc01c0d0, loc=0x7fddb40016e0, xflag=0,
xdata=0x0) at io-stats.c:2002
#13 0x00007fddd6af96c9 in default_unlink (frame=0x7fddb40055fc,
this=0x7fddbc01d170, loc=0x7fddb40016e0, xflag=0,
xdata=0x0) at defaults.c:1910
#14 0x00007fddcdef230b in fuse_unlink_resume (state=0x7fddb40016c0) at
fuse-bridge.c:1568
#15 0x00007fddcdeebe47 in fuse_fop_resume (state=0x7fddb40016c0) at
fuse-bridge.c:536
#16 0x00007fddcdee9b49 in fuse_resolve_done (state=0x7fddb40016c0) at
fuse-resolve.c:637
#17 0x00007fddcdee9c1f in fuse_resolve_all (state=0x7fddb40016c0) at
fuse-resolve.c:664
#18 0x00007fddcdee9b2a in fuse_resolve (state=0x7fddb40016c0) at
fuse-resolve.c:628
#19 0x00007fddcdee9bf6 in fuse_resolve_all (state=0x7fddb40016c0) at
fuse-resolve.c:660
#20 0x00007fddcdee9c7d in fuse_resolve_continue (state=0x7fddb40016c0) at
fuse-resolve.c:680
#21 0x00007fddcdee9041 in fuse_resolve_parent (state=0x7fddb40016c0) at
fuse-resolve.c:290
#22 0x00007fddcdee9afa in fuse_resolve (state=0x7fddb40016c0) at
fuse-resolve.c:621
#23 0x00007fddcdee9ba1 in fuse_resolve_all (state=0x7fddb40016c0) at
fuse-resolve.c:653
#24 0x00007fddcdee9cbb in fuse_resolve_and_resume (state=0x7fddb40016c0,
fn=0x7fddcdef1e94 <fuse_unlink_resume>)
at fuse-resolve.c:692
#25 0x00007fddcdef240c in fuse_unlink (this=0x20a0be0, finh=0x7fddb4008d30,
msg=0x7fddb4008d58)
at fuse-bridge.c:1582
#26 0x00007fddcdf02087 in fuse_thread_proc (data=0x20a0be0) at
fuse-bridge.c:4879
#27 0x00007fddd593652a in start_thread () from /lib64/libpthread.so.0
#28 0x00007fddd528522d in clone () from /lib64/libc.so.6
Note that cached_subvol is NULL, which is why op_errno is set to EINVAL:
#1 0x00007fddc9598b5a in dht_unlink (frame=0x7fddb400796c,
this=0x7fddbc015510, loc=0x7fddb400625c, xflag=0,
xdata=0x0) at dht-common.c:5208
5208 DHT_STACK_UNWIND (unlink, frame, -1, op_errno, NULL, NULL,
NULL);
(gdb) p hashed_subvol
$8 = (xlator_t *) 0x7fddbc012f70
(gdb) p cached_subvol
$3 = (xlator_t *) 0x0
(gdb) p op_errno
$4 = 22
(gdb) p local
$6 = (dht_local_t *) 0x7fddb4008e5c
(gdb) p local->cached_subvol
$7 = (xlator_t *) 0x0
(gdb)
(gdb) p this->name
$9 = 0x7fddbc00c6a0 "test-hot-dht"
(gdb)
The call dht_local_init (frame, loc, NULL, GF_FOP_UNLINK) at line 5170 of
dht-common.c failed to populate cached_subvol. dht_subvol_get_cached() seems
to be broken for the hot-dht xlator.
Stepping into dht_subvol_get_cached():
dht_subvol_get_cached (this=0x7fddbc015510, inode=0x7fddbc03b70c) at
dht-helper.c:626
626 dht_layout_t *layout = NULL;
(gdb) bt
#0 dht_subvol_get_cached (this=0x7fddbc015510, inode=0x7fddbc03b70c) at
dht-helper.c:626
#1 0x00007fddc955ae7d in dht_local_init (frame=0x7fddb400751c,
loc=0x7fddb400654c, fd=0x0, fop=GF_FOP_UNLINK)
at dht-helper.c:498
#2 0x00007fddc95984b3 in dht_unlink (frame=0x7fddb400751c,
this=0x7fddbc015510, loc=0x7fddb400654c, xflag=0,
xdata=0x0) at dht-common.c:5170
#3 0x00007fddc9598798 in dht_unlink (frame=0x7fddb400741c,
this=0x7fddbc0162d0, loc=0x7fddb400654c, xflag=0,
xdata=0x0) at dht-common.c:5196
#4 0x00007fddd6af96c9 in default_unlink (frame=0x7fddb400741c,
this=0x7fddbc017b70, loc=0x7fddb400654c, xflag=0,
xdata=0x0) at defaults.c:1910
#5 0x00007fddd6af96c9 in default_unlink (frame=0x7fddb400741c,
this=0x7fddbc0189b0, loc=0x7fddb400654c, xflag=0,
xdata=0x0) at defaults.c:1910
#6 0x00007fddd6af96c9 in default_unlink (frame=0x7fddb400741c,
this=0x7fddbc019720, loc=0x7fddb400654c, xflag=0,
xdata=0x0) at defaults.c:1910
#7 0x00007fddd6af5364 in default_unlink_resume (frame=0x7fddb400640c,
this=0x7fddbc01a550, loc=0x7fddb400654c
627 xlator_t *subvol = NULL;
(gdb)
629 GF_VALIDATE_OR_GOTO (this->name, this, out);
(gdb)
630 GF_VALIDATE_OR_GOTO (this->name, inode, out);
(gdb)
632 layout = dht_layout_get (this, inode);
(gdb) n
634 if (!layout) {
(gdb) p layout
$18 = (dht_layout_t *) 0x0
(gdb) p this->name
$19 = 0x7fddbc00c6a0 "test-hot-dht"
(gdb)
dht_layout_get() returns NULL, and as a result dht_subvol_get_cached() also
returns NULL. Looking deeper, the dht_inode_ctx_t itself is NULL:
Breakpoint 1, dht_subvol_get_cached (this=0x7fddbc015510, inode=0x7fddbc03b70c)
at dht-helper.c:626
626 dht_layout_t *layout = NULL;
(gdb) n
627 xlator_t *subvol = NULL;
(gdb) n
629 GF_VALIDATE_OR_GOTO (this->name, this, out);
(gdb) n
630 GF_VALIDATE_OR_GOTO (this->name, inode, out);
(gdb) n
632 layout = dht_layout_get (this, inode);
(gdb) s
dht_layout_get (this=0x7fddbc015510, inode=0x7fddbc03b70c) at dht-layout.c:65
65 dht_conf_t *conf = NULL;
(gdb) n
66 dht_layout_t *layout = NULL;
(gdb) n
67 int ret = 0;
(gdb) n
69 conf = this->private;
(gdb) n
70 if (!conf)
(gdb) n
73 LOCK (&conf->layout_lock);
(gdb) n
75 ret = dht_inode_ctx_layout_get (inode, this, &layout);
(gdb) s
dht_inode_ctx_layout_get (inode=0x7fddbc03b70c, this=0x7fddbc015510,
layout=0x7fddc37fd678) at dht-common.c:6981
6981 dht_inode_ctx_t *ctx = NULL;
(gdb) n
6982 int ret = -1;
(gdb) n
6984 ret = dht_inode_ctx_get (inode, this, &ctx);
(gdb) n
6986 if (!ret && ctx) {
(gdb) p ctx
$10 = (dht_inode_ctx_t *) 0x0
(gdb)
--- Additional comment from Joseph Elwin Fernandes on 2015-07-06 03:42:27 EDT ---
This issue happens only in the pure distribute case, not on Dis-rep or Dis-EC.
--- Additional comment from Mohammed Rafi KC on 2015-07-10 05:48:16 EDT ---
RCA:
After attach-tier, every fop is hashed to the hot tier (unless the "rule"
option is set explicitly). A lookup on a directory eventually lists the
directory using readdirp, and each dht xlator populates inode_ctx for the
inodes in the entries it returns. For files that were already present in the
volume before the tier was attached, readdirp therefore populates inode_ctx
only in cold-dht, because the entries come from the cold tier.
When an unlink arrives for such an inode, the lookup associated with the
unlink is sent as a revalidate request to the cold tier only, because a lookup
has already been performed on the inode, and that lookup succeeds. In the
tier-level dht unlink, the file hashes to the hot tier while cached_subvol is
the cold tier. Because the hashed and cached subvolumes differ, dht chooses
the hashed subvolume and sends the fop to hot-dht. The fop fails there with
EINVAL, because hot-dht has no inode_ctx stored for that inode (no lookup was
ever performed through hot-dht).
--- Additional comment from Mohammed Rafi KC on 2015-07-10 07:22:43 EDT ---
The same problem could affect the following FOPs too:
1) dht_link
2) getxattr "trusted.distribute.linkinfo"
3) f/setxattr
4) f/removexattr
5) unlink of a link file
--- Additional comment from Anand Avati on 2015-07-21 08:53:46 EDT ---
REVIEW: http://review.gluster.org/11675 (tier/dht: unlink fails after lookup in
a directory) posted (#2) for review on master by mohammed rafi kc
(rkavunga at redhat.com)
--- Additional comment from Anand Avati on 2015-08-06 02:48:17 EDT ---
REVIEW: http://review.gluster.org/11675 (tier/dht: unlink fails after lookup in
a directory) posted (#3) for review on master by mohammed rafi kc
(rkavunga at redhat.com)
--- Additional comment from Anand Avati on 2015-08-13 07:33:06 EDT ---
REVIEW: http://review.gluster.org/11675 (tier/dht: unlink fails after lookup in
a directory) posted (#5) for review on master by mohammed rafi kc
(rkavunga at redhat.com)
--- Additional comment from Anand Avati on 2015-08-13 16:04:40 EDT ---
REVIEW: http://review.gluster.org/11675 (tier/dht: unlink fails after lookup in
a directory) posted (#6) for review on master by Dan Lambright
(dlambrig at redhat.com)
--- Additional comment from Anand Avati on 2015-08-13 16:44:45 EDT ---
REVIEW: http://review.gluster.org/11675 (tier/dht: unlink fails after lookup in
a directory) posted (#7) for review on master by Dan Lambright
(dlambrig at redhat.com)
--- Additional comment from Anand Avati on 2015-08-14 02:35:40 EDT ---
REVIEW: http://review.gluster.org/11892 (fuse:sent at least one lookup before
actual fop) posted (#2) for review on master by mohammed rafi kc
(rkavunga at redhat.com)
--- Additional comment from Anand Avati on 2015-08-14 10:55:00 EDT ---
REVIEW: http://review.gluster.org/11675 (tier/dht: unlink fails after lookup in
a directory) posted (#8) for review on master by Dan Lambright
(dlambrig at redhat.com)
--- Additional comment from Anand Avati on 2015-08-19 01:43:03 EDT ---
REVIEW: http://review.gluster.org/11675 (tier/dht: unlink fails after lookup in
a directory) posted (#9) for review on master by Dan Lambright
(dlambrig at redhat.com)
--- Additional comment from Anand Avati on 2015-08-21 06:55:57 EDT ---
REVIEW: http://review.gluster.org/11675 (tier/dht: unlink fails after lookup in
a directory) posted (#10) for review on master by Dan Lambright
(dlambrig at redhat.com)
--- Additional comment from Anand Avati on 2015-08-21 11:51:12 EDT ---
REVIEW: http://review.gluster.org/11675 (tier/dht: unlink fails after lookup in
a directory) posted (#11) for review on master by Joseph Fernandes
--- Additional comment from Anand Avati on 2015-08-27 13:14:18 EDT ---
REVIEW: http://review.gluster.org/11675 (tier/dht: unlink fails after lookup in
a directory) posted (#12) for review on master by Dan Lambright
(dlambrig at redhat.com)
--- Additional comment from Vijay Bellur on 2015-09-03 05:37:12 EDT ---
REVIEW: http://review.gluster.org/11675 (tier/dht: unlink fails after lookup in
a directory) posted (#13) for review on master by Joseph Fernandes
--- Additional comment from Vijay Bellur on 2015-09-04 05:09:25 EDT ---
REVIEW: http://review.gluster.org/11675 (tier/dht: unlink fails after lookup in
a directory) posted (#14) for review on master by mohammed rafi kc
(rkavunga at redhat.com)
--- Additional comment from Vijay Bellur on 2015-09-09 05:13:55 EDT ---
REVIEW: http://review.gluster.org/11675 (tier/dht: unlink fails after lookup in
a directory) posted (#16) for review on master by mohammed rafi kc
(rkavunga at redhat.com)
--- Additional comment from Vijay Bellur on 2015-09-12 10:23:15 EDT ---
REVIEW: http://review.gluster.org/11892 (fuse:sent at least one lookup before
actual fop) posted (#3) for review on master by Dan Lambright
(dlambrig at redhat.com)
--- Additional comment from Vijay Bellur on 2015-09-14 04:53:43 EDT ---
REVIEW: http://review.gluster.org/11675 (tier/dht: unlink fails after lookup in
a directory) posted (#17) for review on master by mohammed rafi kc
(rkavunga at redhat.com)
--- Additional comment from Vijay Bellur on 2015-09-15 01:39:21 EDT ---
REVIEW: http://review.gluster.org/11675 (tier/dht: unlink fails after lookup in
a directory) posted (#18) for review on master by mohammed rafi kc
(rkavunga at redhat.com)
Referenced Bugs:
https://bugzilla.redhat.com/show_bug.cgi?id=1236032
[Bug 1236032] Tiering: unlink failed with error "Invaid argument"
https://bugzilla.redhat.com/show_bug.cgi?id=1260923
[Bug 1260923] Tracker for tiering in 3.1.2
https://bugzilla.redhat.com/show_bug.cgi?id=1266880
[Bug 1266880] Tiering: unlink failed with error "Invaid argument"