[Bugs] [Bug 1602866] dht: Crash seen in thread dht_dir_attr_heal
bugzilla at redhat.com
Wed Jul 18 16:38:14 UTC 2018
https://bugzilla.redhat.com/show_bug.cgi?id=1602866
Nithya Balachandran <nbalacha at redhat.com> changed:
What                  |Removed            |Added
----------------------------------------------------------------------------
Comment #0 is private |1                  |0
Assignee              |bugs at gluster.org |nbalacha at redhat.com
--- Comment #1 from Nithya Balachandran <nbalacha at redhat.com> ---
RCA:
The process crashed because both loc->gfid and loc->inode->gfid are null.
dht_dir_attr_heal was triggered from dht_lookup_dir_cbk (). This is a fresh
named lookup, so loc->gfid is null, and loc->inode->gfid is null because
loc->inode has not been linked yet.
(gdb) bt
#0 0x00007f260e71b765 in raise () from ./lib64/libc.so.6
#1 0x00007f260e71d36a in abort () from ./lib64/libc.so.6
#2 0x00007f260e713f97 in __assert_fail_base () from ./lib64/libc.so.6
#3 0x00007f260e714042 in __assert_fail () from ./lib64/libc.so.6
#4 0x00007f2602149ec2 in client_pre_inodelk (this=0x7f25fc00ef20,
req=0x7f25e4217670, loc=0x7f25e400a298, cmd=6, flock=0x7f25e400a4b8,
volume=0x7f25fc0132b0 "slave-replicate-1", xdata=0x0) at
client-common.c:841
#5 0x00007f2602138b24 in client3_3_inodelk (frame=0x7f25e4015290,
this=0x7f25fc00ef20, data=0x7f25e4217760) at client-rpc-fops.c:5307
#6 0x00007f260210d9d9 in client_inodelk (frame=0x7f25e4015290,
this=0x7f25fc00ef20, volume=0x7f25fc0132b0 "slave-replicate-1",
loc=0x7f25e400a298, cmd=6, lock=0x7f25e400a4b8, xdata=0x0) at client.c:1679
#7 0x00007f2601ea4444 in afr_nonblocking_inodelk (frame=0x7f25e400f680,
this=0x7f25fc015230) at afr-lk-common.c:1093
#8 0x00007f2601e9d149 in afr_lock (frame=0x7f25e400f680, this=0x7f25fc015230)
at afr-transaction.c:1652
#9 0x00007f2601e9eb84 in afr_transaction_start (local=0x7f25e4009e60,
this=0x7f25fc015230) at afr-transaction.c:2333
#10 0x00007f2601e9eec0 in afr_transaction (frame=0x7f25e400f680,
this=0x7f25fc015230, type=AFR_METADATA_TRANSACTION) at afr-transaction.c:2402
#11 0x00007f2601e875d7 in afr_setattr (frame=0x7f25e400ece0,
this=0x7f25fc015230, loc=0x7f25e4008e58, buf=0x7f25e4008f58, valid=7,
xdata=0x0)
at afr-inode-write.c:895
#12 0x00007f261011681d in syncop_setattr (subvol=0x7f25fc015230,
loc=0x7f25e4008e58, iatt=0x7f25e4008f58, valid=7, preop=0x0, postop=0x0,
xdata_in=0x0, xdata_out=0x0) at syncop.c:1811
#13 0x00007f2601bc0448 in dht_dir_attr_heal (data=0x7f25e4007c60) at
dht-selfheal.c:2497
#14 0x00007f261010f894 in synctask_wrap () at syncop.c:375
#15 0x00007f260e72fb60 in ?? () from ./lib64/libc.so.6
#16 0x0000000000000000 in ?? ()
(gdb) f 13
#13 0x00007f2601bc0448 in dht_dir_attr_heal (data=0x7f25e4007c60) at
dht-selfheal.c:2497
2497 ret = syncop_setattr (subvol, &local->loc,
&local->mds_stbuf,
(gdb) p local->loc
$119 = {path = 0x7f25e4016d50 "/.gfid/00000000-0000-0000-0000-", '0' <repeats
11 times>, "1/rsnapshot_symlinkbug",
name = 0x7f25e4016d7c "rsnapshot_symlinkbug", inode = 0x7f25ec030d30, parent
= 0x7f25fc078870, gfid = '\000' <repeats 15 times>,
pargfid = '\000' <repeats 15 times>, "\001"}
(gdb) p local->loc->gfid
$120 = '\000' <repeats 15 times>
(gdb) p *local->loc->inode
$121 = {table = 0x7f25fc078770, gfid = '\000' <repeats 15 times>, lock =
{spinlock = 0, mutex = {__data = {__lock = 0, __count = 0, __owner = 0,
__nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev
= 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>,
__align = 0}}, nlookup = 0, fd_count = 0, active_fd_count = 0, ref = 3,
ia_type = IA_INVAL, fd_list = {next = 0x7f25ec030d88,
prev = 0x7f25ec030d88}, dentry_list = {next = 0x7f25ec030d98, prev =
0x7f25ec030d98}, hash = {next = 0x7f25ec030da8, prev = 0x7f25ec030da8},
list = {next = 0x7f25ec03aa98, prev = 0x7f25fc0787d0}, _ctx = 0x7f25ec032580}
In dht_lookup_dir_cbk, the function unwinds immediately after scheduling
dht_dir_attr_heal as a synctask; it does not wait for the heal to complete:
                                ret = synctask_new (this->ctx->env,
                                                    dht_dir_attr_heal,
                                                    dht_dir_attr_heal_done,
                                                    copy, copy);
                                if (ret) {
                                        gf_msg (this->name, GF_LOG_ERROR,
                                                ENOMEM,
                                                DHT_MSG_DIR_ATTR_HEAL_FAILED,
                                                "Synctask creation failed to heal attr "
                                                "for path %s gfid %s ",
                                                local->loc.path, local->gfid);
                                        DHT_STACK_DESTROY (copy);
                                }
                        }
                }
skip_attr_heal:
                DHT_STRIP_PHASE1_FLAGS (&local->stbuf);
                dht_set_fixed_dir_stat (&local->postparent);
                /* Delete mds xattr at the time of STACK UNWIND */
                if (local->xattr)
                        GF_REMOVE_INTERNAL_XATTR (conf->mds_xattr_key,
                                                  local->xattr);
                DHT_STACK_UNWIND (lookup, frame, local->op_ret,
                                  local->op_errno,
                                  local->inode, &local->stbuf, local->xattr,
                                  &local->postparent);
        }
Most of the time, the unwind reaches the fuse layer and the inode is linked
before the dht_dir_attr_heal call stack hits client_pre_inodelk. Once the inode
is linked, loc->inode->gfid is non-null and client_pre_inodelk proceeds
successfully. If the inode has not been linked by that point, the assertion in
client_pre_inodelk fails and the process aborts.
I was able to hit this by:
1. setting local->need_attrheal to 1
2. putting a sleep(10) before the line DHT_STRIP_PHASE1_FLAGS (&local->stbuf)
3. performing a fresh lookup on an existing directory
--
You are receiving this mail because:
You are on the CC list for the bug.
You are the assignee for the bug.