[Bugs] [Bug 1339465] New: Disperse volume fails on high load and logs show some assertion failures
bugzilla at redhat.com
Wed May 25 06:26:41 UTC 2016
https://bugzilla.redhat.com/show_bug.cgi?id=1339465
Bug ID: 1339465
Summary: Disperse volume fails on high load and logs show some
assertion failures
Product: Red Hat Gluster Storage
Version: 3.1
Component: glusterfs
Sub Component: disperse
Keywords: Triaged
Severity: high
Assignee: rhs-bugs at redhat.com
Reporter: aspandey at redhat.com
QA Contact: byarlaga at redhat.com
CC: aspandey at redhat.com, bugs at gluster.org,
pkarampu at redhat.com, xhernandez at datalab.es
Depends On: 1331254
Blocks: 1330132, 1332845
+++ This bug was initially created as a clone of Bug #1331254 +++
+++ This bug was initially created as a clone of Bug #1330132 +++
Description of problem:
A distributed iozone test running over multiple NFS mounts on different machines
fails, and assertion failures appear in the logs:
[2016-04-21 19:29:58.096645] E [ec-inode-read.c:1157:ec_readv_rebuild]
(-->/usr/lib64/glusterfs/3.7.10/xlator/cluster/disperse.so(__ec_manager+0x5b)
[0x7f9e4e8f18bb]
-->/usr/lib64/glusterfs/3.7.10/xlator/cluster/disperse.so(ec_manager_readv+0x107)
[0x7f9e4e908197]
-->/usr/lib64/glusterfs/3.7.10/xlator/cluster/disperse.so(ec_readv_rebuild+0x236)
[0x7f9e4e907f26] ) 0-: Assertion failed: ec_get_inode_size(fop, fop->fd->inode,
&cbk->iatt[0].ia_size)
[2016-04-21 19:29:58.126547] E [ec-common.c:1641:ec_lock_unfreeze]
(-->/usr/lib64/glusterfs/3.7.10/xlator/cluster/disperse.so(ec_manager_inodelk+0x155)
[0x7f9e4e8fc305]
-->/usr/lib64/glusterfs/3.7.10/xlator/cluster/disperse.so(ec_unlocked+0x35)
[0x7f9e4e8f3c25]
-->/usr/lib64/glusterfs/3.7.10/xlator/cluster/disperse.so(ec_lock_unfreeze+0x100)
[0x7f9e4e8f3ab0] ) 0-: Assertion failed: list_empty(&lock->waiting) &&
list_empty(&lock->owners)
[2016-04-21 19:30:05.998568] E [ec-inode-read.c:1612:ec_manager_stat]
(-->/usr/lib64/glusterfs/3.7.10/xlator/cluster/disperse.so(ec_resume+0x88)
[0x7f9e4e8f1a68]
-->/usr/lib64/glusterfs/3.7.10/xlator/cluster/disperse.so(__ec_manager+0x5b)
[0x7f9e4e8f18bb]
-->/usr/lib64/glusterfs/3.7.10/xlator/cluster/disperse.so(ec_manager_stat+0x315)
[0x7f9e4e905ed5] ) 0-: Assertion failed: ec_get_inode_size(fop,
fop->locks[0].lock->loc.inode, &cbk->iatt[0].ia_size)
[2016-04-21 19:30:05.999146] E [MSGID: 114031]
[client-rpc-fops.c:1624:client3_3_inodelk_cbk] 0-test-client-8: remote
operation failed [Invalid argument]
[2016-04-21 19:30:05.999132] E [MSGID: 114031]
[client-rpc-fops.c:1624:client3_3_inodelk_cbk] 0-test-client-10: remote
operation failed [Invalid argument]
[2016-04-21 19:30:05.999237] E [MSGID: 114031]
[client-rpc-fops.c:1624:client3_3_inodelk_cbk] 0-test-client-11: remote
operation failed [Invalid argument]
[2016-04-21 19:30:05.999259] E [MSGID: 114031]
[client-rpc-fops.c:1624:client3_3_inodelk_cbk] 0-test-client-7: remote
operation failed [Invalid argument]
[2016-04-21 19:30:05.999326] E [MSGID: 114031]
[client-rpc-fops.c:1624:client3_3_inodelk_cbk] 0-test-client-9: remote
operation failed [Invalid argument]
[2016-04-21 19:30:06.047496] E [MSGID: 114031]
[client-rpc-fops.c:1624:client3_3_inodelk_cbk] 0-test-client-6: remote
operation failed [Invalid argument]
[2016-04-21 19:30:06.047559] W [MSGID: 122015] [ec-common.c:1675:ec_unlocked]
0-test-disperse-1: entry/inode unlocking failed (FSTAT) [Invalid argument]
Version-Release number of selected component (if applicable): mainline
How reproducible:
Intermittently, after the distributed iozone test has been running for some time.
Steps to Reproduce:
1.
2.
3.
Actual results:
Volume access fails and iozone quits with an error.
Expected results:
iozone should complete the test successfully.
Additional info:
Probably caused by a race that occurs when the lock release timeout is cancelled
while its callback is already executing. In that case the new fop is not placed
in the correct waiting list.
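The suspected race can be illustrated with a small, single-threaded model. This is only a sketch of the general pattern, not the disperse translator's code: the `sketch_*` names, the `timer_state` values and the counters standing in for the owners and waiting lists are all hypothetical. The point it shows is that a fop may join the owners only if it actually wins the timer cancellation; if the release callback has already started, the fop has to be queued as waiting until the lock is acquired again.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model; not the real GlusterFS ec_lock_t. */
enum timer_state { TIMER_IDLE, TIMER_ARMED, TIMER_FIRING };
enum placement { PLACED_OWNER, PLACED_WAITING };

struct sketch_lock {
    enum timer_state timer; /* state of the delayed-release timer */
    bool acquired;          /* lock is currently held on the bricks */
    int owners;             /* fops running under the lock */
    int waiting;            /* fops parked until the lock is re-acquired */
};

/* Cancellation only succeeds if the callback has not started yet. */
static bool sketch_timer_cancel(struct sketch_lock *lock)
{
    if (lock->timer == TIMER_ARMED) {
        lock->timer = TIMER_IDLE;
        return true;
    }
    return false;
}

/* The timer expires and the release callback starts running. */
static bool sketch_timer_fire_begin(struct sketch_lock *lock)
{
    if (lock->timer != TIMER_ARMED)
        return false;
    lock->timer = TIMER_FIRING;
    return true;
}

/* A new fop arrives and wants to reuse the eagerly held lock. */
static enum placement sketch_fop_arrives(struct sketch_lock *lock)
{
    bool reusable = lock->acquired;

    /* A delayed release is pending: reuse only if we win the cancel. */
    if (reusable && lock->timer != TIMER_IDLE)
        reusable = sketch_timer_cancel(lock);

    if (reusable) {
        lock->owners++;
        return PLACED_OWNER;
    }
    /* Release already in progress (or lock not held): the fop must wait. */
    lock->waiting++;
    return PLACED_WAITING;
}

/* The release finishes; returns how many fops must re-acquire the lock. */
static int sketch_release_done(struct sketch_lock *lock)
{
    int pending = lock->waiting;

    assert(lock->owners == 0); /* nobody may still believe it holds the lock */
    lock->acquired = false;
    lock->timer = TIMER_IDLE;
    lock->waiting = 0;
    return pending;
}
```

In this model, calling `sketch_fop_arrives()` while the timer is still armed cancels it and makes the fop an owner. Calling it after `sketch_timer_fire_begin()` leaves the owners count at zero and queues the fop, so `sketch_release_done()` does not trip its assertion. Treating a failed cancellation as if it had succeeded would add an owner to a lock that is being released, which is the kind of inconsistency the assertion in the log above checks for.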
--- Additional comment from Vijay Bellur on 2016-04-29 05:19:27 EDT ---
REVIEW: http://review.gluster.org/14112 (cluster/ec: Fix issues with eager
locking) posted (#1) for review on master by Xavier Hernandez
(xhernandez at datalab.es)
--- Additional comment from Vijay Bellur on 2016-05-02 10:45:05 EDT ---
COMMIT: http://review.gluster.org/14112 committed in master by Jeff Darcy
(jdarcy at redhat.com)
------
commit 209985e861f4d8a22bfdb457c0e8d7045ab44553
Author: Xavier Hernandez <xhernandez at datalab.es>
Date: Thu Apr 28 08:42:40 2016 +0200
cluster/ec: Fix issues with eager locking
Due to a race in timer cancellation, in some cases it was possible
to unlock the lock while another concurrent fop that needed it
continues execution as if it were not released.
This patch also fixes an issue that caused a lock to not be released
if an error was found while preparing ec_update_size_version().
Change-Id: I1344a3f5ecfc333f05a09e62653838264c9c26b1
BUG: 1331254
Signed-off-by: Xavier Hernandez <xhernandez at datalab.es>
Reviewed-on: http://review.gluster.org/14112
Smoke: Gluster Build System <jenkins at build.gluster.com>
CentOS-regression: Gluster Build System <jenkins at build.gluster.com>
Reviewed-by: Chen Chen <chenchen at smartquerier.com>
NetBSD-regression: NetBSD Build System <jenkins at build.gluster.org>
Referenced Bugs:
https://bugzilla.redhat.com/show_bug.cgi?id=1330132
[Bug 1330132] Disperse volume fails on high load and logs show some
assertion failures
https://bugzilla.redhat.com/show_bug.cgi?id=1331254
[Bug 1331254] Disperse volume fails on high load and logs show some
assertion failures
https://bugzilla.redhat.com/show_bug.cgi?id=1332845
[Bug 1332845] Disperse volume fails on high load and logs show some
assertion failures