[Bugs] [Bug 1805057] New: [EC] shd crashed while heal failed due to out of memory error.
bugzilla at redhat.com
Thu Feb 20 07:44:33 UTC 2020
https://bugzilla.redhat.com/show_bug.cgi?id=1805057
Bug ID: 1805057
Summary: [EC] shd crashed while heal failed due to out of memory error.
Product: GlusterFS
Version: 5
Status: NEW
Component: disperse
Severity: medium
Priority: medium
Assignee: bugs at gluster.org
Reporter: pkarampu at redhat.com
CC: aspandey at redhat.com, bugs at gluster.org
Depends On: 1729085
Target Milestone: ---
Classification: Community
+++ This bug was initially created as a clone of Bug #1729085 +++
Description of problem:
The crash is triggered when no memory is available for synctasks:
[2019-07-03 15:13:13.801297] A [MSGID: 0] [mem-pool.c:145:__gf_calloc] : no
memory available for size (2097224) current memory usage in kilobytes 5515680
[call stack follows]
As the backtrace suggests, ec_heal_throttle tries to launch a heal, which fails
because a new synctask could not be created.
ec_launch_heal then calls ec_heal_fail, which passes NULL as an argument that is
later dereferenced.
void
ec_launch_heal(ec_t *ec, ec_fop_data_t *fop)
{
    int ret = 0;

    ret = synctask_new(ec->xl->ctx->env, ec_synctask_heal_wrap, ec_heal_done,
                       NULL, fop);
    if (ret < 0) {
        /* synctask creation failed (e.g. out of memory): report the error */
        ec_fop_set_error(fop, ENOMEM);
        ec_heal_fail(ec, fop);
    }
}
ec_heal_fail then calls ec_getxattr_heal_cbk with op_errno=12 (ENOMEM):
#0 ec_getxattr_heal_cbk (frame=0x7f796de7dd38, cookie=0x0, xl=0x7f6f215e5800,
op_ret=-1, op_errno=12, mask=0, good=0, bad=0, xdata=0x0) at
ec-inode-read.c:399
The second argument (cookie=0x0) is NULL, and the callback dereferences it
through fop at line 399:
399 fop_getxattr_cbk_t func = fop->data;
So, while the out-of-memory condition itself may be related to how shd-mux
works, this code in EC needs to be fixed so that it never dereferences a NULL
pointer here.
--- Additional comment from Worker Ant on 2019-07-15 13:06:58 UTC ---
REVIEW: https://review.gluster.org/23050 (cluster/ec: Change handling of heal
failure to avoide crash) posted (#1) for review on master by Ashish Pandey
--- Additional comment from Worker Ant on 2019-11-04 11:01:35 UTC ---
REVIEW: https://review.gluster.org/23050 (cluster/ec: Change handling of heal
failure to avoid crash) merged (#10) on master by Xavi Hernandez
Referenced Bugs:
https://bugzilla.redhat.com/show_bug.cgi?id=1729085
[Bug 1729085] [EC] shd crashed while heal failed due to out of memory error.