[Bugs] [Bug 1535772] New: Random GlusterFSD process dies during rebalance
bugzilla at redhat.com
Thu Jan 18 04:23:55 UTC 2018
https://bugzilla.redhat.com/show_bug.cgi?id=1535772
Bug ID: 1535772
Summary: Random GlusterFSD process dies during rebalance
Product: GlusterFS
Version: mainline
Component: core
Severity: medium
Assignee: jthottan at redhat.com
Reporter: jthottan at redhat.com
CC: amukherj at redhat.com, bugs at gluster.org,
jmackey at getcruise.com, jthottan at redhat.com
Depends On: 1533269
+++ This bug was initially created as a clone of Bug #1533269 +++
Description of problem:
During a rebalance of a 252 brick volume, as the rebalance is scanning through
the initial directories, within 5-10 minutes, a seemingly random peer brick
process dies which stops the rebalance processes. The brick logs contain
healthy connection and disconnection up until the failure, where the brick
process throws a stack trace:
pending frames:
frame : type(0) op(36)
patchset: git://git.gluster.org/glusterfs.git
signal received: 11
time of crash:
2018-01-10 21:13:21
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.12.4
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xaa)[0x7f7000635a5a]
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(gf_print_trace+0x2e7)[0x7f700063f737]
/lib/x86_64-linux-gnu/libc.so.6(+0x354b0)[0x7f6fffa284b0]
/lib/x86_64-linux-gnu/libpthread.so.0(pthread_mutex_lock+0x4)[0x7f6fffdc6d44]
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_rename_key+0x66)[0x7f7000630866]
/usr/lib/x86_64-linux-gnu/glusterfs/3.12.4/xlator/features/selinux.so(+0x1f15)[0x7f6ff87a9f15]
/usr/lib/x86_64-linux-gnu/glusterfs/3.12.4/xlator/features/marker.so(+0x11d77)[0x7f6ff8595d77]
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(default_fsetxattr+0xb5)[0x7f70006a81d5]
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(default_fsetxattr+0xb5)[0x7f70006a81d5]
/usr/lib/x86_64-linux-gnu/glusterfs/3.12.4/xlator/features/quota.so(+0xe02f)[0x7f6ff3de602f]
/usr/lib/x86_64-linux-gnu/glusterfs/3.12.4/xlator/debug/io-stats.so(+0x72d6)[0x7f6ff3bb12d6]
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(default_fsetxattr+0xb5)[0x7f70006a81d5]
/usr/lib/x86_64-linux-gnu/glusterfs/3.12.4/xlator/protocol/server.so(+0x2e3be)[0x7f6ff37763be]
/usr/lib/x86_64-linux-gnu/glusterfs/3.12.4/xlator/protocol/server.so(+0xbd19)[0x7f6ff3753d19]
/usr/lib/x86_64-linux-gnu/glusterfs/3.12.4/xlator/protocol/server.so(+0xbdb5)[0x7f6ff3753db5]
/usr/lib/x86_64-linux-gnu/glusterfs/3.12.4/xlator/protocol/server.so(+0xc75c)[0x7f6ff375475c]
/usr/lib/x86_64-linux-gnu/glusterfs/3.12.4/xlator/protocol/server.so(+0xbdfe)[0x7f6ff3753dfe]
/usr/lib/x86_64-linux-gnu/glusterfs/3.12.4/xlator/protocol/server.so(+0xc5fe)[0x7f6ff37545fe]
/usr/lib/x86_64-linux-gnu/glusterfs/3.12.4/xlator/protocol/server.so(+0xc712)[0x7f6ff3754712]
/usr/lib/x86_64-linux-gnu/glusterfs/3.12.4/xlator/protocol/server.so(+0xbdde)[0x7f6ff3753dde]
/usr/lib/x86_64-linux-gnu/glusterfs/3.12.4/xlator/protocol/server.so(+0xc804)[0x7f6ff3754804]
/usr/lib/x86_64-linux-gnu/glusterfs/3.12.4/xlator/protocol/server.so(+0x276ce)[0x7f6ff376f6ce]
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpcsvc_request_handler+0x96)[0x7f70003feca6]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba)[0x7f6fffdc46ba]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f6fffafa3dd]
---------
Anywhere from 1 to 5 brick processes on various hosts will all die at the same
time.
Version-Release number of selected component (if applicable):
3.12.4
How reproducible:
This happens consistently within the first 10 minutes of a rebalance.
Steps to Reproduce:
- We had an existing gluster volume with about 2PB of data in it. Since our
existing gluster configs (3.7.20) were pretty old, we decided to bring down the
cluster and rebuild it fresh with the existing data. All gluster 3.7.20
libraries were purged, .glusterfs directory deleted from each brick and
glusterd 3.12.4 was installed. All 252 bricks were re-added to the cluster and
a fix-layout performed successfully. However, when a full rebalance is
initiated, eventually peer brick processes will crash.
Actual results:
Expected results:
Additional info:
--- Additional comment from Atin Mukherjee on 2018-01-11 03:46:53 EST ---
Jiffin, seems to be crashing from selinux.c. Can you please check?
--- Additional comment from Jiffin on 2018-01-11 04:03:02 EST ---
Sure I will take a look
--- Additional comment from Worker Ant on 2018-01-17 23:02:01 EST ---
REVIEW: https://review.gluster.org/19220 (selinux-xlator : validate dict before
calling dict_rename_key()) posted (#1) for review on master by jiffin tony
Thottan
--- Additional comment from Jiffin on 2018-01-17 23:03:17 EST ---
From the core it looks like a NULL dict was passed to the fops handled by the
selinux xlator, which caused this error. A patch has been posted upstream for
review: https://review.gluster.org/19220
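The patch above validates the dict before calling dict_rename_key(). A minimal,
self-contained sketch of that guard pattern (the dict_t type, the stand-in
dict_rename_key(), and the xattr key names here are simplified illustrations,
not the real GlusterFS dict API):

```c
#include <stdio.h>
#include <string.h>

/* Stand-in for the glusterfs dict_t; the real one is a hash of key/value
 * pairs, but a single key suffices to show the crash and the guard. */
typedef struct { char key[64]; } dict_t;

/* Stand-in for dict_rename_key(): it dereferences dict unconditionally,
 * so a NULL dict segfaults, as in the reported backtrace. */
static int dict_rename_key(dict_t *dict, const char *old_key,
                           const char *new_key)
{
    if (strcmp(dict->key, old_key) != 0)
        return -1;                            /* old key not present */
    snprintf(dict->key, sizeof(dict->key), "%s", new_key);
    return 0;
}

/* The fsetxattr path can legally hand the selinux xlator a NULL dict
 * (e.g. during rebalance); checking before the call turns the crash
 * into a clean pass-through. Key names are illustrative. */
static int selinux_rename_xattr(dict_t *dict)
{
    if (dict == NULL)                         /* the missing validation */
        return 0;                             /* nothing to rename */
    return dict_rename_key(dict, "security.selinux",
                           "trusted.glusterfs.selinux");
}
```

With the guard in place, a NULL dict is a no-op instead of a SIGSEGV, and a
populated dict still gets its key renamed as before.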
Referenced Bugs:
https://bugzilla.redhat.com/show_bug.cgi?id=1533269
[Bug 1533269] Random GlusterFSD process dies during rebalance