[Bugs] [Bug 1705351] New: glusterfsd crash after days of running
bugzilla at redhat.com
Thu May 2 07:10:11 UTC 2019
https://bugzilla.redhat.com/show_bug.cgi?id=1705351
Bug ID: 1705351
Summary: glusterfsd crash after days of running
Product: GlusterFS
Version: mainline
Hardware: x86_64
OS: Linux
Status: NEW
Component: HDFS
Severity: urgent
Assignee: bugs at gluster.org
Reporter: waza123 at inbox.lv
CC: bugs at gluster.org
Target Milestone: ---
Classification: Community
One of the bricks just crashed glusterfsd and it can't be started again.
What can I do to start it again?
Crash dump (gdb):
Program terminated with signal SIGSEGV, Segmentation fault.
#0  up_lk (frame=0x7fea88193f30, this=0x7feb3401c770, fd=0x0, cmd=6, flock=0x7feb0d174d40, xdata=0x0) at upcall.c:239
239             local = upcall_local_init (frame, this, NULL, NULL, fd->inode, NULL);
[Current thread is 1 (Thread 0x7feb0031e700 (LWP 12319))]
(gdb) bt
#0  up_lk (frame=0x7fea88193f30, this=0x7feb3401c770, fd=0x0, cmd=6, flock=0x7feb0d174d40, xdata=0x0) at upcall.c:239
#1  0x00007feb3e1cf65d in default_lk_resume (frame=0x7feb0d174ae0, this=0x7feb3401e060, fd=0x0, cmd=6, lock=0x7feb0d174d40, xdata=0x0) at defaults.c:1833
#2  0x00007feb3e166f35 in call_resume (stub=0x7feb0d174bf0) at call-stub.c:2508
#3  0x00007feb31e00d74 in iot_worker (data=0x7feb34058480) at io-threads.c:222
#4  0x00007feb3d8ca6ba in start_thread (arg=0x7feb0031e700) at pthread_create.c:333
#5  0x00007feb3d60041d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
(gdb) bt full
#0  up_lk (frame=0x7fea88193f30, this=0x7feb3401c770, fd=0x0, cmd=6, flock=0x7feb0d174d40, xdata=0x0) at upcall.c:239
        op_errno = -1
        local = 0x0
        __FUNCTION__ = "up_lk"
#1  0x00007feb3e1cf65d in default_lk_resume (frame=0x7feb0d174ae0, this=0x7feb3401e060, fd=0x0, cmd=6, lock=0x7feb0d174d40, xdata=0x0) at defaults.c:1833
        _new = 0x7fea88193f30
        old_THIS = 0x7feb3401e060
        tmp_cbk = 0x7feb3e1bafa0 <default_lk_cbk>
        __FUNCTION__ = "default_lk_resume"
#2  0x00007feb3e166f35 in call_resume (stub=0x7feb0d174bf0) at call-stub.c:2508
        old_THIS = 0x7feb3401e060
        __FUNCTION__ = "call_resume"
#3  0x00007feb31e00d74 in iot_worker (data=0x7feb34058480) at io-threads.c:222
        conf = 0x7feb34058480
        this = <optimized out>
        stub = 0x7feb0d174bf0
        sleep_till = {tv_sec = 1556637893, tv_nsec = 0}
        ret = <optimized out>
        pri = 1
        bye = _gf_false
        __FUNCTION__ = "iot_worker"
#4  0x00007feb3d8ca6ba in start_thread (arg=0x7feb0031e700) at pthread_create.c:333
        __res = <optimized out>
        pd = 0x7feb0031e700
        now = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140647297312512,
              5756482990956014801, 0, 140648089937359, 140647297313216,
              140648166818944, -5749651260269466415, -5749590536105693999},
              mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0},
              data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
        pagesize_m1 = <optimized out>
        sp = <optimized out>
        freesize = <optimized out>
        __PRETTY_FUNCTION__ = "start_thread"
#5  0x00007feb3d60041d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
No locals.
(gdb)
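Frame #0 shows up_lk() being entered with fd=0x0, so the fd->inode dereference at upcall.c:239 is what raises the SIGSEGV (the volume status further down lists bricks hdd1 and hdd2 online but has no entry for hdd3, consistent with one brick process having died). Below is a stand-alone C sketch of that failure pattern and a defensive NULL-fd guard; it is an illustration only, using stand-in struct names rather than the real fd_t/inode_t types, and it is not the GlusterFS source or an upstream patch.

/* Illustration only -- a stand-alone sketch of the crash pattern, not
 * GlusterFS code.  "struct fd" / "struct inode" are stand-ins for the
 * real fd_t / inode_t types. */
#include <errno.h>
#include <stdio.h>
#include <string.h>

struct inode { int dummy; };
struct fd    { struct inode *inode; };

/* Mirrors the failing call site: frame #0 reads fd->inode while fd == 0x0.
 * Checking the argument first fails the lock fop with EINVAL instead of
 * letting the brick process segfault. */
static int
lk_guarded (struct fd *fd)
{
        if (fd == NULL || fd->inode == NULL)
                return -EINVAL;
        /* ... a real implementation would continue using fd->inode ... */
        return 0;
}

int
main (void)
{
        int ret = lk_guarded (NULL);   /* same NULL fd as in the backtrace */
        printf ("lk_guarded(NULL) -> %d (%s)\n", ret, strerror (-ret));
        return 0;
}

Whether the proper fix belongs in up_lk() itself or in whichever caller resumes the lk fop with a NULL fd is a question for the upcall maintainers.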
# config
# gluster volume info
Volume Name: hadoop_volume
Type: Disperse
Volume ID: f13b43b0-ff9e-429b-81ed-15c92cdd1181
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: hdd1:/hadoop
Brick2: hdd2:/hadoop
Brick3: hdd3:/hadoop
Options Reconfigured:
cluster.disperse-self-heal-daemon: enable
server.statedump-path: /tmp
performance.client-io-threads: on
server.event-threads: 16
client.event-threads: 16
cluster.lookup-optimize: on
performance.parallel-readdir: on
transport.address-family: inet
nfs.disable: on
features.cache-invalidation: on
features.cache-invalidation-timeout: 600
performance.stat-prefetch: on
performance.cache-invalidation: on
performance.md-cache-timeout: 600
network.inode-lru-limit: 500000
features.lock-heal: on
# status
# gluster volume status
Status of volume: hadoop_volume
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick hdd1:/hadoop                          49152     0          Y       5085
Brick hdd2:/hadoop                          49152     0          Y       4044
Self-heal Daemon on localhost               N/A       N/A        Y       2383
Self-heal Daemon on serv3                   N/A       N/A        Y       2423
Self-heal Daemon on serv2                   N/A       N/A        Y       3429
Self-heal Daemon on hdd2                    N/A       N/A        Y       4035
Self-heal Daemon on hdd1                    N/A       N/A        Y       5076

Task Status of Volume hadoop_volume
------------------------------------------------------------------------------
There are no active volume tasks