[Bugs] [Bug 1785611] New: glusterfsd crashes after a few seconds
bugzilla at redhat.com
Fri Dec 20 12:51:35 UTC 2019
https://bugzilla.redhat.com/show_bug.cgi?id=1785611
Bug ID: 1785611
Summary: glusterfsd crashes after a few seconds
Product: GlusterFS
Version: mainline
Hardware: armv7l
OS: Linux
Status: NEW
Component: core
Severity: medium
Assignee: bugs at gluster.org
Reporter: jahernan at redhat.com
CC: bugs at gluster.org, jahernan at redhat.com,
robin.van.oosten at hpe.com
Depends On: 1785323
Target Milestone: ---
Classification: Community
+++ This bug was initially created as a clone of Bug #1785323 +++
Description of problem:
glusterfsd crashes after a few seconds
How reproducible:
After running "gluster volume start gv0 force", glusterfsd is started but
crashes after a few seconds.
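For reference, the sequence that triggers the crash comes down to this (a
sketch using the gv0 volume shown below; the sleep is only there to make the
delay explicit):

gluster volume start gv0 force   # brick process glusterfsd starts
sleep 10                         # crash happens within a few seconds
gluster volume status gv0        # the hc2-1 brick now shows Online "N"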
Additional info:
OS: Armbian 5.95 Odroidxu4 Ubuntu bionic default
Kernel: Linux 4.14.141
Build date: 02.09.2019
Gluster: 7.0
Hardware: node1 - node4: Odroid HC2 + WD RED 10TB
node5: Odroid HC2 + Samsung SSD 850 EVO 250GB
root at hc2-1:~# systemctl status glusterd
● glusterd.service - GlusterFS, a clustered file-system server
Loaded: loaded (/lib/systemd/system/glusterd.service; enabled; vendor
preset: enabled)
Active: active (running) since Thu 2019-12-19 13:32:41 CET; 1s ago
Docs: man:glusterd(8)
Process: 12734 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid
--log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, s
Main PID: 12735 (glusterd)
CGroup: /system.slice/glusterd.service
├─12735 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO
├─12772 /usr/sbin/glusterfsd -s hc2-1 --volfile-id
gv0.hc2-1.data-brick1-gv0 -p /var/run/gluster/vols/gv0/hc2-1-data
└─12794 /usr/sbin/glusterfs -s localhost --volfile-id shd/gv0 -p
/var/run/gluster/shd/gv0/gv0-shd.pid -l /var/log/gl
Dec 19 13:32:37 hc2-1 systemd[1]: Starting GlusterFS, a clustered file-system
server...
Dec 19 13:32:41 hc2-1 systemd[1]: Started GlusterFS, a clustered file-system
server.
root at hc2-1:~# systemctl status glusterd
● glusterd.service - GlusterFS, a clustered file-system server
Loaded: loaded (/lib/systemd/system/glusterd.service; enabled; vendor
preset: enabled)
Active: active (running) since Thu 2019-12-19 13:32:41 CET; 15s ago
Docs: man:glusterd(8)
Process: 12734 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid
--log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, s
Main PID: 12735 (glusterd)
CGroup: /system.slice/glusterd.service
├─12735 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO
└─12794 /usr/sbin/glusterfs -s localhost --volfile-id shd/gv0 -p
/var/run/gluster/shd/gv0/gv0-shd.pid -l /var/log/gl
Dec 19 13:32:45 hc2-1 data-brick1-gv0[12772]: dlfcn 1
Dec 19 13:32:45 hc2-1 data-brick1-gv0[12772]: libpthread 1
Dec 19 13:32:45 hc2-1 data-brick1-gv0[12772]: llistxattr 1
Dec 19 13:32:45 hc2-1 data-brick1-gv0[12772]: setfsid 1
Dec 19 13:32:45 hc2-1 data-brick1-gv0[12772]: spinlock 1
Dec 19 13:32:45 hc2-1 data-brick1-gv0[12772]: epoll.h 1
Dec 19 13:32:45 hc2-1 data-brick1-gv0[12772]: xattr.h 1
Dec 19 13:32:45 hc2-1 data-brick1-gv0[12772]: st_atim.tv_nsec 1
Dec 19 13:32:45 hc2-1 data-brick1-gv0[12772]: package-string: glusterfs 7.0
Dec 19 13:32:45 hc2-1 data-brick1-gv0[12772]: ---------
root at hc2-1:~#
root at hc2-9:~# gluster volume status
Status of volume: gv0
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick hc2-1:/data/brick1/gv0 N/A N/A N N/A
Brick hc2-2:/data/brick1/gv0 49152 0 Y 1322
Brick hc2-5:/data/brick1/gv0 49152 0 Y 1767
Brick hc2-3:/data/brick1/gv0 49152 0 Y 1474
Brick hc2-4:/data/brick1/gv0 49152 0 Y 1472
Brick hc2-5:/data/brick2/gv0 49153 0 Y 1787
Self-heal Daemon on localhost N/A N/A Y 1314
Self-heal Daemon on hc2-5 N/A N/A Y 1808
Self-heal Daemon on hc2-3 N/A N/A Y 1485
Self-heal Daemon on hc2-4 N/A N/A Y 1486
Self-heal Daemon on hc2-1 N/A N/A Y 13522
Self-heal Daemon on hc2-2 N/A N/A Y 1348
Task Status of Volume gv0
------------------------------------------------------------------------------
There are no active volume tasks
root at hc2-9:~# gluster volume heal gv0 info summary
Brick hc2-1:/data/brick1/gv0
Status: Transport endpoint is not connected
Total Number of entries: -
Number of entries in heal pending: -
Number of entries in split-brain: -
Number of entries possibly healing: -
Brick hc2-2:/data/brick1/gv0
Status: Connected
Total Number of entries: 977
Number of entries in heal pending: 977
Number of entries in split-brain: 0
Number of entries possibly healing: 0
Brick hc2-5:/data/brick1/gv0
Status: Connected
Total Number of entries: 977
Number of entries in heal pending: 977
Number of entries in split-brain: 0
Number of entries possibly healing: 0
Brick hc2-3:/data/brick1/gv0
Status: Connected
Total Number of entries: 0
Number of entries in heal pending: 0
Number of entries in split-brain: 0
Number of entries possibly healing: 0
Brick hc2-4:/data/brick1/gv0
Status: Connected
Total Number of entries: 0
Number of entries in heal pending: 0
Number of entries in split-brain: 0
Number of entries possibly healing: 0
Brick hc2-5:/data/brick2/gv0
Status: Connected
Total Number of entries: 0
Number of entries in heal pending: 0
Number of entries in split-brain: 0
Number of entries possibly healing: 0
root at hc2-9:~# gluster volume info
Volume Name: gv0
Type: Distributed-Replicate
Volume ID: 9fcb6792-3899-4802-828f-84f37c026881
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x (2 + 1) = 6
Transport-type: tcp
Bricks:
Brick1: hc2-1:/data/brick1/gv0
Brick2: hc2-2:/data/brick1/gv0
Brick3: hc2-5:/data/brick1/gv0 (arbiter)
Brick4: hc2-3:/data/brick1/gv0
Brick5: hc2-4:/data/brick1/gv0
Brick6: hc2-5:/data/brick2/gv0 (arbiter)
Options Reconfigured:
performance.client-io-threads: off
nfs.disable: on
storage.fips-mode-rchecksum: on
transport.address-family: inet
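For context, a "2 x (2 + 1)" layout like the one above would typically have
been created along these lines (a sketch reconstructed from the brick list,
not necessarily the reporter's exact command):

gluster volume create gv0 replica 3 arbiter 1 \
    hc2-1:/data/brick1/gv0 hc2-2:/data/brick1/gv0 hc2-5:/data/brick1/gv0 \
    hc2-3:/data/brick1/gv0 hc2-4:/data/brick1/gv0 hc2-5:/data/brick2/gv0
gluster volume start gv0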
--- Additional comment from Xavi Hernandez on 2019-12-19 19:22:54 CET ---
Currently I can't test it on an ARM machine. Could you open the coredump with
gdb, with symbols loaded, and run the following command to get some
information about the cause of the crash?
(gdb) t a a bt
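("t a a bt" is gdb shorthand for the following, which prints a backtrace of
every thread in the coredump:

(gdb) thread apply all bt
)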
--- Additional comment from Robin van Oosten on 2019-12-19 20:23:47 CET ---
I can open the coredump with gdb, but where do I find the symbols file?
gdb /usr/sbin/glusterfs /core
.
.
.
Reading symbols from /usr/sbin/glusterfs...(no debugging symbols found)...done.
--- Additional comment from Robin van Oosten on 2019-12-19 21:35:15 CET ---
--- Additional comment from Robin van Oosten on 2019-12-19 21:39:17 CET ---
After "apt install glusterfs-dbg" I was able to load the symbols file.
Reading symbols from /usr/sbin/glusterfsd...Reading symbols from
/usr/lib/debug/.build-id/31/453c4877ad5c7f1a2553147feb1c0816f67654.debug...done.
See attachment 1646676.
--- Additional comment from Xavi Hernandez on 2019-12-19 22:08:39 CET ---
You will also need to install the debug symbols for libc, because gdb doesn't
seem able to correctly decode the backtrace frames inside that library.
--- Additional comment from Robin van Oosten on 2019-12-19 23:35:19 CET ---
Installed libc6-dbg now.
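Putting the steps from this thread together, a full session on Ubuntu bionic
would look roughly like this (backtrace.txt is just an example file name;
"set logging" captures the output to a file so it can be attached to the bug):

apt install glusterfs-dbg libc6-dbg
gdb /usr/sbin/glusterfsd /core
(gdb) set logging file backtrace.txt
(gdb) set logging on
(gdb) thread apply all bt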
Referenced Bugs:
https://bugzilla.redhat.com/show_bug.cgi?id=1785323
[Bug 1785323] glusterfsd crashes after a few seconds