[Bugs] [Bug 1785611] New: glusterfsd crashes after a few seconds
bugzilla at redhat.com
Fri Dec 20 12:51:35 UTC 2019
https://bugzilla.redhat.com/show_bug.cgi?id=1785611
Bug ID: 1785611
Summary: glusterfsd crashes after a few seconds
Product: GlusterFS
Version: mainline
Hardware: armv7l
OS: Linux
Status: NEW
Component: core
Severity: medium
Assignee: bugs at gluster.org
Reporter: jahernan at redhat.com
CC: bugs at gluster.org, jahernan at redhat.com,
robin.van.oosten at hpe.com
Depends On: 1785323
Target Milestone: ---
Classification: Community
+++ This bug was initially created as a clone of Bug #1785323 +++
Description of problem:
glusterfsd crashes after a few seconds
How reproducible:
After running "gluster volume start gv0 force", glusterfsd is started but
crashes after a few seconds.
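For reference, the sequence that triggers the crash comes down to this (a
sketch using the gv0 volume shown below; the sleep is only there to make the
delay explicit):

gluster volume start gv0 force   # brick process glusterfsd starts
sleep 10                         # crash happens within a few seconds
gluster volume status gv0        # the hc2-1 brick now shows Online "N"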
Additional info:
OS: Armbian 5.95 Odroidxu4 Ubuntu bionic default
Kernel: Linux 4.14.141
Build date: 02.09.2019
Gluster: 7.0
Hardware: node1 - node4: Odroid HC2 + WD RED 10TB
node5: Odroid HC2 + Samsung SSD 850 EVO 250GB
root at hc2-1:~# systemctl status glusterd
● glusterd.service - GlusterFS, a clustered file-system server
Loaded: loaded (/lib/systemd/system/glusterd.service; enabled; vendor
preset: enabled)
Active: active (running) since Thu 2019-12-19 13:32:41 CET; 1s ago
Docs: man:glusterd(8)
Process: 12734 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid
--log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, s
Main PID: 12735 (glusterd)
CGroup: /system.slice/glusterd.service
├─12735 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO
├─12772 /usr/sbin/glusterfsd -s hc2-1 --volfile-id
gv0.hc2-1.data-brick1-gv0 -p /var/run/gluster/vols/gv0/hc2-1-data
└─12794 /usr/sbin/glusterfs -s localhost --volfile-id shd/gv0 -p
/var/run/gluster/shd/gv0/gv0-shd.pid -l /var/log/gl
Dec 19 13:32:37 hc2-1 systemd[1]: Starting GlusterFS, a clustered file-system
server...
Dec 19 13:32:41 hc2-1 systemd[1]: Started GlusterFS, a clustered file-system
server.
root at hc2-1:~# systemctl status glusterd
● glusterd.service - GlusterFS, a clustered file-system server
Loaded: loaded (/lib/systemd/system/glusterd.service; enabled; vendor
preset: enabled)
Active: active (running) since Thu 2019-12-19 13:32:41 CET; 15s ago
Docs: man:glusterd(8)
Process: 12734 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid
--log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, s
Main PID: 12735 (glusterd)
CGroup: /system.slice/glusterd.service
├─12735 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO
└─12794 /usr/sbin/glusterfs -s localhost --volfile-id shd/gv0 -p
/var/run/gluster/shd/gv0/gv0-shd.pid -l /var/log/gl
Dec 19 13:32:45 hc2-1 data-brick1-gv0[12772]: dlfcn 1
Dec 19 13:32:45 hc2-1 data-brick1-gv0[12772]: libpthread 1
Dec 19 13:32:45 hc2-1 data-brick1-gv0[12772]: llistxattr 1
Dec 19 13:32:45 hc2-1 data-brick1-gv0[12772]: setfsid 1
Dec 19 13:32:45 hc2-1 data-brick1-gv0[12772]: spinlock 1
Dec 19 13:32:45 hc2-1 data-brick1-gv0[12772]: epoll.h 1
Dec 19 13:32:45 hc2-1 data-brick1-gv0[12772]: xattr.h 1
Dec 19 13:32:45 hc2-1 data-brick1-gv0[12772]: st_atim.tv_nsec 1
Dec 19 13:32:45 hc2-1 data-brick1-gv0[12772]: package-string: glusterfs 7.0
Dec 19 13:32:45 hc2-1 data-brick1-gv0[12772]: ---------
root at hc2-1:~#
root at hc2-9:~# gluster volume status
Status of volume: gv0
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick hc2-1:/data/brick1/gv0 N/A N/A N N/A
Brick hc2-2:/data/brick1/gv0 49152 0 Y 1322
Brick hc2-5:/data/brick1/gv0 49152 0 Y 1767
Brick hc2-3:/data/brick1/gv0 49152 0 Y 1474
Brick hc2-4:/data/brick1/gv0 49152 0 Y 1472
Brick hc2-5:/data/brick2/gv0 49153 0 Y 1787
Self-heal Daemon on localhost N/A N/A Y 1314
Self-heal Daemon on hc2-5 N/A N/A Y 1808
Self-heal Daemon on hc2-3 N/A N/A Y 1485
Self-heal Daemon on hc2-4 N/A N/A Y 1486
Self-heal Daemon on hc2-1 N/A N/A Y 13522
Self-heal Daemon on hc2-2 N/A N/A Y 1348
Task Status of Volume gv0
------------------------------------------------------------------------------
There are no active volume tasks
root at hc2-9:~# gluster volume heal gv0 info summary
Brick hc2-1:/data/brick1/gv0
Status: Transport endpoint is not connected
Total Number of entries: -
Number of entries in heal pending: -
Number of entries in split-brain: -
Number of entries possibly healing: -
Brick hc2-2:/data/brick1/gv0
Status: Connected
Total Number of entries: 977
Number of entries in heal pending: 977
Number of entries in split-brain: 0
Number of entries possibly healing: 0
Brick hc2-5:/data/brick1/gv0
Status: Connected
Total Number of entries: 977
Number of entries in heal pending: 977
Number of entries in split-brain: 0
Number of entries possibly healing: 0
Brick hc2-3:/data/brick1/gv0
Status: Connected
Total Number of entries: 0
Number of entries in heal pending: 0
Number of entries in split-brain: 0
Number of entries possibly healing: 0
Brick hc2-4:/data/brick1/gv0
Status: Connected
Total Number of entries: 0
Number of entries in heal pending: 0
Number of entries in split-brain: 0
Number of entries possibly healing: 0
Brick hc2-5:/data/brick2/gv0
Status: Connected
Total Number of entries: 0
Number of entries in heal pending: 0
Number of entries in split-brain: 0
Number of entries possibly healing: 0
root at hc2-9:~# gluster volume info
Volume Name: gv0
Type: Distributed-Replicate
Volume ID: 9fcb6792-3899-4802-828f-84f37c026881
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x (2 + 1) = 6
Transport-type: tcp
Bricks:
Brick1: hc2-1:/data/brick1/gv0
Brick2: hc2-2:/data/brick1/gv0
Brick3: hc2-5:/data/brick1/gv0 (arbiter)
Brick4: hc2-3:/data/brick1/gv0
Brick5: hc2-4:/data/brick1/gv0
Brick6: hc2-5:/data/brick2/gv0 (arbiter)
Options Reconfigured:
performance.client-io-threads: off
nfs.disable: on
storage.fips-mode-rchecksum: on
transport.address-family: inet
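For context, a "2 x (2 + 1)" layout like the one above would typically have
been created along these lines (a sketch reconstructed from the brick list,
not necessarily the reporter's exact command):

gluster volume create gv0 replica 3 arbiter 1 \
    hc2-1:/data/brick1/gv0 hc2-2:/data/brick1/gv0 hc2-5:/data/brick1/gv0 \
    hc2-3:/data/brick1/gv0 hc2-4:/data/brick1/gv0 hc2-5:/data/brick2/gv0
gluster volume start gv0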
--- Additional comment from Xavi Hernandez on 2019-12-19 19:22:54 CET ---
Currently I can't test it on an ARM machine. Could you open the coredump with
gdb, with symbols loaded, and run the following command to get some
information about the cause of the crash?
(gdb) t a a bt
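("t a a bt" is gdb shorthand for the following, which prints a backtrace of
every thread in the coredump:

(gdb) thread apply all bt
)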
--- Additional comment from Robin van Oosten on 2019-12-19 20:23:47 CET ---
I can open the coredump with gdb, but where do I find the symbols file?
gdb /usr/sbin/glusterfs /core
.
.
.
Reading symbols from /usr/sbin/glusterfs...(no debugging symbols found)...done.
--- Additional comment from Robin van Oosten on 2019-12-19 21:35:15 CET ---
--- Additional comment from Robin van Oosten on 2019-12-19 21:39:17 CET ---
After "apt install glusterfs-dbg" I was able to load the symbols file.
Reading symbols from /usr/sbin/glusterfsd...Reading symbols from
/usr/lib/debug/.build-id/31/453c4877ad5c7f1a2553147feb1c0816f67654.debug...done.
See attachment 1646676.
--- Additional comment from Xavi Hernandez on 2019-12-19 22:08:39 CET ---
You will also need to install the debug symbols for libc, because gdb doesn't
seem able to correctly decode the backtrace frames inside that library.
--- Additional comment from Robin van Oosten on 2019-12-19 23:35:19 CET ---
Installed libc6-dbg now.
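Putting the steps from this thread together, a full session on Ubuntu bionic
would look roughly like this (backtrace.txt is just an example file name;
"set logging" captures the output to a file so it can be attached to the bug):

apt install glusterfs-dbg libc6-dbg
gdb /usr/sbin/glusterfsd /core
(gdb) set logging file backtrace.txt
(gdb) set logging on
(gdb) thread apply all bt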
Referenced Bugs:
https://bugzilla.redhat.com/show_bug.cgi?id=1785323
[Bug 1785323] glusterfsd crashes after a few seconds