[Bugs] [Bug 1671556] New: glusterfs FUSE client crashing every few days with 'Failed to dispatch handler'

bugzilla at redhat.com
Thu Jan 31 21:59:49 UTC 2019


https://bugzilla.redhat.com/show_bug.cgi?id=1671556

            Bug ID: 1671556
           Summary: glusterfs FUSE client crashing every few days with
                    'Failed to dispatch handler'
           Product: GlusterFS
           Version: 5
          Hardware: x86_64
                OS: Linux
            Status: NEW
         Component: fuse
          Severity: urgent
          Assignee: bugs at gluster.org
          Reporter: desmith at wustl.edu
                CC: bugs at gluster.org
  Target Milestone: ---
    Classification: Community



This is a re-post of my FUSE crash report from BZ1651246. That bug also covers a
crash in the FUSE client, and so does mine, but I was asked there to open a
separate report, so here you go. :)



My servers (two, in a 'replica 2' setup) serve two volumes: one holds Web site
content, about 110 GB; the other holds Web config files, only a few megabytes.
(It wasn't worth building extra servers for that second volume.) The FUSE clients
mounting the larger volume have been crashing every three or four days. I can't
reproduce the crash on demand, unfortunately, but I have several cores from
previous crashes that may be of value to you.
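
(For reference, the clients mount that volume over FUSE in the usual way; the
command below is only illustrative, since the exact mount point and options on
my clients may differ slightly.)

mount -t glusterfs -o backup-volfile-servers=172.23.128.27 \
    172.23.128.26:/web-content /var/www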

I'm using Gluster 5.3 from the RPMs provided by the CentOS Storage SIG, on a
Red Hat Enterprise Linux 7.x system.


The client's logs show many hundreds of instances of this (I don't know if it's
related):
[2019-01-29 08:14:16.542674] W [dict.c:761:dict_ref]
(-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7384)
[0x7fa171ead384]
-->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xae3e)
[0x7fa1720bee3e] -->/lib64/libglusterfs.so.0(dict_ref+0x5d) [0x7fa1809cc2ad] )
0-dict: dict is NULL [Invalid argument]

Then, when the client's glusterfs process crashes, this is logged:

The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker]
0-epoll: Failed to dispatch handler" repeated 871 times between [2019-01-29
08:12:48.390535] and [2019-01-29 08:14:17.100279]
pending frames:
frame : type(1) op(LOOKUP)
frame : type(1) op(LOOKUP)
frame : type(0) op(0)
frame : type(0) op(0)
patchset: git://git.gluster.org/glusterfs.git
signal received: 11
time of crash:
2019-01-29 08:14:17
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 5.3
/lib64/libglusterfs.so.0(+0x26610)[0x7fa1809d8610]
/lib64/libglusterfs.so.0(gf_print_trace+0x334)[0x7fa1809e2b84]
/lib64/libc.so.6(+0x36280)[0x7fa17f03c280]
/lib64/libglusterfs.so.0(+0x3586d)[0x7fa1809e786d]
/lib64/libglusterfs.so.0(+0x370a2)[0x7fa1809e90a2]
/lib64/libglusterfs.so.0(inode_forget_with_unref+0x46)[0x7fa1809e9f96]
/usr/lib64/glusterfs/5.3/xlator/mount/fuse.so(+0x85bd)[0x7fa177dae5bd]
/usr/lib64/glusterfs/5.3/xlator/mount/fuse.so(+0x1fd7a)[0x7fa177dc5d7a]
/lib64/libpthread.so.0(+0x7dd5)[0x7fa17f83bdd5]
/lib64/libc.so.6(clone+0x6d)[0x7fa17f103ead]
---------
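
(Since the frames in that trace are raw offsets, they could presumably be
resolved on a machine with the matching glusterfs-debuginfo packages installed,
roughly like this; the offsets are copied from the trace above, and the
resulting symbol names will depend on the exact 5.3 build.)

addr2line -f -C -e /lib64/libglusterfs.so.0 0x3586d 0x370a2
addr2line -f -C -e /usr/lib64/glusterfs/5.3/xlator/mount/fuse.so 0x85bd 0x1fd7a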



Info on the volumes themselves, gathered from one of my servers:

[davidsmith at wuit-s-10889 ~]$ sudo gluster volume info all

Volume Name: web-config
Type: Replicate
Volume ID: 6c5dce6e-e64e-4a6d-82b3-f526744b463d
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 172.23.128.26:/data/web-config
Brick2: 172.23.128.27:/data/web-config
Options Reconfigured:
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
server.event-threads: 4
client.event-threads: 4
cluster.min-free-disk: 1
cluster.quorum-count: 2
cluster.quorum-type: fixed
network.ping-timeout: 10
auth.allow: *
performance.readdir-ahead: on

Volume Name: web-content
Type: Replicate
Volume ID: fcabc15f-0cec-498f-93c4-2d75ad915730
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 172.23.128.26:/data/web-content
Brick2: 172.23.128.27:/data/web-content
Options Reconfigured:
network.ping-timeout: 10
cluster.quorum-type: fixed
cluster.quorum-count: 2
performance.readdir-ahead: on
auth.allow: *
cluster.min-free-disk: 1
client.event-threads: 4
server.event-threads: 4
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
performance.cache-size: 4GB



gluster> volume status all detail
Status of volume: web-config
------------------------------------------------------------------------------
Brick                : Brick 172.23.128.26:/data/web-config
TCP Port             : 49152
RDMA Port            : 0
Online               : Y
Pid                  : 5612
File System          : ext3
Device               : /dev/sdb1
Mount Options        : rw,seclabel,relatime,data=ordered
Inode Size           : 256
Disk Space Free      : 135.9GB
Total Disk Space     : 246.0GB
Inode Count          : 16384000
Free Inodes          : 14962279
------------------------------------------------------------------------------
Brick                : Brick 172.23.128.27:/data/web-config
TCP Port             : 49152
RDMA Port            : 0
Online               : Y
Pid                  : 5540
File System          : ext3
Device               : /dev/sdb1
Mount Options        : rw,seclabel,relatime,data=ordered
Inode Size           : 256
Disk Space Free      : 135.9GB
Total Disk Space     : 246.0GB
Inode Count          : 16384000
Free Inodes          : 14962277

Status of volume: web-content
------------------------------------------------------------------------------
Brick                : Brick 172.23.128.26:/data/web-content
TCP Port             : 49153
RDMA Port            : 0
Online               : Y
Pid                  : 5649
File System          : ext3
Device               : /dev/sdb1
Mount Options        : rw,seclabel,relatime,data=ordered
Inode Size           : 256
Disk Space Free      : 135.9GB
Total Disk Space     : 246.0GB
Inode Count          : 16384000
Free Inodes          : 14962279
------------------------------------------------------------------------------
Brick                : Brick 172.23.128.27:/data/web-content
TCP Port             : 49153
RDMA Port            : 0
Online               : Y
Pid                  : 5567
File System          : ext3
Device               : /dev/sdb1
Mount Options        : rw,seclabel,relatime,data=ordered
Inode Size           : 256
Disk Space Free      : 135.9GB
Total Disk Space     : 246.0GB
Inode Count          : 16384000
Free Inodes          : 14962277



I'll attach a couple of the core files generated by the crashing glusterfs
instances, size limits permitting (they range from 3 to 8 GB). If I can't
attach them, I'll find somewhere to host them.
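
(In the meantime, if it helps, a full thread backtrace can presumably be pulled
out of one of the cores with something like the following, assuming
glusterfs-debuginfo matching 5.3 is installed; the core path here is just a
placeholder.)

gdb /usr/sbin/glusterfs /path/to/core.12345 -batch \
    -ex 'set pagination off' \
    -ex 'thread apply all bt full' > gluster-fuse-backtrace.txt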

-- 
You are receiving this mail because:
You are on the CC list for the bug.
You are the assignee for the bug.

