[Bugs] [Bug 1651246] Failed to dispatch handler

bugzilla at redhat.com
Tue Jan 29 15:08:37 UTC 2019


https://bugzilla.redhat.com/show_bug.cgi?id=1651246

David E. Smith <desmith at wustl.edu> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |desmith at wustl.edu



--- Comment #15 from David E. Smith <desmith at wustl.edu> ---
I'm having what appears to be the same issue. It started when I upgraded from
3.12 to 5.2 a few weeks back, and the subsequent upgrade to 5.3 did not
resolve the problem.

My servers (two, in a 'replica 2' setup) publish two volumes. One is Web site
content, about 110GB; the other is Web config files, only a few megabytes. (It
wasn't worth building extra servers for that second volume.) FUSE clients have
been crashing on the larger volume every three or four days.

The client's logs show many hundreds of instances of this (I don't know if it's
related):
[2019-01-29 08:14:16.542674] W [dict.c:761:dict_ref]
(-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7384)
[0x7fa171ead384]
-->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xae3e)
[0x7fa1720bee3e] -->/lib64/libglusterfs.so.0(dict_ref+0x5d) [0x7fa1809cc2ad] )
0-dict: dict is NULL [Invalid argument]
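
From a quick look at dict.c in the 5.3 source (which the log message names),
that warning appears to come from a NULL-argument guard at the top of
dict_ref(). Roughly, give or take details I may be misreading:

dict_t *
dict_ref(dict_t *this)
{
        if (!this) {
                /* logs "dict is NULL [Invalid argument]" along with the
                 * calling frames (quick-read -> io-cache, as above) */
                gf_msg_callingfn("dict", GF_LOG_WARNING, EINVAL,
                                 LG_MSG_INVALID_ARG, "dict is NULL");
                return NULL;
        }
        GF_ATOMIC_INC(this->refcount);
        return this;
}

If that reading is right, the warning just means some caller (apparently
io-cache here) handed dict_ref() a NULL dict; it is logged and survived
rather than crashed, so it may be noise rather than the cause of the crash
below.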

Then, when the client's glusterfs process crashes, this is logged:

The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker]
0-epoll: Failed to dispatch handler" repeated 871 times between [2019-01-29
08:12:48.390535] and [2019-01-29 08:14:17.100279]
pending frames:
frame : type(1) op(LOOKUP)
frame : type(1) op(LOOKUP)
frame : type(0) op(0)
frame : type(0) op(0)
patchset: git://git.gluster.org/glusterfs.git
signal received: 11
time of crash:
2019-01-29 08:14:17
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 5.3
/lib64/libglusterfs.so.0(+0x26610)[0x7fa1809d8610]
/lib64/libglusterfs.so.0(gf_print_trace+0x334)[0x7fa1809e2b84]
/lib64/libc.so.6(+0x36280)[0x7fa17f03c280]
/lib64/libglusterfs.so.0(+0x3586d)[0x7fa1809e786d]
/lib64/libglusterfs.so.0(+0x370a2)[0x7fa1809e90a2]
/lib64/libglusterfs.so.0(inode_forget_with_unref+0x46)[0x7fa1809e9f96]
/usr/lib64/glusterfs/5.3/xlator/mount/fuse.so(+0x85bd)[0x7fa177dae5bd]
/usr/lib64/glusterfs/5.3/xlator/mount/fuse.so(+0x1fd7a)[0x7fa177dc5d7a]
/lib64/libpthread.so.0(+0x7dd5)[0x7fa17f83bdd5]
/lib64/libc.so.6(clone+0x6d)[0x7fa17f103ead]
---------
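
(In case it helps with triage: the anonymous frames above are just
module+offset, and can usually be resolved to function names with addr2line
from binutils, run against the same builds, e.g.

addr2line -f -C -e /lib64/libglusterfs.so.0 0x3586d 0x370a2
addr2line -f -C -e /usr/lib64/glusterfs/5.3/xlator/mount/fuse.so 0x85bd 0x1fd7a

though with the stripped packaged binaries it may only print "??" unless the
matching glusterfs-debuginfo is installed.)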



Info on the volumes themselves, gathered from one of my servers:

[davidsmith at wuit-s-10889 ~]$ sudo gluster volume info all

Volume Name: web-config
Type: Replicate
Volume ID: 6c5dce6e-e64e-4a6d-82b3-f526744b463d
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 172.23.128.26:/data/web-config
Brick2: 172.23.128.27:/data/web-config
Options Reconfigured:
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
server.event-threads: 4
client.event-threads: 4
cluster.min-free-disk: 1
cluster.quorum-count: 2
cluster.quorum-type: fixed
network.ping-timeout: 10
auth.allow: *
performance.readdir-ahead: on

Volume Name: web-content
Type: Replicate
Volume ID: fcabc15f-0cec-498f-93c4-2d75ad915730
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 172.23.128.26:/data/web-content
Brick2: 172.23.128.27:/data/web-content
Options Reconfigured:
network.ping-timeout: 10
cluster.quorum-type: fixed
cluster.quorum-count: 2
performance.readdir-ahead: on
auth.allow: *
cluster.min-free-disk: 1
client.event-threads: 4
server.event-threads: 4
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
performance.cache-size: 4GB
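
For what it's worth, the only reconfigured option that differs between the two
volumes is performance.cache-size: 4GB on web-content, which is the volume
whose clients crash. The full effective settings, defaults included, can be
dumped for comparison with:

[davidsmith at wuit-s-10889 ~]$ sudo gluster volume get web-content all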



gluster> volume status all detail
Status of volume: web-config
------------------------------------------------------------------------------
Brick                : Brick 172.23.128.26:/data/web-config
TCP Port             : 49152
RDMA Port            : 0
Online               : Y
Pid                  : 5612
File System          : ext3
Device               : /dev/sdb1
Mount Options        : rw,seclabel,relatime,data=ordered
Inode Size           : 256
Disk Space Free      : 135.9GB
Total Disk Space     : 246.0GB
Inode Count          : 16384000
Free Inodes          : 14962279
------------------------------------------------------------------------------
Brick                : Brick 172.23.128.27:/data/web-config
TCP Port             : 49152
RDMA Port            : 0
Online               : Y
Pid                  : 5540
File System          : ext3
Device               : /dev/sdb1
Mount Options        : rw,seclabel,relatime,data=ordered
Inode Size           : 256
Disk Space Free      : 135.9GB
Total Disk Space     : 246.0GB
Inode Count          : 16384000
Free Inodes          : 14962277

Status of volume: web-content
------------------------------------------------------------------------------
Brick                : Brick 172.23.128.26:/data/web-content
TCP Port             : 49153
RDMA Port            : 0
Online               : Y
Pid                  : 5649
File System          : ext3
Device               : /dev/sdb1
Mount Options        : rw,seclabel,relatime,data=ordered
Inode Size           : 256
Disk Space Free      : 135.9GB
Total Disk Space     : 246.0GB
Inode Count          : 16384000
Free Inodes          : 14962279
------------------------------------------------------------------------------
Brick                : Brick 172.23.128.27:/data/web-content
TCP Port             : 49153
RDMA Port            : 0
Online               : Y
Pid                  : 5567
File System          : ext3
Device               : /dev/sdb1
Mount Options        : rw,seclabel,relatime,data=ordered
Inode Size           : 256
Disk Space Free      : 135.9GB
Total Disk Space     : 246.0GB
Inode Count          : 16384000
Free Inodes          : 14962277


I have a couple of core files that appear to be from these crashes, but I'm
not much of a developer (I haven't touched C in fifteen years), so I don't
know what to do with them that would be of value in this case.
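
If a full backtrace from those cores would be useful, I can try to pull one
with gdb; my understanding is it would be something along these lines (package
names are my guess for a CentOS/RHEL box, and the core path is a placeholder):

$ sudo debuginfo-install glusterfs glusterfs-fuse glusterfs-libs
$ gdb /usr/sbin/glusterfs /path/to/core
(gdb) bt full
(gdb) thread apply all bt

Happy to attach whatever that produces.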
