[Bugs] [Bug 1432043] New: glusterfsd segfault in trash_truncate_stat_cbk

bugzilla at redhat.com bugzilla at redhat.com
Tue Mar 14 12:11:03 UTC 2017


https://bugzilla.redhat.com/show_bug.cgi?id=1432043

            Bug ID: 1432043
           Summary: glusterfsd segfault in trash_truncate_stat_cbk
           Product: GlusterFS
           Version: mainline
         Component: trash-xlator
          Keywords: Triaged
          Assignee: bugs at gluster.org
          Reporter: ndevos at redhat.com
                CC: anoopcs at redhat.com, bugs at gluster.org,
                    jdarcy at redhat.com, jthottan at redhat.com, rhb1 at gcth.net
            Blocks: 1430360



+++ This bug was initially created as a clone of Bug #1430360 +++
+++                                                           +++
+++ Use this bug to get a fix in the master branch before     +++
+++ backporting it to the maintained versions.                +++

Description of problem:

I'm experiencing random segmentation faults of the glusterfsd process. The
problem started appearing after the trash feature was enabled.

Version-Release number of selected component (if applicable):

Debian Jessie, the packages are up to date:

ii  glusterfs-client   3.8.9-1   amd64   clustered file-system (client package)
ii  glusterfs-common   3.8.9-1   amd64   GlusterFS common libraries and translator modules
ii  glusterfs-dbg      3.8.9-1   amd64   GlusterFS debugging symbols
ii  glusterfs-server   3.8.9-1   amd64   clustered file-system (server package)


How reproducible:

I've assembled the following cluster:

Volume Name: volume1
Type: Distributed-Replicate
Volume ID: c4c0e0a4-e705-472d-8d27-485619cc66db
Status: Started
Snapshot Count: 0
Number of Bricks: 3 x 2 = 6
Transport-type: tcp
Bricks:
Brick1: 10.10.9.7:/export/data1
Brick2: 10.10.9.8:/export/data1
Brick3: 10.10.9.9:/export/data1
Brick4: 10.10.9.10:/export/data1
Brick5: 10.10.9.5:/export/data1
Brick6: 10.10.9.6:/export/data1
Options Reconfigured:
features.trash: on
features.trash-eliminate-path: _REMOVED,_db_backup,*/private
performance.readdir-ahead: on
cluster.self-heal-daemon: enable
server.allow-insecure: on
performance.read-ahead: on
cluster.min-free-disk: 5
performance.stat-prefetch: on
performance.quick-read: on
auth.allow: 10.*.*.*
diagnostics.brick-log-level: WARNING
diagnostics.client-log-level: ERROR
nfs.disable: on
features.trash-max-filesize: 100MB
performance.cache-size: 1GB
cluster.favorite-child-policy: mtime
cluster.server-quorum-ratio: 51%


Actual results:

Status of volume: volume1
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.10.9.7:/export/data1               49154     0          Y       25300
Brick 10.10.9.8:/export/data1               49154     0          Y       18486
Brick 10.10.9.9:/export/data1               49156     0          Y       32131
Brick 10.10.9.10:/export/data1              N/A       N/A        N       N/A  
Brick 10.10.9.5:/export/data1               49154     0          Y       8549 
Brick 10.10.9.6:/export/data1               49154     0          Y       18783
Self-heal Daemon on localhost               N/A       N/A        Y       25574
Self-heal Daemon on 10.10.9.9               N/A       N/A        Y       23093
Self-heal Daemon on 10.10.9.6               N/A       N/A        Y       10453
Self-heal Daemon on 10.10.9.8               N/A       N/A        Y       21167
Self-heal Daemon on 10.10.9.7               N/A       N/A        Y       25331
Self-heal Daemon on 10.10.9.5               N/A       N/A        Y       26096

Task Status of Volume volume1
------------------------------------------------------------------------------
There are no active volume tasks

Note that the brick on 10.10.9.10 is offline.


Expected results:

glusterfsd does not crash.


Additional info:

The core dump shows:

Core was generated by `/usr/sbin/glusterfsd -s 10.10.9.10 --volfile-id
volume1.10.10.9.10.export-data1 -p'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  strlen () at ../sysdeps/x86_64/strlen.S:106
106     ../sysdeps/x86_64/strlen.S: No such file or directory.
(gdb) bt
#0  strlen () at ../sysdeps/x86_64/strlen.S:106
#1  0x00007fd15b5aca63 in gf_strdup (src=<optimized out>) at
../../../../libglusterfs/src/mem-pool.h:185
#2  trash_truncate_stat_cbk (frame=0x7fd163965058, cookie=0x0,
this=0x7fd15c0097a0, op_ret=0, op_errno=op_errno@entry=0, buf=0x7fd14180e930,
xdata=0x7fd163113e14) at trash.c:1630
#3  0x00007fd15bdceac6 in posix_stat (frame=0x7fd163962d90, this=<optimized
out>, loc=<optimized out>, xdata=<optimized out>) at posix.c:310
#4  0x00007fd15b5ad943 in trash_truncate (frame=0x7fd163965058,
this=0x7fd15c0097a0, loc=0x7fd1631d7710, offset=140537295678864, xdata=0x0) at
trash.c:1780
#5  0x00007fd15b384042 in ctr_truncate (frame=0x7fd16395ac60,
this=0x7fd15c00b100, loc=0x7fd1631d7710, offset=0, xdata=0x7fd1631327e0) at
changetimerecorder.c:731
#6  0x00007fd15ac8be24 in changelog_truncate (frame=0x7fd163966438,
this=0x7fd15c00de60, loc=0x7fd1631d7710, offset=0, xdata=0x7fd1631327e0) at
changelog.c:1753
#7  0x00007fd15aa6dbff in br_stub_truncate_resume (frame=0x7fd163955ce0,
this=0x7fd15c00f680, loc=0x7fd1631d7710, offset=0, xdata=0x7fd1631327e0) at
bit-rot-stub.c:2051
#8  0x00007fd165e99a2d in call_resume (stub=0x7fd1631d76c0) at call-stub.c:2508
#9  0x00007fd15aa71036 in br_stub_fd_incversioning_cbk (frame=0x7fd163955ce0,
cookie=<optimized out>, this=<optimized out>, op_ret=0, op_errno=<optimized
out>, xdata=<optimized out>) at bit-rot-stub.c:613
#10 0x00007fd15ac86ff4 in changelog_fsetxattr_cbk (frame=0x7fd163966efc,
cookie=<optimized out>, this=<optimized out>, op_ret=0, op_errno=0, xdata=0x0)
at changelog.c:1538
#11 0x00007fd15b38873c in ctr_fsetxattr_cbk (frame=0x7fd163960920,
cookie=<optimized out>, this=<optimized out>, op_ret=0, op_errno=0, xdata=0x0)
at changetimerecorder.c:1294
#12 0x00007fd15bdd8813 in posix_fsetxattr (frame=frame@entry=0x7fd16396650c,
this=this@entry=0x7fd15c006d00, fd=fd@entry=0x7fd158018484,
dict=dict@entry=0x7fd163182dd8, flags=flags@entry=0,
xdata=xdata@entry=0x7fd163104ce0) at posix.c:5036
#13 0x00007fd165eec3eb in default_fsetxattr (frame=0x7fd16396650c,
this=<optimized out>, fd=0x7fd158018484, dict=0x7fd163182dd8, flags=0,
xdata=0x7fd163104ce0) at defaults.c:2328
#14 0x00007fd15b381f1d in ctr_fsetxattr (frame=0x7fd163960920,
this=0x7fd15c00b100, fd=0x7fd158018484, dict=0x7fd163182dd8, flags=0,
xdata=0x7fd163104ce0) at changetimerecorder.c:1325
#15 0x00007fd15ac891d0 in changelog_fsetxattr (frame=0x7fd163966efc,
this=0x7fd15c00de60, fd=0x7fd158018484, dict=0x7fd163182dd8, flags=0,
xdata=0x7fd163104ce0) at changelog.c:1571
#16 0x00007fd15aa7533b in br_stub_fd_versioning (this=0x7fd15c00f680,
frame=0x7fd163955ce0, stub=0x7fd163966efc, dict=0x0, dict@entry=0x7fd163182dd8,
fd=0x0, fd@entry=0x7fd158018484, callback=0x7fd15c00f680, memversion=6,
versioningtype=2, durable=0)
    at bit-rot-stub.c:682
#17 0x00007fd15aa75508 in br_stub_perform_incversioning (this=0x7fd15c00f680,
frame=0x7fd163955ce0, stub=0x7fd1631d76c0, fd=0x7fd158018484, ctx=<optimized
out>) at bit-rot-stub.c:723
#18 0x00007fd15aa77284 in br_stub_truncate (frame=0x7fd163955ce0,
this=0x7fd15c00f680, loc=0x7fd15c086610, offset=0, xdata=0x7fd1631327e0) at
bit-rot-stub.c:2131
#19 0x00007fd15a85bfc8 in posix_acl_truncate (frame=0x7fd1639536c8,
this=0x7fd15c010cf0, loc=0x7fd15c086610, off=0, xdata=0x7fd1631327e0) at
posix-acl.c:1080
#20 0x00007fd15a641406 in truncate_stat_cbk (frame=0x7fd16395dfb8, cookie=0x0,
this=0x7fd15c012260, op_ret=1543578208, op_errno=1670723272,
buf=0x7fd14180e930, buf@entry=0x7fd14180fa50, xdata=0x0) at posix.c:795
#21 0x00007fd15bdceac6 in posix_stat (frame=frame@entry=0x7fd16396335c,
this=this@entry=0x7fd15c006d00, loc=loc@entry=0x7fd163221068,
xdata=xdata@entry=0x0) at posix.c:310
#22 0x00007fd165eed427 in default_stat (frame=frame@entry=0x7fd16396335c,
this=this@entry=0x7fd15c0097a0, loc=loc@entry=0x7fd163221068,
xdata=xdata@entry=0x0) at defaults.c:2647
#23 0x00007fd165eed427 in default_stat (frame=frame@entry=0x7fd16396335c,
this=this@entry=0x7fd15c00b100, loc=loc@entry=0x7fd163221068,
xdata=xdata@entry=0x0) at defaults.c:2647
#24 0x00007fd165eed427 in default_stat (frame=0x7fd16396335c, this=<optimized
out>, loc=0x7fd163221068, xdata=0x0) at defaults.c:2647
#25 0x00007fd15aa721c7 in br_stub_stat (frame=0x7fd16396335c,
this=0x7fd15c00de60, loc=0x7fd163221068, xdata=0x0) at bit-rot-stub.c:2818
#26 0x00007fd165eed427 in default_stat (frame=0x7fd16396335c, this=<optimized
out>, loc=0x7fd163221068, xdata=0x0) at defaults.c:2647
#27 0x00007fd15a63a0f5 in pl_truncate (frame=0x7fd16395dfb8,
this=0x7fd15c012260, loc=0x7fd163221068, offset=0, xdata=0x7fd15c012260) at
posix.c:855
#28 0x00007fd15a42de57 in worm_truncate (frame=0x7fd16395dfb8,
this=0x7fd15c012260, loc=0x7fd163221068, offset=0, xdata=0x7fd1631327e0) at
worm.c:185
#29 0x00007fd15a2206ec in ro_truncate (frame=0x7fd16395dfb8,
this=0x7fd15c013680, loc=0x7fd163221068, offset=0, xdata=0x7fd1631327e0) at
read-only-common.c:175
#30 0x00007fd15a00fbca in leases_truncate (frame=0x7fd163958b40,
this=0x7fd15c0162b0, loc=0x7fd163221068, offset=0, xdata=0x7fd1631327e0) at
leases.c:312
#31 0x00007fd159dfc2ce in up_truncate (frame=0x7fd163955640,
this=0x7fd15c0177e0, loc=0x7fd163221068, offset=0, xdata=0x0) at upcall.c:301
#32 0x00007fd165f04d81 in default_truncate_resume (frame=0x7fd163967178,
this=0x7fd15c018d90, loc=0x7fd163221068, offset=0, xdata=0x7fd1631327e0) at
defaults.c:1944
#33 0x00007fd165e99a2d in call_resume (stub=0x7fd163221018) at call-stub.c:2508
#34 0x00007fd159bee917 in iot_worker (data=0x7fd15c067480) at io-threads.c:220
#35 0x00007fd1650f6064 in start_thread (arg=0x7fd141810700) at
pthread_create.c:309
#36 0x00007fd164a2f62d in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:111

--- Additional comment from Jeff Darcy on 2017-03-08 14:21:08 CET ---

The immediate problem seems to be that trash_truncate_stat_cbk assumes
local->loc.path will be non-NULL, but that's not entirely guaranteed to be the
case.  In general, a loc_t can be used to resolve an inode in several ways,
only some (and the less preferred ones at that) involving the path/name fields.
Adding a NULL check should help, but it might also be interesting to find out
why we're being called this way in case there are other implications of
something the code clearly does not expect.
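
To illustrate the kind of NULL check suggested above, here is a minimal
standalone sketch. It is not the actual trash.c code or the eventual fix:
struct demo_loc, demo_strdup and demo_copy_path are hypothetical names made up
for this example, and the real loc_t has more fields. The only point is that
the path is tested before it reaches strlen(), which is where the backtrace
shows the crash.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the path part of a loc_t. */
struct demo_loc {
    const char *path; /* may be NULL when the inode is resolved another way */
};

/* Duplicate src. Returns NULL if src is NULL or if allocation fails,
 * instead of passing a NULL pointer to strlen(). */
static char *demo_strdup(const char *src)
{
    size_t len;
    char *copy;

    if (src == NULL)
        return NULL;

    len = strlen(src) + 1;
    copy = malloc(len);
    if (copy != NULL)
        memcpy(copy, src, len);
    return copy;
}

/* Copy loc->path into *out.
 * Returns 0 on success, -EINVAL if there is no path to copy,
 * and -ENOMEM if the allocation fails. The caller frees *out. */
static int demo_copy_path(const struct demo_loc *loc, char **out)
{
    if (loc == NULL || out == NULL)
        return -EINVAL;

    *out = NULL;
    if (loc->path == NULL)
        return -EINVAL; /* fail the operation cleanly instead of crashing */

    *out = demo_strdup(loc->path);
    return (*out != NULL) ? 0 : -ENOMEM;
}
```

A caller in a callback would check the return value and unwind with an error
when the path is missing. Whether failing the truncate or skipping the trash
copy is the right behaviour in that case is a separate question that this
sketch does not answer.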


Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1430360
[Bug 1430360] glusterfsd segfault in trash_truncate_stat_cbk
-- 
You are receiving this mail because:
You are on the CC list for the bug.
You are the assignee for the bug.

