[Bugs] [Bug 1632503] New: FUSE client segfaults when performance.md-cache-statfs is enabled for a volume

bugzilla at redhat.com
Mon Sep 24 22:38:00 UTC 2018


https://bugzilla.redhat.com/show_bug.cgi?id=1632503

            Bug ID: 1632503
           Summary: FUSE client segfaults when performance.md-cache-statfs
                    is enabled for a volume
           Product: GlusterFS
           Version: 4.1
         Component: fuse
          Assignee: bugs at gluster.org
          Reporter: smuth4 at gmail.com
                CC: bugs at gluster.org



Description of problem:

I recently tried to enable performance.md-cache-statfs for some testing, but
every time I subject the FUSE mount to directory scans, the client ends up
segfaulting.


Version-Release number of selected component (if applicable):
4.1.5-ubuntu1~xenial1 from the PPA for the client
4.1.5-ubuntu1~bionic1 from the PPA for the server
I was also able to reproduce this on a manual build of the client from the git
master branch.


How reproducible:
I can consistently reproduce it with the steps below, although the time it
takes to trigger varies (e.g. it might happen in the middle of the 1st scan,
or the 8th). I have never seen these segfaults with
performance.md-cache-statfs disabled.


Steps to Reproduce:
1. Enable performance.md-cache-statfs on a volume
`gluster volume set tank performance.md-cache-statfs on`
2. On the client, run the following command to put a little stress on the cache
(there are about 8k files in various directories in /mnt/tank)
`for i in $(seq 1 10); do find /mnt/tank >/dev/null; done`


Actual results:
The client segfaults with the following info logged:
```
pending frames:
frame : type(1) op(STAT)
frame : type(0) op(0)
patchset: git://git.gluster.org/glusterfs.git
signal received: 11
time of crash:
2018-09-24 21:02:40
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 4.1.5
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x2038a)[0x7fc54fb7538a]
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(gf_print_trace+0x2e7)[0x7fc54fb7f0d7]
/lib/x86_64-linux-gnu/libc.so.6(+0x354b0)[0x7fc54ef694b0]
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(mem_put+0x3e)[0x7fc54fb9e8ee]
/usr/lib/x86_64-linux-gnu/glusterfs/4.1.5/xlator/mount/fuse.so(+0x146aa)[0x7fc54d6106aa]
/usr/lib/x86_64-linux-gnu/glusterfs/4.1.5/xlator/debug/io-stats.so(+0x19071)[0x7fc548348071]
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(default_statfs_cbk+0x13c)[0x7fc54fbf8c2c]
/usr/lib/x86_64-linux-gnu/glusterfs/4.1.5/xlator/performance/md-cache.so(+0x1471e)[0x7fc54878371e]
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(default_statfs_resume+0x1e5)[0x7fc54fc160e5]
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(call_resume+0x75)[0x7fc54fb9a635]
/usr/lib/x86_64-linux-gnu/glusterfs/4.1.5/xlator/performance/io-threads.so(+0x5588)[0x7fc548565588]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba)[0x7fc54f3056ba]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7fc54f03b41d]
```
After the segfault, the still-running find commands produce a cluster of
`Transport endpoint is not connected` errors.


Expected results:
The command succeeds without error.


Additional info:
GDB stack trace, if that helps:
```
Thread 8 "glusteriotwr0" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff7e3d700 (LWP 13880)]
0x00007ffff7b1d8ee in mem_put (ptr=0x7fffe43c2130) at mem-pool.c:870
870     mem-pool.c: No such file or directory.
(gdb) backtrace
#0  0x00007ffff7b1d8ee in mem_put (ptr=0x7fffe43c2130) at mem-pool.c:870
#1  0x00007ffff558f6aa in FRAME_DESTROY (frame=0x7fffe4415438) at
../../../../libglusterfs/src/stack.h:178
#2  STACK_DESTROY (stack=0x7fffe00079b8) at
../../../../libglusterfs/src/stack.h:198
#3  fuse_statfs_cbk (frame=<optimized out>, cookie=<optimized out>,
this=<optimized out>, op_ret=<optimized out>, op_errno=0, buf=<optimized out>,
xdata=0x0)
    at fuse-bridge.c:3253
#4  0x00007ffff02c7071 in ?? () from
/usr/lib/x86_64-linux-gnu/glusterfs/4.1.5/xlator/debug/io-stats.so
#5  0x00007ffff7b77c2c in default_statfs_cbk (frame=0x7fffe0008518,
cookie=<optimized out>, this=<optimized out>, op_ret=0, op_errno=0,
buf=0x7fffec030d40, xdata=0x0)
    at defaults.c:1607
#6  0x00007ffff070271e in mdc_statfs (frame=frame at entry=0x7fffe4415438,
this=<optimized out>, loc=loc at entry=0x7fffe0009488, xdata=xdata at entry=0x0) at
md-cache.c:1084
#7  0x00007ffff7b950e5 in default_statfs_resume (frame=0x7fffe0008518,
this=0x7fffec017920, loc=0x7fffe0009488, xdata=0x0) at defaults.c:2273
#8  0x00007ffff7b19635 in call_resume (stub=0x7fffe0009438) at call-stub.c:2689
#9  0x00007ffff04e4588 in iot_worker (data=0x7fffec02d5c0) at io-threads.c:231
#10 0x00007ffff72846ba in start_thread (arg=0x7ffff7e3d700) at
pthread_create.c:333
#11 0x00007ffff6fba41d in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:109
```

Volume info
```
Volume Name: tank
Type: Distribute
Volume ID: f801b0c4-c1c4-4d28-9ff0-3a2ba2eb1919
Status: Started
Snapshot Count: 0
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: g1:/data/gluster/tank/brick-89f393fe/brick
Options Reconfigured:
performance.md-cache-statfs: on
nfs.disable: on
transport.address-family: inet
```
where /data/gluster/tank/brick-89f393fe is a ZFS mount.
