[Bugs] [Bug 1364026] New: glfs_fini() crashes with SIGSEGV
bugzilla at redhat.com
Thu Aug 4 10:15:25 UTC 2016
https://bugzilla.redhat.com/show_bug.cgi?id=1364026
Bug ID: 1364026
Summary: glfs_fini() crashes with SIGSEGV
Product: GlusterFS
Version: mainline
Component: libgfapi
Keywords: Triaged
Severity: medium
Assignee: bugs at gluster.org
Reporter: skoduri at redhat.com
QA Contact: sdharane at redhat.com
CC: bugs at gluster.org, ndevos at redhat.com,
pgurusid at redhat.com, ppai at redhat.com,
sdharane at redhat.com, skoduri at redhat.com,
thiago at redhat.com
Depends On: 1362540
+++ This bug was initially created as a clone of Bug #1362540 +++
Description of problem:
I was trying to benchmark the libgfapi-python bindings for a filesystem walk
(metadata-intensive) workload, to compare against FUSE for the same workload.
The program crashes during the virtual unmount (fini).
Setup details:
[root at f24 ~]# rpm -qa | grep gluster
glusterfs-3.8.1-1.fc24.x86_64
python-gluster-3.8.1-1.fc24.noarch
glusterfs-libs-3.8.1-1.fc24.x86_64
glusterfs-client-xlators-3.8.1-1.fc24.x86_64
glusterfs-cli-3.8.1-1.fc24.x86_64
glusterfs-fuse-3.8.1-1.fc24.x86_64
glusterfs-server-3.8.1-1.fc24.x86_64
glusterfs-api-3.8.1-1.fc24.x86_64
glusterfs-debuginfo-3.8.1-1.fc24.x86_64
[root at f24 ~]# uname -a
Linux f24 4.6.4-301.fc24.x86_64 #1 SMP Tue Jul 12 11:50:00 UTC 2016 x86_64
x86_64 x86_64 GNU/Linux
[root at f24]# gluster volume info
Volume Name: test
Type: Distributed-Replicate
Volume ID: e675d53a-c9b6-468f-bb8f-7101828bec70
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: f24:/export/brick1/data
Brick2: f24:/export/brick2/data
Brick3: f24:/export/brick3/data
Brick4: f24:/export/brick4/data
Options Reconfigured:
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet
NOTE: The volume is created with 4 RAM disks (1GB each) as bricks.
[root at f24 ~]# df -h | grep 'ram\|Filesystem'
Filesystem Size Used Avail Use% Mounted on
/dev/ram1 971M 272M 700M 28% /export/brick1
/dev/ram2 971M 272M 700M 28% /export/brick2
/dev/ram3 971M 272M 700M 28% /export/brick3
/dev/ram4 971M 272M 700M 28% /export/brick4
[root at f24 ~]# df -ih | grep 'ram\|Filesystem'
Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/ram1 489K 349K 140K 72% /export/brick1
/dev/ram2 489K 349K 140K 72% /export/brick2
/dev/ram3 489K 349K 140K 72% /export/brick3
/dev/ram4 489K 349K 140K 72% /export/brick4
How reproducible:
Always and consistently (at least on my Fedora 24 test VM)
Steps to Reproduce:
1. Create a large nested directory tree using a FUSE mount.
2. Unmount the FUSE mount. This is only to generate the initial data.
3. Use the libgfapi-python bindings with the patch
(http://review.gluster.org/#/c/14770/). It's reproducible without this patch
too, but the patch makes the crawling faster.
4. Run the python script that does the walk using libgfapi:
[root at f24 ~]# ./reproducer.py
Segmentation fault (core dumped)
There *could* be a simple reproducer too, but I haven't had the time to look
into it further.
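For reference, here is a rough equivalent of the crawl written directly
against gfapi in C. This is only a hypothetical sketch, not the reproducer.py
used above; the volume name "test" and host "f24" are taken from the setup
shown earlier, everything else is illustrative.

/* Hypothetical sketch of an equivalent crawler written directly against
 * gfapi: walk the whole volume, then call glfs_fini(), which is where
 * the crash is reported. Build roughly as: gcc walk.c -lgfapi -o walk
 */
#include <stdio.h>
#include <string.h>
#include <limits.h>
#include <dirent.h>
#include <sys/stat.h>
#include <glusterfs/api/glfs.h>

static void walk(glfs_t *fs, const char *path)
{
        glfs_fd_t *fd = glfs_opendir(fs, path);
        if (!fd)
                return;

        struct dirent *entry;
        while ((entry = glfs_readdir(fd)) != NULL) {
                if (!strcmp(entry->d_name, ".") || !strcmp(entry->d_name, ".."))
                        continue;

                char child[PATH_MAX];
                snprintf(child, sizeof(child), "%s/%s", path, entry->d_name);

                struct stat st;
                if (glfs_stat(fs, child, &st) == 0 && S_ISDIR(st.st_mode))
                        walk(fs, child);        /* recurse into subdirectories */
        }
        glfs_closedir(fd);
}

int main(void)
{
        glfs_t *fs = glfs_new("test");                    /* volume name from the setup above */
        glfs_set_volfile_server(fs, "tcp", "f24", 24007); /* host from the setup above */
        if (glfs_init(fs) != 0)
                return 1;

        walk(fs, "/");

        /* virtual unmount -- this is the call that ends in SIGSEGV */
        return glfs_fini(fs);
}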
Excerpt from bt:
#0 list_add_tail (head=0x90, new=0x7fcac00040b8) at list.h:41
41 new->prev = head->prev;
[Current thread is 1 (Thread 0x7fcae3dbb700 (LWP 4165))]
Missing separate debuginfos, use: dnf debuginfo-install
glibc-2.23.1-8.fc24.x86_64 keyutils-libs-1.5.9-8.fc24.x86_64
krb5-libs-1.14.1-8.fc24.x86_64 libacl-2.2.52-11.fc24.x86_64
libattr-2.4.47-16.fc24.x86_64 libcom_err-1.42.13-4.fc24.x86_64
libffi-3.1-9.fc24.x86_64 libselinux-2.5-9.fc24.x86_64
libuuid-2.28-3.fc24.x86_64 openssl-libs-1.0.2h-1.fc24.x86_64
pcre-8.39-2.fc24.x86_64 zlib-1.2.8-10.fc24.x86_64
(gdb) bt
#0 list_add_tail (head=0x90, new=0x7fcac00040b8) at list.h:41
#1 list_move_tail (head=0x90, list=0x7fcac00040b8) at list.h:107
#2 __inode_retire (inode=0x7fcac0004030) at inode.c:439
#3 0x00007fcad764100e in inode_table_prune (table=table at entry=0x7fcac0004040)
at inode.c:1521
#4 0x00007fcad7642f22 in inode_table_destroy (inode_table=0x7fcac0004040) at
inode.c:1808
#5 0x00007fcad7642fee in inode_table_destroy_all
(ctx=ctx at entry=0x55f82360b430) at inode.c:1733
#6 0x00007fcad7f3fde6 in pub_glfs_fini (fs=0x55f823537950) at glfs.c:1204
Expected results:
No crash
Additional info:
[root at f24 ~]# rpm -qa | grep glusterfs-debuginfo
glusterfs-debuginfo-3.8.1-1.fc24.x86_64
[root at f24 ~]# rpm -qa | grep python-debuginfo
python-debuginfo-2.7.12-1.fc24.x86_64
[root at f24 coredump]# ls -lh
total 311M
-rw-r--r--. 1 root root 350M Aug 2 17:31
core.python.0.21e34182be6844658e00bba43a55dfa0.4165.1470139182000000000000
-rw-r-----. 1 root root 22M Aug 2 17:29
core.python.0.21e34182be6844658e00bba43a55dfa0.4165.1470139182000000000000.lz4
I can provide the coredump offline.
--- Additional comment from Prashanth Pai on 2016-08-02 09:24 EDT ---
--- Additional comment from Prashanth Pai on 2016-08-02 09:35:30 EDT ---
Also, I can't seem to reproduce this when the test filesystem tree is small
enough.
--- Additional comment from Soumya Koduri on 2016-08-04 06:14:55 EDT ---
I suspect the following could have caused the issue:
In inode_table_destroy(), we first purge all the lru entries, but the lru
count is not adjusted accordingly. So when inode_table_prune() is called
afterwards, if the lru count is still larger than the lru limit (as can be
seen in the core), we end up accessing invalid memory.
(gdb) f 3
#3 0x00007fcad764100e in inode_table_prune (table=table at entry=0x7fcac0004040)
at inode.c:1521
1521 __inode_retire (entry);
(gdb) p table->lru_size
$4 = 132396
(gdb) p table->lru_limit
$5 = 131072
(gdb) p table->lru
$6 = {next = 0x90, prev = 0xcafecafe}
(gdb) p &&table->lru
A syntax error in expression, near `&&table->lru'.
(gdb) p &table->lru
$7 = (struct list_head *) 0x7fcac00040b8
(gdb)
I will send a fix for it.
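For illustration only, here is a hypothetical sketch of that kind of
adjustment inside inode_table_destroy() (not the actual patch; list_empty and
list_entry are the generic helpers from list.h, the other names are the ones
visible in the backtrace and gdb output above):

        /* Hypothetical sketch, not the actual fix: while retiring the lru
         * entries in inode_table_destroy(), also keep table->lru_size in
         * step with the list, so that the later inode_table_prune() no
         * longer sees lru_size > lru_limit over an already-emptied list.
         */
        while (!list_empty(&table->lru)) {
                inode_t *trav = list_entry(table->lru.next, inode_t, list);
                __inode_retire(trav);   /* same helper as in frame #2 of the bt */
                table->lru_size--;      /* the missing accounting */
        }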
Referenced Bugs:
https://bugzilla.redhat.com/show_bug.cgi?id=1362540
[Bug 1362540] glfs_fini() crashes with SIGSEGV