[Bugs] [Bug 1659563] New: gluster-blockd segfaults because of a null-dereference in shard.so

Fri Dec 14 16:47:30 UTC 2018

https://bugzilla.redhat.com/show_bug.cgi?id=1659563

            Bug ID: 1659563
           Summary: gluster-blockd segfaults because of a null-dereference
                    in shard.so
           Product: GlusterFS
           Version: 5
            Status: ASSIGNED
         Component: sharding
          Severity: urgent
          Priority: urgent
          Assignee: ndevos at redhat.com
          Reporter: ndevos at redhat.com
        QA Contact: bugs at gluster.org
                CC: bugs at gluster.org
  Target Milestone: ---
    Classification: Community

Description of problem:
Heketi tests have started to fail with the Gluster 5 release. It seems that
gluster-blockd occasionally segfaults and then stops handling any further
requests.

Version-Release number of selected component (if applicable):
glusterfs-5.1-1.el7.x86_64

How reproducible:
random, but very often

Steps to Reproduce:
Run the functional tests that are part of heketi:
1. git clone https://github.com/heketi/heketi
2. cd heketi
3. make test-functional

Actual results:
Tests fail; the logs indicate that communication with gluster-blockd
failed.

Expected results:
Tests should pass

Additional info:

[root@storage2 ~]# systemctl status gluster-blockd
● gluster-blockd.service - Gluster block storage utility
   Loaded: loaded (/usr/lib/systemd/system/gluster-blockd.service; enabled; vendor preset: disabled)
   Active: failed (Result: signal) since Fri 2018-12-14 15:42:16 UTC; 7min ago
  Process: 7246 ExecStart=/usr/sbin/gluster-blockd --glfs-lru-count $GB_GLFS_LRU_COUNT --log-level $GB_LOG_LEVEL $GB_EXTRA_ARGS (code=killed, signal=SEGV)
 Main PID: 7246 (code=killed, signal=SEGV)

Dec 14 15:41:40 storage2 systemd[1]: Started Gluster block storage utility.
Dec 14 15:41:41 storage2 gluster-blockd[7246]: Parameter logfile is now '/var/log/gluster-block/gluster-block-configshell.log'.
Dec 14 15:41:41 storage2 gluster-blockd[7246]: Parameter loglevel_file is now 'info'.
Dec 14 15:41:41 storage2 gluster-blockd[7246]: Parameter auto_enable_tpgt is now 'false'.
Dec 14 15:41:41 storage2 gluster-blockd[7246]: Parameter auto_add_default_portal is now 'false'.
Dec 14 15:41:41 storage2 gluster-blockd[7246]: Configuration saved to /etc/target/saveconfig.json
Dec 14 15:42:16 storage2 systemd[1]: gluster-blockd.service: main process exited, code=killed, status=11/SEGV
Dec 14 15:42:16 storage2 systemd[1]: Unit gluster-blockd.service entered failed state.
Dec 14 15:42:16 storage2 systemd[1]: gluster-blockd.service failed.

[root@storage2 ~]# dmesg | grep segf
[  143.199235] glfs_epoll000[7847]: segfault at f0 ip 00007fe5b3ddc9b9 sp 00007fe5beaa6440 error 6 in shard.so[7fe5b3dd3000+2b000]

Core was generated by `/usr/sbin/gluster-blockd --glfs-lru-count 5 --log-level INFO'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007fbb9cd639b9 in shard_unlink_block_inode (local=local@entry=0x7fbb80000a78, shard_block_num=<optimized out>) at shard.c:2929
2929                base_ictx->fsync_count--;
(gdb) l
2924            if (ctx->fsync_needed) {
2925                unref_base_inode++;
2926                list_del_init(&ctx->to_fsync_list);
2927                if (base_inode)
2928                    __shard_inode_ctx_get(base_inode, this, &base_ictx);
2929                base_ictx->fsync_count--;
2930            }       
2931        }       
2932        UNLOCK(&inode->lock);
2933        if (base_inode)
(gdb) p *base_ictx 
Cannot access memory at address 0x0

The problem was introduced by commit
https://github.com/gluster/glusterfs/commit/02a05da6989f and has so far been
fixed only in the master branch, with
https://github.com/gluster/glusterfs/commit/145e1805 .
The second commit needs to be backported to the release-5 branch of glusterfs.
