[Bugs] [Bug 1466863] New: dht_rename_lock_cbk crashes in upstream regression test

bugzilla at redhat.com bugzilla at redhat.com
Fri Jun 30 15:11:35 UTC 2017


https://bugzilla.redhat.com/show_bug.cgi?id=1466863

            Bug ID: 1466863
           Summary: dht_rename_lock_cbk crashes in upstream regression
                    test
           Product: GlusterFS
           Version: 3.10
         Component: distribute
          Assignee: bugs at gluster.org
          Reporter: nbalacha at redhat.com
                CC: bugs at gluster.org, jeff at pl.atyp.us, nbalacha at redhat.com
        Depends On: 1466110
            Blocks: 1466321, 1466859



+++ This bug was initially created as a clone of Bug #1466110 +++

Description of problem:

In https://build.gluster.org/job/centos6-regression/5186/console

./tests/bugs/distribute/bug-1117851.t: 1 new core files


--- Additional comment from Nithya Balachandran on 2017-06-29 01:15:05 EDT ---

Thanks to Jeff Darcy for debugging this:

Core was generated by `glusterfs --entry-timeout=0 --attribute-timeout=0 -s slave1.cloud.gluster.org -'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f00df0dfbb1 in dht_rename_lock_cbk (frame=0x7f00d80ea130, cookie=0x0, this=0x7f00d801bba0, op_ret=0, op_errno=0, xdata=0x0)
    at /home/jenkins/root/workspace/centos6-regression/xlators/cluster/dht/src/dht-rename.c:1581
1581                    STACK_WIND_COOKIE (frame, dht_rename_lookup_cbk, (void *)(long)i,

(gdb) bt
#0  0x00007f00df0dfbb1 in dht_rename_lock_cbk (frame=0x7f00d80ea130, cookie=0x0, this=0x7f00d801bba0, op_ret=0, op_errno=0, xdata=0x0)
    at /home/jenkins/root/workspace/centos6-regression/xlators/cluster/dht/src/dht-rename.c:1581
#1  0x00007f00df1496c3 in dht_inodelk_done (lock_frame=0x7f00d80f9690)
    at /home/jenkins/root/workspace/centos6-regression/xlators/cluster/dht/src/dht-lock.c:684
#2  0x00007f00df14b073 in dht_blocking_inodelk_cbk (frame=0x7f00d80f9690, cookie=0x1, this=0x7f00d801bba0, op_ret=0, op_errno=0, xdata=0x0)
    at /home/jenkins/root/workspace/centos6-regression/xlators/cluster/dht/src/dht-lock.c:1066
#3  0x00007f00df3e17ce in afr_fop_lock_unwind (frame=0x7f00d0056f10, op=GF_FOP_INODELK, op_ret=0, op_errno=0, xdata=0x0)
    at /home/jenkins/root/workspace/centos6-regression/xlators/cluster/afr/src/afr-common.c:3557
#4  0x00007f00df3e3ca4 in afr_fop_lock_done (frame=0x7f00d0056f10, this=0x7f00d801a800)
    at /home/jenkins/root/workspace/centos6-regression/xlators/cluster/afr/src/afr-common.c:3831
#5  0x00007f00df3e4050 in afr_parallel_lock_cbk (frame=0x7f00d0056f10, cookie=0x1, this=0x7f00d801a800, op_ret=0, op_errno=0, xdata=0x0)
    at /home/jenkins/root/workspace/centos6-regression/xlators/cluster/afr/src/afr-common.c:3923
#6  0x00007f00df636ece in client3_3_inodelk_cbk (req=0x7f00d00877c0, iov=0x7f00d0087800, count=1, myframe=0x7f00d00749c0)
    at /home/jenkins/root/workspace/centos6-regression/xlators/protocol/client/src/client-rpc-fops.c:1510
#7  0x00007f00ec4f584d in rpc_clnt_handle_reply (clnt=0x7f00d806aa30, pollin=0x7f00d0075490)
    at /home/jenkins/root/workspace/centos6-regression/rpc/rpc-lib/src/rpc-clnt.c:778
#8  0x00007f00ec4f5e17 in rpc_clnt_notify (trans=0x7f00d806ac60, mydata=0x7f00d806aa60, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x7f00d0075490)
    at /home/jenkins/root/workspace/centos6-regression/rpc/rpc-lib/src/rpc-clnt.c:971
#9  0x00007f00ec4f1dac in rpc_transport_notify (this=0x7f00d806ac60, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x7f00d0075490)
    at /home/jenkins/root/workspace/centos6-regression/rpc/rpc-lib/src/rpc-transport.c:538
#10 0x00007f00e1aa456a in socket_event_poll_in (this=0x7f00d806ac60, notify_handled=_gf_true)
    at /home/jenkins/root/workspace/centos6-regression/rpc/rpc-transport/socket/src/socket.c:2315
#11 0x00007f00e1aa4bb5 in socket_event_handler (fd=10, idx=1, gen=10, data=0x7f00d806ac60, poll_in=1, poll_out=0, poll_err=0)
    at /home/jenkins/root/workspace/centos6-regression/rpc/rpc-transport/socket/src/socket.c:2467
#12 0x00007f00ec7a153a in event_dispatch_epoll_handler (event_pool=0x23bd050, event=0x7f00dd147e70)
    at /home/jenkins/root/workspace/centos6-regression/libglusterfs/src/event-epoll.c:572
#13 0x00007f00ec7a183c in event_dispatch_epoll_worker (data=0x7f00d806a770)
    at /home/jenkins/root/workspace/centos6-regression/libglusterfs/src/event-epoll.c:648
#14 0x00007f00eba08aa1 in start_thread () from /lib64/libpthread.so.0
#15 0x00007f00eb370bcd in clone () from /lib64/libc.so.6
(gdb) l
1576             * do a gfid based resolution). So once a lock is granted, make sure the file
1577             * exists with the name that the client requested with.
1578             * */
1579
1580            for (i = 0; i < local->lock[0].layout.parent_layout.lk_count; i++) {
1581                    STACK_WIND_COOKIE (frame, dht_rename_lookup_cbk, (void *)(long)i,
1582                                       local->lock[0].layout.parent_layout.locks[i]->xl,
1583                                       local->lock[0].layout.parent_layout.locks[i]->xl->fops->lookup,
1584                                       ((gf_uuid_compare (local->loc.gfid, \
1585                                        local->lock[0].layout.parent_layout.locks[i]->loc.gfid) == 0) ?
(gdb) p frame
$1 = (call_frame_t *) 0x7f00d80ea130
(gdb) p *frame
$2 = {root = 0x7f00deadc0de00, parent = 0x7f00d80382c000, frames = {next = 0x7f000000003000, prev = 0xe000}, local = 0x7f00d80ea13000, this = 0x7f00deadc0de00,
  ret = 0x7f00d80382c000, ref_count = 12288, lock = {spinlock = 57344, mutex = {__data = {__lock = 57344, __count = 0, __owner = 245444608, __nusers = 8323288,
        __kind = -1379869184, __spins = 8323294, __list = {__prev = 0x7f00d80382c000, __next = 0x7f000000003000}},
      __size = "\000\340\000\000\000\000\000\000\000\060\241\016\330\000\177\000\000\336\300\255\336\000\177\000\000\300\202\003\330\000\177\000\000\060\000\000\000\000\177", __align = 57344}},
  cookie = 0xe000, complete = (unknown: 245444608), op = 8323288, begin = {tv_sec = 35748278440091136, tv_usec = 35748249814089728},
  end = {tv_sec = 35747322042265600, tv_usec = 57344}, wind_from = 0x7f00d80ea13000 <Address 0x7f00d80ea13000 out of bounds>,
  wind_to = 0x7f00deadc0de00 <Address 0x7f00deadc0de00 out of bounds>, unwind_from = 0x7f00d80382c000 <Address 0x7f00d80382c000 out of bounds>,
  unwind_to = 0x7f000000003000 <Address 0x7f000000003000 out of bounds>}
(gdb) p local->lock[0].layout.parent_layout.lk_count
$1 = 79368192
(gdb) p local->loc.gfid
$2 = "\000\336\300\255\336\000\177\000\000\020\273\004\330\000\177"
(gdb) p local->lock[0].layout.parent_layout.locks[i]->loc.gfid
Cannot access memory at address 0x7f00deadc0de10
(gdb) p local->lock[0].layout.parent_layout.locks[i]->xl->fops->lookup
Cannot access memory at address 0x7f00deadc0de10
(gdb) p local->lock[0].layout.parent_layout.locks[i]->xl
Cannot access memory at address 0x7f00deadc0de10

Both frame and local have obviously been freed.

The issue here is that the for loop re-reads its bound from a member of local (that is, from frame->local) on every iteration. This is unsafe: once the final STACK_WIND_COOKIE has been wound, its callback can complete and destroy the frame, freeing local with it, while the loop condition is still being evaluated.
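
To make the lifetime hazard concrete, here is a minimal, self-contained C sketch of the same pattern. The names (local_state, lookup_cbk) are hypothetical and this is not the actual GlusterFS code; it only reproduces the hazard and the shape of the fix:

/* Sketch of the hazard: the loop bound lives in heap state that the
 * *last* callback frees, so the loop condition must never re-read it
 * from there. Hypothetical names; not GlusterFS code. */
#include <stdio.h>
#include <stdlib.h>

struct local_state {
        int lk_count;   /* loop bound, lives in heap memory */
        int call_cnt;   /* outstanding callbacks */
};

/* Stands in for the STACK_WIND callback: the last one to run tears
 * down the frame, which owns the local state. */
static void
lookup_cbk (struct local_state **localp)
{
        if (--(*localp)->call_cnt == 0) {
                free (*localp);   /* frame and local destroyed */
                *localp = NULL;
        }
}

int
main (void)
{
        struct local_state *local = calloc (1, sizeof (*local));
        int                 count = 0;
        int                 i     = 0;

        if (!local)
                return 1;

        local->lk_count = 3;
        local->call_cnt = 3;

        /* UNSAFE (the crash): writing the condition as
         *         for (i = 0; i < local->lk_count; i++)
         * re-reads freed memory once the last callback has run.
         * SAFE (the approach taken in commit 56da27c): snapshot the
         * bound into a stack variable, so the condition never touches
         * local again. */
        count = local->lk_count;

        for (i = 0; i < count; i++) {
                printf ("winding lookup %d\n", i);
                /* The callback may fire before the loop advances (here
                 * it is synchronous); after the last one, local is gone. */
                lookup_cbk (&local);
        }

        return 0;
}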

--- Additional comment from Worker Ant on 2017-06-29 01:24:38 EDT ---

REVIEW: https://review.gluster.org/17645 (cluster:dht Fix crash in
dht_rename_lock_cbk) posted (#1) for review on master by N Balachandran
(nbalacha at redhat.com)

--- Additional comment from Worker Ant on 2017-06-29 03:34:54 EDT ---

REVIEW: https://review.gluster.org/17645 (cluster:dht Fix crash in
dht_rename_lock_cbk) posted (#2) for review on master by Ji-Hyeon Gim
(potatogim at potatogim.net)

--- Additional comment from Worker Ant on 2017-06-29 04:11:20 EDT ---

REVIEW: https://review.gluster.org/17645 (cluster:dht Fix crash in
dht_rename_lock_cbk) posted (#3) for review on master by Nigel Babu
(nigelb at redhat.com)

--- Additional comment from Worker Ant on 2017-06-29 13:09:47 EDT ---

COMMIT: https://review.gluster.org/17645 committed in master by Shyamsundar
Ranganathan (srangana at redhat.com) 
------
commit 56da27cf5dc6ef54c7fa5282dedd6700d35a0ab0
Author: N Balachandran <nbalacha at redhat.com>
Date:   Thu Jun 29 10:52:37 2017 +0530

    cluster:dht Fix crash in dht_rename_lock_cbk

    Use a local variable to store the call count
    in the STACK_WIND for loop. Using frame->local
    is dangerous as it could be freed while the loop
    is still being processed

    Change-Id: Ie65cdcfb7868509b4a83bc2a5b5d6304eabfbc8e
    BUG: 1466110
    Signed-off-by: N Balachandran <nbalacha at redhat.com>
    Reviewed-on: https://review.gluster.org/17645
    Smoke: Gluster Build System <jenkins at build.gluster.org>
    Tested-by: Nigel Babu <nigelb at redhat.com>
    Reviewed-by: Amar Tumballi <amarts at redhat.com>
    Reviewed-by: Jeff Darcy <jeff at pl.atyp.us>
    CentOS-regression: Gluster Build System <jenkins at build.gluster.org>
    Reviewed-by: Shyamsundar Ranganathan <srangana at redhat.com>
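
For reference, the fixed loop would look roughly like the following. This is a paraphrase of the change described in the commit message above, not the verbatim patch; the trailing lookup arguments (the gf_uuid_compare-selected loc and the xdata) are elided:

        /* Copy the count out of frame->local before winding, so the loop
         * condition never dereferences memory that the final callback may
         * already have freed. (Paraphrase, not the verbatim patch.) */
        int count = local->lock[0].layout.parent_layout.lk_count;

        for (i = 0; i < count; i++) {
                STACK_WIND_COOKIE (frame, dht_rename_lookup_cbk, (void *)(long)i,
                                   local->lock[0].layout.parent_layout.locks[i]->xl,
                                   local->lock[0].layout.parent_layout.locks[i]->xl->fops->lookup
                                   /* remaining loc/xdata arguments unchanged */);
        }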


Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1466110
[Bug 1466110] dht_rename_lock_cbk crashes in upstream regression test
https://bugzilla.redhat.com/show_bug.cgi?id=1466321
[Bug 1466321] dht_rename_lock_cbk crashes in upstream regression test
https://bugzilla.redhat.com/show_bug.cgi?id=1466859
[Bug 1466859] dht_rename_lock_cbk crashes in upstream regression test