[Bugs] [Bug 1484709] [geo-rep+qr]: Crashes observed at slave from qr_lookup_cbk during rename/hardlink/rebalance cases

bugzilla at redhat.com bugzilla at redhat.com
Tue Aug 29 14:48:50 UTC 2017


https://bugzilla.redhat.com/show_bug.cgi?id=1484709

Nithya Balachandran <nbalacha at redhat.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|quick-read                  |distribute
           Assignee|nbalacha at redhat.com         |bugs at gluster.org



--- Comment #4 from Nithya Balachandran <nbalacha at redhat.com> ---
RCA:

(gdb) bt
#0  0x00007f2152cebaeb in __memcpy_sse2 () from /lib64/libc.so.6
#1  0x00007f2145b9b824 in memcpy (__len=<optimized out>, __src=<optimized out>,
__dest=<optimized out>) at /usr/include/bits/string3.h:51
#2  qr_content_extract (xdata=xdata at entry=0x7f21340658c0) at quick-read.c:279
#3  0x00007f2145b9bdaa in qr_lookup_cbk (frame=0x7f212c05fff0,
cookie=<optimized out>, this=0x7f214001e4b0, op_ret=0, op_errno=117,
inode_ret=0x7f212c05bb60, 
    buf=0x7f212c002b58, xdata=0x7f21340658c0, postparent=0x7f212c002d88) at
quick-read.c:423
#4  0x00007f2145da7de7 in ioc_lookup_cbk (frame=0x7f212c057f00,
cookie=<optimized out>, this=<optimized out>, op_ret=<optimized out>,
op_errno=<optimized out>, 
    inode=0x7f212c05bb60, stbuf=0x7f212c002b58, xdata=0x7f21340658c0,
postparent=0x7f212c002d88) at io-cache.c:267
#5  0x00007f2146614627 in dht_discover_complete
(this=this at entry=0x7f2140016f40,
discover_frame=discover_frame at entry=0x7f212c061500) at dht-common.c:577
#6  0x00007f214661534a in dht_discover_cbk (frame=0x7f212c061500,
cookie=0x7f21400135b0, this=0x7f2140016f40, op_ret=<optimized out>,
op_errno=117, 
    inode=0x7f212c05bb60, stbuf=0x7f212c05e648, xattr=0x7f21340658c0,
postparent=0x7f212c05e6b8) at dht-common.c:700
#7  0x00007f21468c4d10 in afr_discover_done (this=<optimized out>,
frame=0x7f212c0546c0) at afr-common.c:2624
#8  afr_discover_cbk (frame=frame at entry=0x7f212c0546c0, cookie=<optimized out>,
this=<optimized out>, op_ret=<optimized out>, op_errno=<optimized out>, 
    inode=inode at entry=0x7f212c05bb60, buf=buf at entry=0x7f2147537920,
xdata=0x7f21400adca0, postparent=postparent at entry=0x7f2147537990) at
afr-common.c:2669
#9  0x00007f2146b012c7 in client3_3_lookup_cbk (req=<optimized out>,
iov=<optimized out>, count=<optimized out>, myframe=0x7f212c061380)
    at client-rpc-fops.c:2947
#10 0x00007f21543f2840 in rpc_clnt_handle_reply
(clnt=clnt at entry=0x7f2140080400, pollin=pollin at entry=0x7f21400ee830) at
rpc-clnt.c:794
#11 0x00007f21543f2b27 in rpc_clnt_notify (trans=<optimized out>,
mydata=0x7f2140080430, event=<optimized out>, data=0x7f21400ee830) at
rpc-clnt.c:987
#12 0x00007f21543ee9e3 in rpc_transport_notify (this=this at entry=0x7f21400805a0,
event=event at entry=RPC_TRANSPORT_MSG_RECEIVED, data=data at entry=0x7f21400ee830)
    at rpc-transport.c:538
#13 0x00007f2148fef3d6 in socket_event_poll_in (this=this at entry=0x7f21400805a0,
notify_handled=<optimized out>) at socket.c:2306
#14 0x00007f2148ff197c in socket_event_handler (fd=13, idx=3, gen=1,
data=0x7f21400805a0, poll_in=1, poll_out=0, poll_err=0) at socket.c:2458
#15 0x00007f21546841e6 in event_dispatch_epoll_handler (event=0x7f2147537e80,
event_pool=0x56504ff3cee0) at event-epoll.c:572
#16 event_dispatch_epoll_worker (data=0x56504ff95d60) at event-epoll.c:648
#17 0x00007f2153488e25 in start_thread () from /lib64/libpthread.so.0
#18 0x00007f2152d5534d in clone () from /lib64/libc.so.6
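
For orientation, frame #2 is quick-read copying the GF_CONTENT_KEY payload out
of xdata into its own buffer. A minimal sketch of that step (simplified, not
the actual quick-read.c source; it only assumes the usual glusterfs
dict_t/data_t types):

/* Simplified sketch of the crash site in frame #2: quick-read pulls the
 * GF_CONTENT_KEY data_t out of xdata and memcpy()s its payload.  If some
 * other thread has meanwhile dropped the last ref on that data_t, then
 * data->data points to freed memory and the memcpy (frame #0,
 * __memcpy_sse2) faults. */
static void *
qr_content_extract_sketch (dict_t *xdata)
{
        data_t *data    = NULL;
        void   *content = NULL;

        data = dict_get (xdata, GF_CONTENT_KEY);
        if (!data)
                return NULL;

        content = malloc (data->len);
        if (!content)
                return NULL;

        /* crash site: data->data may already have been freed */
        memcpy (content, data->data, data->len);

        return content;
}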




(gdb) f 6
#6  0x00007f214661534a in dht_discover_cbk (frame=0x7f212c061500,
cookie=0x7f21400135b0, this=0x7f2140016f40, op_ret=<optimized out>,
op_errno=117, 
    inode=0x7f212c05bb60, stbuf=0x7f212c05e648, xattr=0x7f21340658c0,
postparent=0x7f212c05e6b8) at dht-common.c:700
700                    dht_discover_complete (this, frame);
(gdb) l
695             * which calls STACK_DESTROY (frame). In the case of "attempt_unwind",
696             * this makes sure that the thread don't call dht_frame_return, till
697             * call to dht_discover_complete is done.
698             */
699            if (attempt_unwind) {
700                    dht_discover_complete (this, frame);  <--- Here
701            }
702
703            this_call_cnt = dht_frame_return (frame);
704


dht_discover_cbk has found a data file, so it tries to unwind before all
subvols have responded; the remaining subvols may therefore still call back
concurrently.
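
Roughly, dht winds the discover lookup to every subvol and normally unwinds
only when the last callback arrives, but a callback that finds a real
(non-linkto) file short-circuits and unwinds at once. A condensed sketch of
that tail of dht_discover_cbk (paraphrased and simplified, locking around
local elided; not a verbatim quote of dht-common.c):

        is_linkfile = check_is_linkfile (inode, stbuf, xattr,
                                         conf->link_xattr_name);

        if (!is_linkfile) {
                /* real data file: remember where it lives and unwind now */
                local->cached_subvol = prev;
                attempt_unwind = 1;
        }

        if (attempt_unwind)
                dht_discover_complete (this, frame);   /* early unwind */

        this_call_cnt = dht_frame_return (frame);      /* other subvols still pending */

        if (is_last_call (this_call_cnt) && !attempt_unwind)
                dht_discover_complete (this, frame);   /* normal path: last reply unwinds */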




Checking the frame for dht_discover_cbk:

(gdb) f 6
#6  0x00007f214661534a in dht_discover_cbk (frame=0x7f212c061500,
cookie=0x7f21400135b0, this=0x7f2140016f40, op_ret=<optimized out>,
op_errno=117, 
    inode=0x7f212c05bb60, stbuf=0x7f212c05e648, xattr=0x7f21340658c0,
postparent=0x7f212c05e6b8) at dht-common.c:700
700                    dht_discover_complete (this, frame);
(gdb) p local->file_count
$6 = 3

Strangely, a file with this gfid was found on _all_ 3 children. As
local->file_count does not distinguish between data files and linkto files, it
seems there was a stale (?) linkto file on at least one subvol.

So at this point all 3 subvols have at least unwound to dht_discover_cbk. 

(gdb) p local->dir_count
$7 = 0

(gdb) p is_linkfile
$8 = <optimized out>
(gdb) p local->cached_subvol
$9 = (xlator_t *) 0x7f2140014d20
(gdb) p local->hashed_subvol
$10 = (xlator_t *) 0x0
(gdb) p *local->cached_subvol
$11 = {name = 0x7f21400158c0 "Slave-replicate-1", type = 0x7f2140015a40
"cluster/replicate", instance_name = 0x0, next = 0x7f21400135b0, prev =
0x7f2140015e30, 
  parents = 0x7f2140018630, children = 0x7f2140015d10, options =
0x7f2140015910, dlhandle = 0x7f2140014320, fops = 0x7f2146ad91a0 <fops>, 
  cbks = 0x7f2146ad90e0 <cbks>, dumpops = 0x7f2146ad9140 <dumpops>,
volume_options = {next = 0x7f2140015b40, prev = 0x7f2140015b40}, 
  fini = 0x7f21468c6ac0 <fini>, init = 0x7f21468c5e10 <init>, reconfigure =
0x7f21468c5740 <reconfigure>, mem_acct_init = 0x7f21468c5670 <mem_acct_init>, 
  notify = 0x7f21468c5610 <notify>, loglevel = GF_LOG_NONE, latencies = {{min =
0, max = 0, total = 0, std = 0, mean = 0, count = 0} <repeats 55 times>}, 
  history = 0x0, ctx = 0x56504ff06010, graph = 0x7f2140003f30, itable =
0x7f214005e0a0, init_succeeded = 1 '\001', private = 0x7f214005db00, 
  mem_acct = 0x7f214005a0b0, winds = 0, switched = 0 '\000', local_pool =
0x7f21548f2ab8 <pools+280>, is_autoloaded = _gf_false, volfile_id = 0x0}
(gdb) p *prev
$12 = {name = 0x7f2140012740 "Slave-replicate-0", type = 0x7f2140014260
"cluster/replicate", instance_name = 0x0, next = 0x7f21400118a0, prev =
0x7f2140014d20, 
  parents = 0x7f2140018480, children = 0x7f2140014c00, options =
0x7f2140014130, dlhandle = 0x7f2140014320, fops = 0x7f2146ad91a0 <fops>, 
  cbks = 0x7f2146ad90e0 <cbks>, dumpops = 0x7f2146ad9140 <dumpops>,
volume_options = {next = 0x7f2140014a30, prev = 0x7f2140014a30}, 
  fini = 0x7f21468c6ac0 <fini>, init = 0x7f21468c5e10 <init>, reconfigure =
0x7f21468c5740 <reconfigure>, mem_acct_init = 0x7f21468c5670 <mem_acct_init>, 
  notify = 0x7f21468c5610 <notify>, loglevel = GF_LOG_NONE, latencies = {{min =
0, max = 0, total = 0, std = 0, mean = 0, count = 0} <repeats 55 times>}, 
  history = 0x0, ctx = 0x56504ff06010, graph = 0x7f2140003f30, itable =
0x7f2140065080, init_succeeded = 1 '\001', private = 0x7f2140064ae0, 
  mem_acct = 0x7f2140061090, winds = 0, switched = 0 '\000', local_pool =
0x7f21548f2ab8 <pools+280>, is_autoloaded = _gf_false, volfile_id = 0x0}


As local->cached_subvol does not match prev, it is clear that
local->cached_subvol was updated after this thread had already proceeded to
unwind to the higher xlators. This means a data file was found on at least 2
subvols - this could be because a linkto file was in the process of being
created when the lookup hit the brick.




In dht_discover_complete () :

        local = discover_frame->local;   
        layout = local->layout;
        conf = this->private;

        LOCK(&discover_frame->lock);
        {
                main_frame = local->main_frame;
                local->main_frame = NULL;
        }
        UNLOCK(&discover_frame->lock);

        if (!main_frame)
                return 0;    <--- So far so good - only one thread will unwind here.

However, discover_frame->local is valid and _still_ accessible to other threads
unwinding to dht_discover_cbk.




In dht_discover_cbk:
...

        local = frame->local;

...
        LOCK (&frame->lock);
        {

...
                if (is_dir) {
                        local->dir_count ++;
                } else {
                        local->file_count ++;

                        if (!is_linkfile) {
                                /* real file */
                                local->cached_subvol = prev;
                                attempt_unwind = 1;
                        } else {
                                goto unlock;
                        }
                }


...

                if (local->xattr == NULL) {
                        local->xattr = dict_ref (xattr); <-- say thread 1 (the one that crashed) did this
                } else {
                        dht_aggregate_xattr (local->xattr, xattr);   <--- and thread 2 called this (this should _only_ be done for directories.)
                }


In dht_aggregate_xattr () -> dht_aggregate ():

        ret = dict_set (dst, key, value);
        if (ret) {
                gf_msg ("dht", GF_LOG_WARNING, 0,
                        DHT_MSG_DICT_SET_FAILED,
                        "Failed to set dictionary value: key = %s",
                        key);
        }



dict_set _replaces_ the value stored under the key, which unrefs the old data;
its ref-count drops to 0 and the data is destroyed.
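
For context, the replace path inside dict_set behaves roughly like this (a
simplified sketch; dict_lookup_pair and dict_add_new_pair are hypothetical
stand-ins for the real hash-table code in libglusterfs):

/* Simplified sketch of the replace path inside dict_set: when the key
 * already exists, the new value is ref'd and the old data_t is unref'd.
 * If the caller that handed the old data_t to quick-read never took its
 * own ref, this unref drops the count to 0 and the payload is freed while
 * quick-read may still be reading it. */
static int
dict_set_replace_sketch (dict_t *this, char *key, data_t *value)
{
        data_pair_t *pair = NULL;
        data_t      *old  = NULL;

        pair = dict_lookup_pair (this, key);              /* hypothetical helper */
        if (!pair)
                return dict_add_new_pair (this, key, value);  /* hypothetical helper */

        old = pair->value;
        pair->value = data_ref (value);   /* take a ref on the new value   */
        data_unref (old);                 /* may free old->data right here */

        return 0;
}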

So here is probably what happened:

Time:   subvol-0                                   subvol-1

t0      dht_discover_cbk
          local->xattr = dict_ref (dict0);
          dict_set (dict0, GF_CONTENT_KEY, data0)
          attempt_unwind = 1

t1      dht_discover_complete

t2      qr_content_extract gets data0

t3                                                 dht_discover_cbk
                                                   dht_aggregate_xattr (local->xattr, xattr)
                                                     |-> dict_set (dict0, GF_CONTENT_KEY, data1)
                                                     |-> data_unref (data0)  <-- ref-count drops
                                                         to 0, data0 is destroyed
                                                   attempt_unwind = 1

t4      qr_content_extract accesses data0 and crashes
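
The same failure mode can be shown with a small standalone program (an analogy
only, none of this is gluster code): one thread reads a refcounted payload
through a raw pointer without taking its own reference, while a second thread
replaces the value and drops the last reference.

/* Standalone illustration of the race: "reader" plays the role of
 * qr_content_extract (uses the payload without holding a ref of its own),
 * "replacer" plays the role of dht_aggregate_xattr -> dict_set -> data_unref
 * (replaces the value and drops the last ref).  Build with: gcc -pthread */
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
        int     refcount;       /* protected by lock */
        char   *payload;
        size_t  len;
} blob_t;

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static blob_t *shared;          /* plays the role of the GF_CONTENT_KEY value */

static blob_t *
blob_new (const char *s)
{
        blob_t *b = calloc (1, sizeof (*b));

        b->refcount = 1;
        b->len = strlen (s) + 1;
        b->payload = malloc (b->len);
        memcpy (b->payload, s, b->len);
        return b;
}

static void
blob_unref (blob_t *b)
{
        int last = 0;

        pthread_mutex_lock (&lock);
        last = (--b->refcount == 0);
        pthread_mutex_unlock (&lock);

        if (last) {
                free (b->payload);      /* any raw pointer is now dangling */
                free (b);
        }
}

static void *
reader (void *arg)
{
        char    copy[64];
        blob_t *b = shared;             /* no ref taken: this is the bug */

        (void) arg;
        memcpy (copy, b->payload, b->len);   /* may read freed memory */
        return NULL;
}

static void *
replacer (void *arg)
{
        blob_t *old = shared;

        (void) arg;
        shared = blob_new ("data1");
        blob_unref (old);               /* drops the last ref, frees "data0" */
        return NULL;
}

int
main (void)
{
        pthread_t a, b;

        shared = blob_new ("data0");
        pthread_create (&a, NULL, reader, NULL);
        pthread_create (&b, NULL, replacer, NULL);
        pthread_join (a, NULL);
        pthread_join (b, NULL);
        blob_unref (shared);
        return 0;
}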
