[Bugs] [Bug 1140549] New: DHT: Rebalance process crash after add-brick and `rebalance start' operation

bugzilla at redhat.com bugzilla at redhat.com
Thu Sep 11 08:35:54 UTC 2014


https://bugzilla.redhat.com/show_bug.cgi?id=1140549

            Bug ID: 1140549
           Summary: DHT: Rebalance process crash after add-brick and
                    `rebalance start' operation
           Product: GlusterFS
           Version: 3.5.2
         Component: distribute
          Assignee: gluster-bugs at redhat.com
          Reporter: nbalacha at redhat.com
                CC: bugs at gluster.org, gluster-bugs at redhat.com,
                    saujain at redhat.com, sdharane at redhat.com,
                    srangana at redhat.com, ssamanta at redhat.com,
                    surs at redhat.com, vbellur at redhat.com
        Depends On: 1139812
            Blocks: 1139213



+++ This bug was initially created as a clone of Bug #1139812 +++

+++ This bug was initially created as a clone of Bug #1139213 +++

Description of problem:

The rebalance process crashed after adding a brick and starting rebalance on the volume.

How reproducible:

Always.

Steps to Reproduce:
1. Run multiple rename operations on the mount point.
2. Add a replica pair to the volume.
3. Start rebalance on the volume.

Actual results:

Rebalance fails and the rebalance process crashes.


Additional info:

[2014-09-08 10:27:52.734274] E [MSGID: 109019] [dht-common.c:2780:dht_getxattr] 0-t0-dht: Layout is NULL
[2014-09-08 10:27:52.734300] E [MSGID: 109023] [dht-rebalance.c:1520:gf_defrag_migrate_data] 0-t0-dht: Migrate file failed:Failed to get node-uuid for /scratch/scratch/renameracegvJKY2
pending frames:
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 
2014-09-08 10:27:52
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.6.0.28
/usr/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xb6)[0x349241ff06]
/usr/lib64/libglusterfs.so.0(gf_print_trace+0x33f)[0x349243a59f]
/lib64/libc.so.6[0x3392e329a0]
/usr/lib64/glusterfs/3.6.0.28/xlator/protocol/client.so(client3_3_lookup_cbk+0x89)[0x7f67f0bd2ca9]
/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5)[0x3492c0e9c5]
/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x13f)[0x3492c0fe4f]
/usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x28)[0x3492c0b668]
/usr/lib64/glusterfs/3.6.0.28/rpc-transport/socket.so(+0x9275)[0x7f67f1015275]
/usr/lib64/glusterfs/3.6.0.28/rpc-transport/socket.so(+0xac5d)[0x7f67f1016c5d]
/usr/lib64/libglusterfs.so.0[0x3492476367]
/usr/sbin/glusterfs(main+0x603)[0x407e93]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x3392e1ed1d]
/usr/sbin/glusterfs[0x4049a9]


=======

(gdb) p *frame
$1 = {
  root = 0x7f67f9ecfea4, 
  parent = 0x7f67fa09c1c4, 
  next = 0x7f67fa09ba60, 
  prev = 0x7f67f9ed050c, 
  local = 0x0, 
  this = 0x22020b0, 
  ret = 0x7f67f0993ee0 <afr_lookup_cbk>, 
  ref_count = 0, 
  lock = 1, 
  cookie = 0x1, 
  complete = _gf_true, 
  op = GF_FOP_NULL, 
  begin = {
    tv_sec = 0, 
    tv_usec = 0
  }, 
  end = {
    tv_sec = 0, 
    tv_usec = 0
  }, 
  wind_from = 0x7f67f099f3d1 "afr_lookup", 
  wind_to = 0x7f67f0998528 "priv->children[i]->fops->lookup", 
  unwind_from = 0x7f67f0be6800 "client3_3_lookup_cbk", 
  unwind_to = 0x7f67f099d3a7 "afr_lookup_cbk"
}

(gdb) p *frame->next
$2 = {
  root = 0x7f67f9ecfea4, 
  parent = 0x7f67fa09c1c4, 
  next = 0x7f67fa09c1c4, 
  prev = 0x7f67fa09b454, 
  local = 0x0, 
  this = 0x2201180, 
  ret = 0x7f67f0993ee0 <afr_lookup_cbk>, 
  ref_count = 0, 
  lock = 1, 
  cookie = 0x0, 
  complete = _gf_true, 
  op = GF_FOP_NULL, 
  begin = {
    tv_sec = 0, 
    tv_usec = 0
  }, 
  end = {
    tv_sec = 0, 
    tv_usec = 0
  }, 
  wind_from = 0x7f67f099f3d1 "afr_lookup", 
  wind_to = 0x7f67f0998528 "priv->children[i]->fops->lookup", 
  unwind_from = 0x7f67f0be6800 "client3_3_lookup_cbk", 
  unwind_to = 0x7f67f099d3a7 "afr_lookup_cbk"
}

(gdb) p *frame->next->next
$3 = {
  root = 0x7f67f9ecfea4, 
  parent = 0x7f67fa09b3a8, 
  next = 0x7f67fa09b3a8, 
  prev = 0x7f67fa09ba60, 
  local = 0x7f67eb18c04c, 
  this = 0x2204ee0, 
  ret = 0x7f67f0711c20 <dht_lookup_cbk>, 
  ref_count = 0, 
  lock = 1, 
  cookie = 0x7f67fa09c1c4, 
  complete = _gf_false, 
  op = GF_FOP_NULL, 
  begin = {
    tv_sec = 0, 
    tv_usec = 0
  }, 
  end = {
    tv_sec = 0, 
    tv_usec = 0
  }, 
  wind_from = 0x7f67f072ac67 "dht_lookup", 
  wind_to = 0x7f67f0728a5a "hashed_subvol->fops->lookup", 
  unwind_from = 0x0, 
  unwind_to = 0x7f67f0728a76 "dht_lookup_cbk"
}

(gdb) p *frame->next->next->next
$4 = {
  root = 0x7f67f9ecfea4, 
  parent = 0x7f67f9ed050c, 
  next = 0x0, 
  prev = 0x7f67fa09c1c4, 
  local = 0x7f67eb6b304c, 
  this = 0x2205b30, 
  ret = 0x349245d0e0 <syncop_lookup_cbk>, 
  ref_count = 1, 
  lock = 1, 
  cookie = 0x7f67e9c12d10, 
  complete = _gf_false, 
  op = GF_FOP_NULL, 
  begin = {
    tv_sec = 0, 
    tv_usec = 0
  }, 
  end = {
    tv_sec = 0, 
    tv_usec = 0
  }, 
  wind_from = 0x3492483794 "syncop_lookup", 
  wind_to = 0x349248310a "subvol->fops->lookup", 
  unwind_from = 0x0, 
  unwind_to = 0x349248311f "syncop_lookup_cbk"
}

(gdb) bt
#0  client3_3_lookup_cbk (req=0x7f67ea660298, iov=<value optimized out>,
count=<value optimized out>, 
    myframe=0x7f67fa09b454) at client-rpc-fops.c:2710
#1  0x0000003492c0e9c5 in rpc_clnt_handle_reply (clnt=0x22ecd70,
pollin=0x21f6360) at rpc-clnt.c:763
#2  0x0000003492c0fe4f in rpc_clnt_notify (trans=<value optimized out>,
mydata=0x22ecda0, 
    event=<value optimized out>, data=<value optimized out>) at rpc-clnt.c:891
#3  0x0000003492c0b668 in rpc_transport_notify (this=<value optimized out>, 
    event=<value optimized out>, data=<value optimized out>) at
rpc-transport.c:512
#4  0x00007f67f1015275 in socket_event_poll_in (this=0x22fc900) at
socket.c:2127
#5  0x00007f67f1016c5d in socket_event_handler (fd=<value optimized out>,
idx=<value optimized out>, 
    data=0x22fc900, poll_in=1, poll_out=0, poll_err=0) at socket.c:2240
#6  0x0000003492476367 in event_dispatch_epoll_handler (event_pool=0x21d5d00)
at event-epoll.c:384
#7  event_dispatch_epoll (event_pool=0x21d5d00) at event-epoll.c:445
#8  0x0000000000407e93 in main (argc=31, argv=0x7fff82c05998) at
glusterfsd.c:2023
(gdb)

--- Additional comment from Saurabh on 2014-09-09 07:20:45 EDT ---

I also hit the same issue while executing a remove-brick operation, with no
other operations running on the mount point.

[root at nfs4 ~]# gluster volume remove-brick dist-rep 10.70.37.156:/bricks/d1r1-add3 10.70.37.81:/bricks/d1r2-add3 status
        Node  Rebalanced-files      size  scanned  failures  skipped  status  run time in secs
------------  ----------------  --------  -------  --------  -------  ------  ----------------
10.70.37.156                 0    0Bytes        0         0        0  failed              0.00
 10.70.37.81                 0    0Bytes        0         0        0  failed              0.00


(gdb) bt
#0  client3_3_lookup_cbk (req=0x7f07451fb4e4, iov=<value optimized out>,
count=<value optimized out>, myframe=0x7f075635bd50)
    at client-rpc-fops.c:2710
#1  0x000000391ca0e9c5 in rpc_clnt_handle_reply (clnt=0x2543ea0,
pollin=0x270dec0) at rpc-clnt.c:763
#2  0x000000391ca0fe4f in rpc_clnt_notify (trans=<value optimized out>,
mydata=0x2543ed0, event=<value optimized out>, data=<value optimized out>)
    at rpc-clnt.c:891
#3  0x000000391ca0b668 in rpc_transport_notify (this=<value optimized out>,
event=<value optimized out>, data=<value optimized out>)
    at rpc-transport.c:512
#4  0x00007f074d2d3275 in socket_event_poll_in (this=0x2553aa0) at
socket.c:2127
#5  0x00007f074d2d4c5d in socket_event_handler (fd=<value optimized out>,
idx=<value optimized out>, data=0x2553aa0, poll_in=1, poll_out=0, 
    poll_err=0) at socket.c:2240
#6  0x000000391c276367 in event_dispatch_epoll_handler (event_pool=0x22c04d0)
at event-epoll.c:384
#7  event_dispatch_epoll (event_pool=0x22c04d0) at event-epoll.c:445
#8  0x0000000000407e93 in main (argc=31, argv=0x7ffffe0a22e8) at
glusterfsd.c:2023

--- Additional comment from Shyamsundar on 2014-09-09 12:49:23 EDT ---

Root caused this as follows:

In dht_lookup_everywhere_done:

At line 1194 we call DHT_STACK_UNWIND, and within the same if condition we
then goto unwind_hashed_and_cached, which at line 1371 calls another UNWIND.

Pointers to the problem:
- In the cores, the ref_count of the frames becomes -1, which points to a
double dereference of sorts. From (gdb) f 2, inspecting frame,
frame->parent, and frame->parent->parent shows this ref count.

- In all the cores, the statement that caused the UNWIND is occurring at the
"unwind_hashed_and_cached:" label in dht_lookup_everywhere_done, and we can
get there only via the code path mentioned above.

So the problem is a double UNWIND in a particular code path, and that is
what causes this core dump.

BTW, bug #1139535 has the same root cause.
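
For illustration, the following is a minimal, self-contained C sketch of the
failure shape described above. The names (frame_t, unwind_to_parent,
lookup_everywhere_done_buggy) are simplified stand-ins, not the actual
dht-common.c code; only the control flow -- an unwind inside the error
branch followed by a goto to a label that unwinds the same frame again --
mirrors the analysis.

#include <errno.h>
#include <stdio.h>

typedef struct frame {
        int ref_count;   /* real call frames track references similarly */
} frame_t;

/* Stand-in for DHT_STACK_UNWIND: answer the parent and drop our reference. */
static void
unwind_to_parent (frame_t *frame, int op_ret, int op_errno)
{
        frame->ref_count--;
        printf ("unwind: op_ret=%d op_errno=%d ref_count=%d\n",
                op_ret, op_errno, frame->ref_count);
}

/* Buggy shape: the error branch unwinds, then falls through via goto to a
 * label that unwinds the same frame again (the pattern reported at
 * dht_lookup_everywhere_done lines 1194 and 1371). */
static int
lookup_everywhere_done_buggy (frame_t *frame, int have_cached, int have_hashed)
{
        if (!have_cached && !have_hashed) {
                unwind_to_parent (frame, -1, ENOENT);
                /* BUG: a "return 0;" is missing here */
                goto unwind_hashed_and_cached;
        }
        return 0;

unwind_hashed_and_cached:
        unwind_to_parent (frame, -1, ENOENT);   /* second unwind */
        return 0;
}

int
main (void)
{
        frame_t frame = { .ref_count = 1 };
        lookup_everywhere_done_buggy (&frame, 0, 0);
        /* ref_count is now -1, matching what the cores show; in the real
         * process the parent frame has already cleaned up after the first
         * unwind, so the second unwind touches freed memory and SIGSEGVs. */
        return 0;
}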

--- Additional comment from Shyamsundar on 2014-09-09 13:00:57 EDT ---

Another symptom of the same issue:
==================================

Description of problem:
I tried to resolve the gfid mismatch for a file and then ran "ls" on that
same file. The ls command hung and the NFS server crashed.

gluster volume info,
[root at nfs4 ~]# gluster volume info dist-rep

Volume Name: dist-rep
Type: Distributed-Replicate
Volume ID: 9a80c514-ddd4-45cb-b2b6-d745fd438db8
Status: Started
Snap Volume: no
Number of Bricks: 7 x 2 = 14
Transport-type: tcp
Bricks:
Brick1: 10.70.37.156:/bricks/d1r1
Brick2: 10.70.37.81:/bricks/d1r2
Brick3: 10.70.37.50:/bricks/d2r1
Brick4: 10.70.37.95:/bricks/d2r2
Brick5: 10.70.37.156:/bricks/d3r1
Brick6: 10.70.37.81:/bricks/d3r2
Brick7: 10.70.37.50:/bricks/d4r1
Brick8: 10.70.37.95:/bricks/d4r2
Brick9: 10.70.37.156:/bricks/d5r1
Brick10: 10.70.37.81:/bricks/d5r2
Brick11: 10.70.37.50:/bricks/d6r1
Brick12: 10.70.37.95:/bricks/d6r2
Brick13: 10.70.37.156:/bricks/d1r1-add3
Brick14: 10.70.37.81:/bricks/d1r2-add3
Options Reconfigured:
features.quota-deem-statfs: on
features.quota: on
performance.readdir-ahead: on
snap-max-hard-limit: 256
snap-max-soft-limit: 90
auto-delete: disable

The procedure followed to resolve the gfid mismatch was:

1. Find the gfid on the replica pair using the getfattr command.

2. If the gfid differs between the bricks of the pair, then on one brick of
   the pair:
   a. first, delete the gfid entry from the .glusterfs/indices/xattrop
      directory;
   b. second, delete the file itself from the same brick where step 2a was
      executed.

3. Check the gfid again using getfattr; the gfids now matched.

After the above steps I ran ls <filename> on the mount point.

How reproducible:
Seen once so far.

Actual results:

nfs.log,

signal received: 11
time of crash: 
2014-09-08 20:11:48
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.6.0.28
/usr/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xb6)[0x391c21ff06]
/usr/lib64/libglusterfs.so.0(gf_print_trace+0x33f)[0x391c23a59f]
/lib64/libc.so.6[0x3c6be326b0]
/usr/lib64/glusterfs/3.6.0.28/xlator/nfs/server.so(nfs_fop_lookup_cbk+0x225)[0x7fc08287f2e5]
/usr/lib64/glusterfs/3.6.0.28/xlator/debug/io-stats.so(io_stats_lookup_cbk+0x113)[0x7fc082ad5e73]
/usr/lib64/glusterfs/3.6.0.28/xlator/cluster/distribute.so(dht_lookup_everywhere_done+0x668)[0x7fc082f1cba8]
/usr/lib64/glusterfs/3.6.0.28/xlator/cluster/distribute.so(dht_lookup_everywhere_cbk+0x403)[0x7fc082f215b3]
/usr/lib64/glusterfs/3.6.0.28/xlator/cluster/replicate.so(afr_lookup_cbk+0x558)[0x7fc0831a4438]
/usr/lib64/glusterfs/3.6.0.28/xlator/protocol/client.so(client3_3_lookup_cbk+0x647)[0x7fc0833e3267]
/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5)[0x391ca0e9c5]
/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x13f)[0x391ca0fe4f]
/usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x28)[0x391ca0b668]
/usr/lib64/glusterfs/3.6.0.28/rpc-transport/socket.so(+0x9275)[0x7fc08839e275]
/usr/lib64/glusterfs/3.6.0.28/rpc-transport/socket.so(+0xac5d)[0x7fc08839fc5d]
/usr/lib64/libglusterfs.so.0[0x391c276367]
/usr/sbin/glusterfs(main+0x603)[0x407e93]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x3c6be1ed5d]
/usr/sbin/glusterfs[0x4049a9]


(gdb) bt
#0  nfs_fop_lookup_cbk (frame=0x7fc09085b740, cookie=0x1f38790, this=<value
optimized out>, op_ret=<value optimized out>, 
    op_errno=<value optimized out>, inode=<value optimized out>,
buf=0x7fc07fd7916c, xattr=0x2919ecc, postparent=0x7fc07fd7939c) at
nfs-fops.c:427
#1  0x00007fc082ad5e73 in io_stats_lookup_cbk (frame=0x7fc090a2635c,
cookie=<value optimized out>, this=<value optimized out>, op_ret=0, op_errno=0, 
    inode=0x0, buf=0x7fc07fd7916c, xdata=0x2919ecc, postparent=0x7fc07fd7939c)
at io-stats.c:1505
#2  0x00007fc082f1cba8 in dht_lookup_everywhere_done (frame=0x7fc090a243c8,
this=<value optimized out>) at dht-common.c:1370
#3  0x00007fc082f215b3 in dht_lookup_everywhere_cbk (frame=0x7fc090a243c8,
cookie=<value optimized out>, this=0x1f34bc0, 
    op_ret=<value optimized out>, op_errno=<value optimized out>,
inode=0x7fc081be3b98, buf=0x7fc07f36ba58, xattr=0x2919ecc, 
    postparent=0x7fc07f36bac8) at dht-common.c:1577
#4  0x00007fc0831a4438 in afr_lookup_done (frame=0x7fbfffffffc6,
cookie=0x7fff5390a5f8, this=0x1f332e0, op_ret=<value optimized out>, 
    op_errno=43234112, inode=0x8, buf=0x7fff5390a850, xattr=0x2919ecc,
postparent=0x7fff5390a7e0) at afr-common.c:2319
#5  afr_lookup_cbk (frame=0x7fbfffffffc6, cookie=0x7fff5390a5f8,
this=0x1f332e0, op_ret=<value optimized out>, op_errno=43234112, inode=0x8, 
    buf=0x7fff5390a850, xattr=0x2919ecc, postparent=0x7fff5390a7e0) at
afr-common.c:2560
#6  0x00007fc0833e3267 in client3_3_lookup_cbk (req=<value optimized out>,
iov=<value optimized out>, count=<value optimized out>, 
    myframe=0x7fc090a37424) at client-rpc-fops.c:2767
#7  0x000000391ca0e9c5 in rpc_clnt_handle_reply (clnt=0x1feefd0,
pollin=0x29091b0) at rpc-clnt.c:763
#8  0x000000391ca0fe4f in rpc_clnt_notify (trans=<value optimized out>,
mydata=0x1fef000, event=<value optimized out>, data=<value optimized out>)
    at rpc-clnt.c:891
#9  0x000000391ca0b668 in rpc_transport_notify (this=<value optimized out>,
event=<value optimized out>, data=<value optimized out>)
    at rpc-transport.c:512
#10 0x00007fc08839e275 in socket_event_poll_in (this=0x1ffebd0) at
socket.c:2127
#11 0x00007fc08839fc5d in socket_event_handler (fd=<value optimized out>,
idx=<value optimized out>, data=0x1ffebd0, poll_in=1, poll_out=0, 
    poll_err=0) at socket.c:2240
#12 0x000000391c276367 in event_dispatch_epoll_handler (event_pool=0x1efb4d0)
at event-epoll.c:384
#13 event_dispatch_epoll (event_pool=0x1efb4d0) at event-epoll.c:445
#14 0x0000000000407e93 in main (argc=11, argv=0x7fff5390bce8) at
glusterfsd.c:2023


Expected results:
The NFS server should not have crashed.

--- Additional comment from Anand Avati on 2014-09-09 13:34:33 EDT ---

REVIEW: http://review.gluster.org/8666 (cluster/dht: Fixed double UNWIND in
lookup everywhere code) posted (#1) for review on master by Shyamsundar
Ranganathan (srangana at redhat.com)

--- Additional comment from Anand Avati on 2014-09-09 23:29:21 EDT ---

COMMIT: http://review.gluster.org/8666 committed in master by Vijay Bellur
(vbellur at redhat.com) 
------
commit b3314ea6e820fb659255d0e6e9a32ea259b7526d
Author: Shyam <srangana at redhat.com>
Date:   Tue Sep 9 13:31:51 2014 -0400

    cluster/dht: Fixed double UNWIND in lookup everywhere code

    In dht_lookup_everywhere_done:
    Line: 1194 we call DHT_STACK_UNWIND
    and in the same if condition we go ahead and call, goto
    unwind_hashed_and_cached; which at Line 1371 calls another UNWIND.

    As is obvious, higher frames could cleanup their locals and on
    receiving the next unwind could cause a coredump of the process.

    Fixed the same by calling the required return post the first unwind

    Change-Id: Ic5d57da98255b8616a65b4caaedabeba9144fd49
    BUG: 1139812
    Signed-off-by: Shyam <srangana at redhat.com>
    Reviewed-on: http://review.gluster.org/8666
    Reviewed-by: N Balachandran <nbalacha at redhat.com>
    Reviewed-by: Vijay Bellur <vbellur at redhat.com>
    Reviewed-by: Jeff Darcy <jdarcy at redhat.com>
    Tested-by: Gluster Build System <jenkins at build.gluster.com>
    Reviewed-by: susant palai <spalai at redhat.com>
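
As an illustration of the fix shape, reusing the stand-in names from the
sketch earlier in this report (again illustrative only, not the actual
review.gluster.org/8666 patch): the error branch returns immediately after
its unwind, so no code path can unwind the same frame twice.

/* Fixed shape: return right after the first unwind (stand-in names from
 * the earlier sketch; not the real dht-common.c code). */
static int
lookup_everywhere_done_fixed (frame_t *frame, int have_cached, int have_hashed)
{
        if (!have_cached && !have_hashed) {
                unwind_to_parent (frame, -1, ENOENT);
                return 0;       /* fix: stop here, never fall through */
        }

        /* Other paths may still end in a single unwind; the point of the
         * fix is only that no path unwinds the same frame twice. */
        return 0;
}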


Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1139812
[Bug 1139812] DHT: Rebalance process crash after add-brick and `rebalance
start' operation