[Bugs] [Bug 1402538] Assertion Failed Error messages in rebalance logs during rebalance

bugzilla at redhat.com bugzilla at redhat.com
Wed Dec 7 18:44:17 UTC 2016


https://bugzilla.redhat.com/show_bug.cgi?id=1402538

Ashish Pandey <aspandey at redhat.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
           Assignee|bugs at gluster.org            |aspandey at redhat.com



--- Comment #1 from Ashish Pandey <aspandey at redhat.com> ---

While trying to reproduce this issue, I hit with the crash -

(gdb) bt
#0  0x00007f7503611765 in raise () from /lib64/libc.so.6
#1  0x00007f750361336a in abort () from /lib64/libc.so.6
#2  0x00007f7503609f97 in __assert_fail_base () from /lib64/libc.so.6
#3  0x00007f750360a042 in __assert_fail () from /lib64/libc.so.6
#4  0x00007f74f77bafd4 in ec_manager_setattr (fop=0x7f74ec0b491c, state=4) at
ec-inode-write.c:394
#5  0x00007f74f779c405 in __ec_manager (fop=0x7f74ec0b491c, error=0) at
ec-common.c:2283
#6  0x00007f74f7797d90 in ec_resume (fop=0x7f74ec0b491c, error=0) at
ec-common.c:289
#7  0x00007f74f7797fba in ec_complete (fop=0x7f74ec0b491c) at ec-common.c:362
#8  0x00007f74f77b979a in ec_inode_write_cbk (frame=0x7f74ec2d82bc,
this=0x7f74f002e770, cookie=0x3, op_ret=0, op_errno=0, prestat=0x7f74f848b980,
poststat=0x7f74f848b910, xdata=0x7f74f01c6a3c) at ec-inode-write.c:65
#9  0x00007f74f77baacc in ec_setattr_cbk (frame=0x7f74ec2d82bc, cookie=0x3,
this=0x7f74f002e770, op_ret=0, op_errno=0, prestat=0x7f74f848b980,
poststat=0x7f74f848b910, xdata=0x7f74f01c6a3c) at ec-inode-write.c:349
#10 0x00007f74f7a5563d in client3_3_setattr_cbk (req=0x7f74ec0033ec,
iov=0x7f74ec00342c, count=1, myframe=0x7f74ec16fabc) at client-rpc-fops.c:2264
#11 0x00007f7504d8026a in rpc_clnt_handle_reply (clnt=0x7f74f005fe70,
pollin=0x7f74f0321c60) at rpc-clnt.c:790
#12 0x00007f7504d807c3 in rpc_clnt_notify (trans=0x7f74f0060300,
mydata=0x7f74f005fec8, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x7f74f0321c60)
at rpc-clnt.c:970
#13 0x00007f7504d7ca37 in rpc_transport_notify (this=0x7f74f0060300,
event=RPC_TRANSPORT_MSG_RECEIVED, data=0x7f74f0321c60) at rpc-transport.c:537
#14 0x00007f74f9f53ca1 in socket_event_poll_in (this=0x7f74f0060300) at
socket.c:2265
#15 0x00007f74f9f541f2 in socket_event_handler (fd=29, idx=19,
data=0x7f74f0060300, poll_in=1, poll_out=0, poll_err=0) at socket.c:2395
#16 0x00007f7505024dbf in event_dispatch_epoll_handler (event_pool=0x1268010,
event=0x7f74f848bea0) at event-epoll.c:571
#17 0x00007f750502519e in event_dispatch_epoll_worker (data=0x12b1a60) at
event-epoll.c:674
#18 0x00007f7503e065ca in start_thread () from /lib64/libpthread.so.0
#19 0x00007f75036e00ed in clone () from /lib64/libc.so.6

------------------------------

It is crashing in ec_manager_setattr at -

 if (cbk->iatt[0].ia_type == IA_IFREG) {
                    ec_iatt_rebuild(fop->xl->private, cbk->iatt, 2,
                                    cbk->count);

                    /* This shouldn't fail because we have the inode locked. */
                    GF_ASSERT(ec_get_inode_size(fop,
                                                fop->locks[0].lock->loc.inode,
                                                &cbk->iatt[0].ia_size));
                    cbk->iatt[1].ia_size = cbk->iatt[0].ia_size;
                }

--------------------------------

Reason behind this is that, a file which is a link "IA_IFLNK" is coming as
regular file "IA_IFREG" in callback.

(gdb) p *cbk
$6 = {list = {next = 0x7f74ec0b495c, prev = 0x7f74ec0b495c}, answer_list =
{next = 0x7f74ec0b496c, prev = 0x7f74f04271ec}, fop = 0x7f74ec0b491c, next =
0x7f74f04271dc, idx = 3, op_ret = 0, op_errno = 0, count = 6, mask = 63, dirty
= {0, 
    0}, xdata = 0x7f74f01c6a3c, dict = 0x0, int32 = 0, uintptr = {0, 0, 0},
size = 0, version = {0, 0}, inode = 0x0, fd = 0x0, statvfs = {f_bsize = 0,
f_frsize = 0, f_blocks = 0, f_bfree = 0, f_bavail = 0, f_files = 0, f_ffree =
0, 
    f_favail = 0, f_fsid = 0, f_flag = 0, f_namemax = 0, __f_spare = {0, 0, 0,
0, 0, 0}}, iatt = {{ia_ino = 13705394905313093545, ia_gfid =
"\200\032\325\370.\310D\256\276\063X\rU\242\247\251", ia_dev = 64771, ia_type =
IA_IFREG, 
      ia_prot = {suid = 0 '\000', sgid = 0 '\000', sticky = 1 '\001', owner =
{read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, group = {read = 0 '\000',
write = 0 '\000', exec = 0 '\000'}, other = {read = 0 '\000', 
          write = 0 '\000', exec = 0 '\000'}}, ia_nlink = 1, ia_uid = 0, ia_gid
= 0, ia_rdev = 0, ia_size = 0, ia_blksize = 4096, ia_blocks = 0, ia_atime =
1481113459, ia_atime_nsec = 489040926, ia_mtime = 1481113459, 
      ia_mtime_nsec = 489040926, ia_ctime = 1481113459, ia_ctime_nsec =
492040918}, {ia_ino = 13705394905313093545, ia_gfid =
"\200\032\325\370.\310D\256\276\063X\rU\242\247\251", ia_dev = 64771, ia_type =
IA_IFREG, ia_prot = {
        suid = 0 '\000', sgid = 0 '\000', sticky = 1 '\001', owner = {read = 0
'\000', write = 0 '\000', exec = 0 '\000'}, group = {read = 0 '\000', write = 0
'\000', exec = 0 '\000'}, other = {read = 0 '\000', write = 0 '\000', 
          exec = 0 '\000'}}, ia_nlink = 1, ia_uid = 0, ia_gid = 0, ia_rdev = 0,
ia_size = 0, ia_blksize = 4096, ia_blocks = 0, ia_atime = 1481113459,
ia_atime_nsec = 489040926, ia_mtime = 1481113459, ia_mtime_nsec = 489040926, 
      ia_ctime = 1481113459, ia_ctime_nsec = 494040912}, {ia_ino = 0, ia_gfid =
'\000' <repeats 15 times>, ia_dev = 0, ia_type = IA_INVAL, ia_prot = {suid = 0
'\000', sgid = 0 '\000', sticky = 0 '\000', owner = {read = 0 '\000', 
          write = 0 '\000', exec = 0 '\000'}, group = {read = 0 '\000', write =
0 '\000', exec = 0 '\000'}, other = {read = 0 '\000', write = 0 '\000', exec =
0 '\000'}}, ia_nlink = 0, ia_uid = 0, ia_gid = 0, ia_rdev = 0, ia_size = 0, 
      ia_blksize = 0, ia_blocks = 0, ia_atime = 0, ia_atime_nsec = 0, ia_mtime
= 0, ia_mtime_nsec = 0, ia_ctime = 0, ia_ctime_nsec = 0}, {ia_ino = 0, ia_gfid
= '\000' <repeats 15 times>, ia_dev = 0, ia_type = IA_INVAL, ia_prot = {
        suid = 0 '\000', sgid = 0 '\000', sticky = 0 '\000', owner = {read = 0
'\000', write = 0 '\000', exec = 0 '\000'}, group = {read = 0 '\000', write = 0
'\000', exec = 0 '\000'}, other = {read = 0 '\000', write = 0 '\000', 
          exec = 0 '\000'}}, ia_nlink = 0, ia_uid = 0, ia_gid = 0, ia_rdev = 0,
ia_size = 0, ia_blksize = 0, ia_blocks = 0, ia_atime = 0, ia_atime_nsec = 0,
ia_mtime = 0, ia_mtime_nsec = 0, ia_ctime = 0, ia_ctime_nsec = 0}, {ia_ino = 0, 
      ia_gfid = '\000' <repeats 15 times>, ia_dev = 0, ia_type = IA_INVAL,
ia_prot = {suid = 0 '\000', sgid = 0 '\000', sticky = 0 '\000', owner = {read =
0 '\000', write = 0 '\000', exec = 0 '\000'}, group = {read = 0 '\000', 
          write = 0 '\000', exec = 0 '\000'}, other = {read = 0 '\000', write =
0 '\000', exec = 0 '\000'}}, ia_nlink = 0, ia_uid = 0, ia_gid = 0, ia_rdev = 0,
ia_size = 0, ia_blksize = 0, ia_blocks = 0, ia_atime = 0, ia_atime_nsec = 0, 
      ia_mtime = 0, ia_mtime_nsec = 0, ia_ctime = 0, ia_ctime_nsec = 0}}, flock
= {l_type = 0, l_whence = 0, l_start = 0, l_len = 0, l_pid = 0, l_owner = {len
= 0, data = '\000' <repeats 1023 times>}}, vector = 0x0, buffers = 0x0, 
  str = 0x0, entries = {{list = {next = 0x7f74f01b4884, prev = 0x7f74f01b4884},
{next = 0x7f74f01b4884, prev = 0x7f74f01b4884}}, d_ino = 0, d_off = 0, d_len =
0, d_type = 0, d_stat = {ia_ino = 0, ia_gfid = '\000' <repeats 15 times>, 
      ia_dev = 0, ia_type = IA_INVAL, ia_prot = {suid = 0 '\000', sgid = 0
'\000', sticky = 0 '\000', owner = {read = 0 '\000', write = 0 '\000', exec = 0
'\000'}, group = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, other =
{
          read = 0 '\000', write = 0 '\000', exec = 0 '\000'}}, ia_nlink = 0,
ia_uid = 0, ia_gid = 0, ia_rdev = 0, ia_size = 0, ia_blksize = 0, ia_blocks =
0, ia_atime = 0, ia_atime_nsec = 0, ia_mtime = 0, ia_mtime_nsec = 0, 
      ia_ctime = 0, ia_ctime_nsec = 0}, dict = 0x0, inode = 0x0, d_name =
0x7f74f01b492c ""}, offset = 0, what = GF_SEEK_DATA}
(gdb)                 
(gdb) 
(gdb) p *cbk->fop
$7 = {id = 38, refs = 3, state = 4, minimum = 4, expected = 6, winds = 0, jobs
= 1, error = 0, parent = 0x0, xl = 0x7f74f002e770, req_frame = 0x7f74ec218d8c,
frame = 0x7f74ec2d82bc, cbk_list = {next = 0x7f74f01b40fc, 
    prev = 0x7f74f01b40fc}, answer_list = {next = 0x7f74ec12ef2c, prev =
0x7f74f01b410c}, pending_list = {next = 0x7f74e0017adc, prev = 0x7f74ec154f6c},
answer = 0x7f74f01b40fc, lock_count = 1, locked = 1, locks = {{
      lock = 0x7f74ec07b3bc, fop = 0x7f74ec0b491c, owner_list = {next =
0x7f74ec07b3cc, prev = 0x7f74ec07b3cc}, wait_list = {next = 0x7f74ec0b49bc,
prev = 0x7f74ec0b49bc}, update = {_gf_false, _gf_true}, base = 0x0, size = 0},
{
      lock = 0x0, fop = 0x0, owner_list = {next = 0x7f74ec0b49f4, prev =
0x7f74ec0b49f4}, wait_list = {next = 0x7f74ec0b4a04, prev = 0x7f74ec0b4a04},
update = {_gf_false, _gf_false}, base = 0x0, size = 0}}, first_lock = 0, lock =
{
    spinlock = 1, mutex = {__data = {__lock = 1, __count = 0, __owner = 0,
__nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0,
__next = 0x0}}, __size = "\001", '\000' <repeats 38 times>, __align = 1}}, 
  flags = 0, first = 0, mask = 63, healing = 0, remaining = 0, received = 63,
good = 63, uid = 0, gid = 0, wind = 0x7f74f77baad2 <ec_wind_setattr>, handler =
0x7f74f77bae65 <ec_manager_setattr>, resume = 0x0, cbks = {
    access = 0x7f750505f400 <default_setattr_cbk>, create = 0x7f750505f400
<default_setattr_cbk>, discard = 0x7f750505f400 <default_setattr_cbk>, entrylk
= 0x7f750505f400 <default_setattr_cbk>, 
    fentrylk = 0x7f750505f400 <default_setattr_cbk>, fallocate = 0x7f750505f400
<default_setattr_cbk>, flush = 0x7f750505f400 <default_setattr_cbk>, fsync =
0x7f750505f400 <default_setattr_cbk>, 
    fsyncdir = 0x7f750505f400 <default_setattr_cbk>, getxattr = 0x7f750505f400
<default_setattr_cbk>, fgetxattr = 0x7f750505f400 <default_setattr_cbk>, heal =
0x7f750505f400 <default_setattr_cbk>, 
    fheal = 0x7f750505f400 <default_setattr_cbk>, inodelk = 0x7f750505f400
<default_setattr_cbk>, finodelk = 0x7f750505f400 <default_setattr_cbk>, link =
0x7f750505f400 <default_setattr_cbk>, lk = 0x7f750505f400
<default_setattr_cbk>, 
    lookup = 0x7f750505f400 <default_setattr_cbk>, mkdir = 0x7f750505f400
<default_setattr_cbk>, mknod = 0x7f750505f400 <default_setattr_cbk>, open =
0x7f750505f400 <default_setattr_cbk>, opendir = 0x7f750505f400
<default_setattr_cbk>, 
    readdir = 0x7f750505f400 <default_setattr_cbk>, readdirp = 0x7f750505f400
<default_setattr_cbk>, readlink = 0x7f750505f400 <default_setattr_cbk>, readv =
0x7f750505f400 <default_setattr_cbk>, 
    removexattr = 0x7f750505f400 <default_setattr_cbk>, fremovexattr =
0x7f750505f400 <default_setattr_cbk>, rename = 0x7f750505f400
<default_setattr_cbk>, rmdir = 0x7f750505f400 <default_setattr_cbk>, 
    setattr = 0x7f750505f400 <default_setattr_cbk>, fsetattr = 0x7f750505f400
<default_setattr_cbk>, setxattr = 0x7f750505f400 <default_setattr_cbk>,
fsetxattr = 0x7f750505f400 <default_setattr_cbk>, 
    stat = 0x7f750505f400 <default_setattr_cbk>, fstat = 0x7f750505f400
<default_setattr_cbk>, statfs = 0x7f750505f400 <default_setattr_cbk>, symlink =
0x7f750505f400 <default_setattr_cbk>, 
    truncate = 0x7f750505f400 <default_setattr_cbk>, ftruncate = 0x7f750505f400
<default_setattr_cbk>, unlink = 0x7f750505f400 <default_setattr_cbk>, writev =
0x7f750505f400 <default_setattr_cbk>, 
    xattrop = 0x7f750505f400 <default_setattr_cbk>, fxattrop = 0x7f750505f400
<default_setattr_cbk>, zerofill = 0x7f750505f400 <default_setattr_cbk>, seek =
0x7f750505f400 <default_setattr_cbk>, 
    ipc = 0x7f750505f400 <default_setattr_cbk>}, data = 0x0, heal = 0x0, healer
= {next = 0x7f74ec0b4ac4, prev = 0x7f74ec0b4ac4}, user_size = 0, head = 0,
use_fd = 0, xdata = 0x7f74ec1ca3fc, dict = 0x0, int32 = 6, uint32 = 0, size =
0, 
  offset = 0, mode = {0, 0}, entrylk_cmd = ENTRYLK_LOCK, entrylk_type =
ENTRYLK_RDLCK, xattrop_flags = GF_XATTROP_ADD_ARRAY, dev = 0, inode = 0x0, fd =
0x0, iatt = {ia_ino = 13705394905313093545, 
    ia_gfid = "\200\032\325\370.\310D\256\276\063X\rU\242\247\251", ia_dev =
64771, ia_type = IA_IFLNK, ia_prot = {suid = 0 '\000', sgid = 0 '\000', sticky
= 0 '\000', owner = {read = 1 '\001', write = 1 '\001', exec = 1 '\001'}, 
      group = {read = 1 '\001', write = 1 '\001', exec = 1 '\001'}, other =
{read = 1 '\001', write = 1 '\001', exec = 1 '\001'}}, ia_nlink = 1, ia_uid =
0, ia_gid = 0, ia_rdev = 0, ia_size = 23, ia_blksize = 4096, ia_blocks = 4, 
    ia_atime = 1481113215, ia_atime_nsec = 659685788, ia_mtime = 1474704618,
ia_mtime_nsec = 0, ia_ctime = 1481111709, ia_ctime_nsec = 633516169}, str =
{0x0, 0x0}, loc = {{
      path = 0x7f74ec2f0b80
"/linux-4.7.5/arch/arm/boot/dts/sun8i-a33-ippo-q8h-v1.2.dts", name =
0x7f74ec2f0b9f "sun8i-a33-ippo-q8h-v1.2.dts", inode = 0x7f74cc02756c, parent =
0x7f74e40085dc, 
      gfid = "\200\032\325\370.\310D\256\276\063X\rU\242\247\251", pargfid =
"\224ϩ\346~UGZ\225\067\003\064\003J", <incomplete sequence \362>}, {path = 0x0,
name = 0x0, inode = 0x0, parent = 0x0, gfid = '\000' <repeats 15 times>, 
      pargfid = '\000' <repeats 15 times>}}, flock = {l_type = 0, l_whence = 0,
l_start = 0, l_len = 0, l_pid = 0, l_owner = {len = 0, data = '\000' <repeats
1023 times>}}, vector = 0x0, buffers = 0x0, seek = GF_SEEK_DATA}
(gdb) 


The RC for this is that, when we do setattr the link file was actually a link.
However, the moment it reached to backend, it has been migrated to some other
subvolume. At this point when posix_pstat get called, it is the "T" file on
which stat happens. This T file is a simply regular file.


root at apandey glusterfs]# 
[root at apandey glusterfs]# ll 
/brick/gluster/vol*/linux-4.7.5/arch/arm/boot/dts/sun8i-a33-ippo-q8h-v1.2.dts
lrwxrwxrwx. 2 root root 23 Sep 24 13:40
/brick/gluster/vol-10/linux-4.7.5/arch/arm/boot/dts/sun8i-a33-ippo-q8h-v1.2.dts
-> sun8i-a33-q8-tablet.dts
lrwxrwxrwx. 2 root root 23 Sep 24 13:40
/brick/gluster/vol-11/linux-4.7.5/arch/arm/boot/dts/sun8i-a33-ippo-q8h-v1.2.dts
-> sun8i-a33-q8-tablet.dts
lrwxrwxrwx. 2 root root 23 Sep 24 13:40
/brick/gluster/vol-12/linux-4.7.5/arch/arm/boot/dts/sun8i-a33-ippo-q8h-v1.2.dts
-> sun8i-a33-q8-tablet.dts
---------T. 2 root root  0 Dec  7 17:54
/brick/gluster/vol-13/linux-4.7.5/arch/arm/boot/dts/sun8i-a33-ippo-q8h-v1.2.dts
---------T. 2 root root  0 Dec  7 17:54
/brick/gluster/vol-14/linux-4.7.5/arch/arm/boot/dts/sun8i-a33-ippo-q8h-v1.2.dts
---------T. 2 root root  0 Dec  7 17:54
/brick/gluster/vol-15/linux-4.7.5/arch/arm/boot/dts/sun8i-a33-ippo-q8h-v1.2.dts
---------T. 2 root root  0 Dec  7 17:54
/brick/gluster/vol-16/linux-4.7.5/arch/arm/boot/dts/sun8i-a33-ippo-q8h-v1.2.dts
---------T. 2 root root  0 Dec  7 17:54
/brick/gluster/vol-17/linux-4.7.5/arch/arm/boot/dts/sun8i-a33-ippo-q8h-v1.2.dts
---------T. 2 root root  0 Dec  7 17:54
/brick/gluster/vol-18/linux-4.7.5/arch/arm/boot/dts/sun8i-a33-ippo-q8h-v1.2.dts
lrwxrwxrwx. 2 root root 23 Sep 24 13:40
/brick/gluster/vol-7/linux-4.7.5/arch/arm/boot/dts/sun8i-a33-ippo-q8h-v1.2.dts
-> sun8i-a33-q8-tablet.dts
lrwxrwxrwx. 2 root root 23 Sep 24 13:40
/brick/gluster/vol-8/linux-4.7.5/arch/arm/boot/dts/sun8i-a33-ippo-q8h-v1.2.dts
-> sun8i-a33-q8-tablet.dts
lrwxrwxrwx. 2 root root 23 Sep 24 13:40
/brick/gluster/vol-9/linux-4.7.5/arch/arm/boot/dts/sun8i-a33-ippo-q8h-v1.2.dts
-> sun8i-a33-q8-tablet.dts
[root at apandey glusterfs]# 
[root at apandey glusterfs]# 
[root at apandey glusterfs]# 
[root at apandey glusterfs]# stat
/brick/gluster/vol-13/linux-4.7.5/arch/arm/boot/dts/sun8i-a33-ippo-q8h-v1.2.dts
  File:
'/brick/gluster/vol-13/linux-4.7.5/arch/arm/boot/dts/sun8i-a33-ippo-q8h-v1.2.dts'
  Size: 0             Blocks: 8          IO Block: 4096   regular empty file
Device: fd03h/64771d    Inode: 4233378     Links: 2
Access: (1000/---------T)  Uid: (    0/    root)   Gid: (    0/    root)
Context: unconfined_u:object_r:default_t:s0
Access: 2016-12-07 17:54:19.488040928 +0530
Modify: 2016-12-07 17:54:19.488040928 +0530
Change: 2016-12-07 17:56:05.634760199 +0530
 Birth: -
[root at apandey glusterfs]# stat
/brick/gluster/vol-10/linux-4.7.5/arch/arm/boot/dts/sun8i-a33-ippo-q8h-v1.2.dts
  File:
'/brick/gluster/vol-10/linux-4.7.5/arch/arm/boot/dts/sun8i-a33-ippo-q8h-v1.2.dts'
-> 'sun8i-a33-q8-tablet.dts'
  Size: 23            Blocks: 8          IO Block: 4096   symbolic link
Device: fd03h/64771d    Inode: 4208551     Links: 2
Access: (0777/lrwxrwxrwx)  Uid: (    0/    root)   Gid: (    0/    root)
Context: unconfined_u:object_r:default_t:s0
Access: 2016-12-07 17:56:09.681749496 +0530
Modify: 2016-09-24 13:40:18.000000000 +0530
Change: 2016-12-07 17:25:09.628516181 +0530


This is the log I placed in posix - 

[2016-12-07 12:24:19.494117] E [MSGID: 113018] [posix.c:506:posix_setattr]
0-vol-posix: Getting REG ofr LINK :
/brick/gluster/vol-18/.glusterfs/80/1a/801ad5f8-2ec8-44ae-be33-580d55a2a7a9
statpost.ia_type = 1statpre.ia_type = 1 LOC_IS_LNK(loc) = 0


------------------------

Now, when the call back reaches to EC, the condition 
 if (cbk->iatt[0].ia_type == IA_IFREG) {

becomes true and it checks for the size, which it will not have.
That leads to assertion failure.

I think we have to handle this case in EC. I talked to Prashant and he was
saying that had it been given to dht, it would have handled in second phase of
setattr.

-- 
You are receiving this mail because:
You are on the CC list for the bug.
You are the assignee for the bug.


More information about the Bugs mailing list