[Bugs] [Bug 1144315] New: core: all brick processes crash when quota is enabled

bugzilla at redhat.com bugzilla at redhat.com
Fri Sep 19 08:39:36 UTC 2014


https://bugzilla.redhat.com/show_bug.cgi?id=1144315

            Bug ID: 1144315
           Summary: core: all brick processes crash when quota is enabled
           Product: GlusterFS
           Version: 3.5.3
         Component: quota
          Severity: urgent
          Priority: urgent
          Assignee: gluster-bugs at redhat.com
          Reporter: kdhananj at redhat.com
                CC: bugs at gluster.org, gluster-bugs at redhat.com
        Depends On: 1118591



+++ This bug was initially created as a clone of Bug #1118591 +++

Description of problem:
I upgraded the glusterfs nodes and, after the upgrade, mounted the volume on an
NFS client and ran iozone on the mount point. iozone finished properly, but some
time later I found that the brick processes had crashed. Note that quota was
enabled only after the iozone run.

The bricks crashed with the following backtrace:
pending frames:
frame : type(0) op(0)
frame : type(0) op(1)
frame : type(0) op(27)
frame : type(0) op(27)
frame : type(0) op(40)
patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 
2014-07-06 22:05:46
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.6.0.24
/usr/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xb6)[0x7f16eb4d1e56]
/usr/lib64/libglusterfs.so.0(gf_print_trace+0x33f)[0x7f16eb4ec28f]
/lib64/libc.so.6[0x3f4fa329a0]
/lib64/libc.so.6[0x3f4fa81461]
/usr/lib64/glusterfs/3.6.0.24/xlator/features/marker.so(mq_loc_fill_from_name+0xa1)[0x7f16dbdf2651]
/usr/lib64/glusterfs/3.6.0.24/xlator/features/marker.so(mq_readdir_cbk+0x2bf)[0x7f16dbdf628f]
/usr/lib64/libglusterfs.so.0(default_readdir_cbk+0xc2)[0x7f16eb4de0b2]
/usr/lib64/libglusterfs.so.0(default_readdir_cbk+0xc2)[0x7f16eb4de0b2]
/usr/lib64/glusterfs/3.6.0.24/xlator/features/access-control.so(posix_acl_readdir_cbk+0xc2)[0x7f16e0a17432]
/usr/lib64/glusterfs/3.6.0.24/xlator/storage/posix.so(posix_do_readdir+0x1b8)[0x7f16e0e4f3c8]
/usr/lib64/glusterfs/3.6.0.24/xlator/storage/posix.so(posix_readdir+0x13)[0x7f16e0e4f603]
/usr/lib64/libglusterfs.so.0(default_readdir+0x83)[0x7f16eb4d7013]
/usr/lib64/glusterfs/3.6.0.24/xlator/features/access-control.so(posix_acl_readdir+0x22d)[0x7f16e0a1991d]
/usr/lib64/libglusterfs.so.0(default_readdir+0x83)[0x7f16eb4d7013]
/usr/lib64/libglusterfs.so.0(default_readdir_resume+0x142)[0x7f16eb4d9a02]
/usr/lib64/libglusterfs.so.0(call_resume+0x1b1)[0x7f16eb4f3631]
/usr/lib64/glusterfs/3.6.0.24/xlator/performance/io-threads.so(iot_worker+0x158)[0x7f16e05f6348]
/lib64/libpthread.so.0[0x3f502079d1]
/lib64/libc.so.6(clone+0x6d)[0x3f4fae8b5d]
---------


[root at nfs1 ~]# gluster volume info dist-rep

Volume Name: dist-rep
Type: Distributed-Replicate
Volume ID: 07f5f58d-83e3-4591-ba7f-e2473153e220
Status: Started
Snap Volume: no
Number of Bricks: 7 x 2 = 14
Transport-type: tcp
Bricks:
Brick1: 10.70.37.62:/bricks/d1r1
Brick2: 10.70.37.215:/bricks/d1r2
Brick3: 10.70.37.44:/bricks/d2r1
Brick4: 10.70.37.201:/bricks/dr2r2
Brick5: 10.70.37.62:/bricks/d3r1
Brick6: 10.70.37.215:/bricks/d3r2
Brick7: 10.70.37.44:/bricks/d4r1
Brick8: 10.70.37.201:/bricks/dr4r2
Brick9: 10.70.37.62:/bricks/d5r1
Brick10: 10.70.37.215:/bricks/d5r2
Brick11: 10.70.37.44:/bricks/d6r1
Brick12: 10.70.37.201:/bricks/dr6r2
Brick13: 10.70.37.62:/bricks/d1r1-add
Brick14: 10.70.37.215:/bricks/d1r2-add
Options Reconfigured:
nfs-ganesha.enable: off
nfs-ganesha.host: 10.70.37.44
nfs.disable: off
performance.readdir-ahead: on
features.quota: on
features.quota-deem-statfs: off
snap-max-hard-limit: 256
snap-max-soft-limit: 90
auto-delete: disable

How reproducible:
The crash has been seen only once so far, but it affected all brick processes.


Expected results:
The brick processes should not crash.

Additional info:

--- Additional comment from Saurabh on 2014-07-07 05:22:27 EDT ---

(gdb) bt
#0  0x0000003f4fa81461 in __strlen_sse2 () from /lib64/libc.so.6
#1  0x00007f16dbdf2651 in mq_loc_fill_from_name (this=0xb8be10,
newloc=0x7f16bf9f89a0, oldloc=0xbad66c, ino=<value optimized out>,
name=0x7f169804d938 "appletalk")
    at marker-quota.c:176
#2  0x00007f16dbdf628f in mq_readdir_cbk (frame=0x7f16ea14bba8, cookie=<value
optimized out>, this=0xb8be10, op_ret=<value optimized out>, op_errno=<value
optimized out>, 
    entries=0x7f16bf9f8bb0, xdata=0x0) at marker-quota.c:609
#3  0x00007f16eb4de0b2 in default_readdir_cbk (frame=0x7f16ea3274e4,
cookie=<value optimized out>, this=<value optimized out>, op_ret=23,
op_errno=0, entries=<value optimized out>, 
    xdata=0x0) at defaults.c:1225
#4  0x00007f16eb4de0b2 in default_readdir_cbk (frame=0x7f16ea323c74,
cookie=<value optimized out>, this=<value optimized out>, op_ret=23,
op_errno=0, entries=<value optimized out>, 
    xdata=0x0) at defaults.c:1225
#5  0x00007f16e0a17432 in posix_acl_readdir_cbk (frame=0x7f16ea31d700,
cookie=<value optimized out>, this=<value optimized out>, op_ret=23,
op_errno=0, 
    entries=<value optimized out>, xdata=0x0) at posix-acl.c:1486
#6  0x00007f16e0e4f3c8 in posix_do_readdir (frame=0x7f16ea3276e8, this=<value
optimized out>, fd=<value optimized out>, size=<value optimized out>, off=23,
whichop=28, dict=0x0)
    at posix.c:4946
#7  0x00007f16e0e4f603 in posix_readdir (frame=<value optimized out>,
this=<value optimized out>, fd=<value optimized out>, size=<value optimized
out>, off=<value optimized out>, 
    xdata=<value optimized out>) at posix.c:4958
#8  0x00007f16eb4d7013 in default_readdir (frame=0x7f16ea3276e8, this=0xb83070,
fd=0xbcecb0, size=4096, off=<value optimized out>, xdata=<value optimized out>)
at defaults.c:2067
#9  0x00007f16e0a1991d in posix_acl_readdir (frame=0x7f16ea31d700,
this=0xb85ea0, fd=0xbcecb0, size=4096, offset=0, xdata=0x0) at posix-acl.c:1500
#10 0x00007f16eb4d7013 in default_readdir (frame=0x7f16ea31d700, this=0xb87130,
fd=0xbcecb0, size=4096, off=<value optimized out>, xdata=<value optimized out>)
at defaults.c:2067
#11 0x00007f16eb4d9a02 in default_readdir_resume (frame=0x7f16ea323c74,
this=0xb88350, fd=0xbcecb0, size=4096, off=0, xdata=0x0) at defaults.c:1635
#12 0x00007f16eb4f3631 in call_resume_wind (stub=0x7f16e9dc1f38) at
call-stub.c:2492
#13 call_resume (stub=0x7f16e9dc1f38) at call-stub.c:2841
#14 0x00007f16e05f6348 in iot_worker (data=0xbba080) at io-threads.c:214
#15 0x0000003f502079d1 in start_thread () from /lib64/libpthread.so.0
#16 0x0000003f4fae8b5d in clone () from /lib64/libc.so.6


Further details from the backtrace (frame #1):
(gdb) f 1
#1  0x00007f16dbdf2651 in mq_loc_fill_from_name (this=0xb8be10,
newloc=0x7f16bf9f89a0, oldloc=0xbad66c, ino=<value optimized out>,
name=0x7f169804d938 "appletalk")
    at marker-quota.c:176
176            len = strlen (oldloc->path);
(gdb) list
171            }
172    
173            newloc->parent = inode_ref (oldloc->inode);
174            uuid_copy (newloc->pargfid, oldloc->inode->gfid);
175    
176            len = strlen (oldloc->path);
177    
178            if (oldloc->path [len - 1] == '/')
179                    ret = gf_asprintf ((char **) &path, "%s%s",
180                                       oldloc->path, name);
(gdb) p oldloc
$1 = (loc_t *) 0xbad66c
(gdb) p *$
$2 = {path = 0x0, name = 0x0, inode = 0x7f16d91760b4, parent = 0x7f16d90f4be0,
gfid = "0\367H\216\361QF3\237\314\335\026\327\t\"p", 
  pargfid = "\037\062b<X\031Ej\232\035\000\346y\303\037\017"}
(gdb)
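
The gdb output above shows the root cause directly: in frame #1, oldloc->path is
NULL ("path = 0x0"), yet marker-quota.c:176 calls strlen (oldloc->path), so the
NULL pointer is dereferenced and the brick process receives SIGSEGV (signal 11).
The snippet below is a minimal, self-contained sketch of that failure mode and of
the kind of NULL guard that avoids it; it is not the GlusterFS source, and
parent_loc_sketch / build_child_path are illustrative names.

/* Sketch only: with oldloc->path == NULL, strlen() would read from
 * address 0 and crash, as seen in the backtrace above. Checking the
 * path before use lets the caller resolve/fill it instead. */
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>

struct parent_loc_sketch {
        const char *path;          /* NULL in the crashing frame */
};

static int
build_child_path (const struct parent_loc_sketch *oldloc,
                  const char *name, char **out)
{
        size_t len;

        if (!oldloc->path)         /* guard absent in the crashing build */
                return -1;         /* caller must fill the path first */

        len = strlen (oldloc->path);
        if (len && oldloc->path[len - 1] == '/')
                return asprintf (out, "%s%s", oldloc->path, name);

        return asprintf (out, "%s/%s", oldloc->path, name);
}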

--- Additional comment from Niels de Vos on 2014-07-13 07:52:29 EDT ---

http://review.gluster.org/8296 has been POSTed, but against a bug in the Red
Hat Storage product. Please repost against this bug.

--- Additional comment from Anand Avati on 2014-07-14 02:38:05 EDT ---

REVIEW: http://review.gluster.org/8296 (features/marker: Fill loc->path before
sending the control to healing) posted (#2) for review on master by Varun
Shastry (vshastry at redhat.com)

--- Additional comment from Anand Avati on 2014-07-14 06:05:26 EDT ---

REVIEW: http://review.gluster.org/8296 (features/marker: Fill loc->path before
sending the control to healing) posted (#3) for review on master by Varun
Shastry (vshastry at redhat.com)

--- Additional comment from Anand Avati on 2014-07-15 02:45:25 EDT ---

REVIEW: http://review.gluster.org/8296 (features/marker: Fill loc->path before
sending the control to healing) posted (#4) for review on master by Varun
Shastry (vshastry at redhat.com)

--- Additional comment from Anand Avati on 2014-07-21 07:26:14 EDT ---

REVIEW: http://review.gluster.org/8296 (features/marker: Fill loc->path before
sending the control to healing) posted (#5) for review on master by Varun
Shastry (vshastry at redhat.com)

--- Additional comment from Anand Avati on 2014-07-22 11:56:59 EDT ---

COMMIT: http://review.gluster.org/8296 committed in master by Raghavendra G
(rgowdapp at redhat.com) 
------
commit 56ffb164743449897f1cdecd3dbe085a0f0a66d7
Author: Varun Shastry <vshastry at redhat.com>
Date:   Wed Jul 9 15:16:00 2014 +0530

    features/marker: Fill loc->path before sending the control to healing

    Problem:
    The xattr healing part of the marker requires path to be present in the
loc.
    Currently path is not filled while triggering from the readdirp_cbk.

    Solution:
    Current patch tries to fill the loc with path.

    Change-Id: I5c7dc9de60fa79ca0fe9b58d2636fd1355add0d3
    BUG: 1118591
    Signed-off-by: Varun Shastry <vshastry at redhat.com>
    Reviewed-on: http://review.gluster.org/8296
    Tested-by: Gluster Build System <jenkins at build.gluster.com>
    Reviewed-by: Raghavendra G <rgowdapp at redhat.com>
    Tested-by: Raghavendra G <rgowdapp at redhat.com>
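
Per the commit message, the fix is to populate loc->path before the
readdirp-triggered quota xattr healing runs. The program below is a small,
self-contained sketch of that ordering, written under the assumption that the
essential point is "fill the child loc's path from the parent path and the entry
name first, then hand it to healing"; heal_quota_xattr, fill_child_loc and the
entry/path values are illustrative stand-ins, not GlusterFS APIs (only
"appletalk" and "/bricks/d1r1" are taken from the report above).

/* Illustrative sketch of the fix's ordering, not the patch from
 * http://review.gluster.org/8296. */
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct child_loc {
        char       *path;          /* must be non-NULL before healing */
        const char *name;
};

static void
heal_quota_xattr (struct child_loc *loc)
{
        assert (loc->path != NULL);    /* healing requires a valid path */
        printf ("healing quota xattrs on %s\n", loc->path);
}

static int
fill_child_loc (struct child_loc *loc, const char *parent_path,
                const char *name)
{
        size_t      len = strlen (parent_path);
        const char *sep = (len && parent_path[len - 1] == '/') ? "" : "/";

        loc->name = name;
        return asprintf (&loc->path, "%s%s%s", parent_path, sep, name);
}

int
main (void)
{
        const char *entries[] = { "appletalk", "example-entry" };

        for (size_t i = 0; i < 2; i++) {
                struct child_loc loc = { 0 };

                /* fill the path first, then trigger healing */
                if (fill_child_loc (&loc, "/bricks/d1r1", entries[i]) < 0)
                        continue;
                heal_quota_xattr (&loc);
                free (loc.path);
        }
        return 0;
}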


Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1118591
[Bug 1118591] core: all brick processes crash when quota is enabled