[Gluster-devel] crypt xlator bug

Emmanuel Dreyfus manu at netbsd.org
Wed Apr 1 09:57:04 UTC 2015


Hi

crypt.t recently started failing in the NetBSD regression tests. glusterfs
returns a node with an invalid file type to FUSE, and that breaks the test.

After running a git bisect, I found the offending commit after which 
this behavior appeared:
    8a2e2b88fc21dc7879f838d18cd0413dd88023b7
    mem-pool: invalidate memory on GF_FREE to aid debugging

This means the bug has always been there, but this debugging aid
makes it reproduce reliably.
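
To illustrate why invalidating memory on free can turn a latent bug into
a reliable failure, here is a minimal sketch. It assumes the underlying
problem is a read through a pointer to an already-released object. None
of this is GlusterFS code: toy_get0(), toy_put(), the one-slot pool and
the 0xAB poison byte are made up for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical poison byte; not the pattern GF_FREE actually writes. */
#define TOY_POISON 0xAB

struct toy_local {
    uintptr_t inode;   /* stands in for local->inode */
};

/* One-slot "pool" in static storage: the memory is never handed back to
 * the system, so reading it after release stays well-defined C. */
static struct toy_local toy_slot;

/* Hand out the slot, zero-filled. */
static struct toy_local *toy_get0(void)
{
    memset(&toy_slot, 0, sizeof(toy_slot));
    return &toy_slot;
}

/* Release the slot; optionally overwrite it with the poison byte. */
static void toy_put(struct toy_local *l, int poison)
{
    assert(l != NULL);
    if (poison)
        memset(l, TOY_POISON, sizeof(*l));
}

/* Returns 1 if a stale reader still sees the original value after the
 * object was released, 0 if the value has visibly changed. */
static int toy_stale_read_matches(int poison)
{
    struct toy_local *l = toy_get0();
    uintptr_t original = (uintptr_t)0xb9611880u;

    l->inode = original;
    toy_put(l, poison);            /* released too early */
    return l->inode == original;   /* stale read through the old pointer */
}
```

Without poisoning, the stale read still sees the old value and the bug
stays hidden; with poisoning, the value changes as soon as the object is
released, so every stale read goes wrong.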

With the help of an assertion, I can detect when inode->ia_type holds
a corrupted value. It gives me the backtrace below: in frame 4,
inode = 0xb9611880 and inode->ia_type = 12475 (which is not a valid
file type). The inode value comes from the FUSE state->loc->inode, and
we get it from frame 20, which is in crypt.c:

#4  0xb9bd2adf in mdc_inode_iatt_get (this=0xbb1df030, 
    inode=0xb9611880, iatt=0xbf7fdfa0) at md-cache.c:471
#5  0xb9bd34e1 in mdc_lookup (frame=0xb9aa82b0, this=0xbb1df030, 
    loc=0xb9608840, xdata=0x0) at md-cache.c:847
#6  0xb9bc216e in io_stats_lookup (frame=0xb9aa8200, this=0xbb1e0030, 
    loc=0xb9608840, xdata=0x0) at io-stats.c:1934
#7  0xbb76755f in default_lookup (frame=0xb9aa8200, this=0xbb1d0030, 
    loc=0xb9608840, xdata=0x0) at defaults.c:2138
#8  0xb9ba69cd in meta_lookup (frame=0xb9aa8200, this=0xbb1d0030, 
    loc=0xb9608840, xdata=0x0) at meta.c:49
#9  0xbb277365 in fuse_lookup_resume (state=0xb9608830) at fuse-bridge.c:607
#10 0xbb276e07 in fuse_fop_resume (state=0xb9608830) at fuse-bridge.c:569
#11 0xbb274969 in fuse_resolve_done (state=0xb9608830) at fuse-resolve.c:644
#12 0xbb274a29 in fuse_resolve_all (state=0xb9608830) at fuse-resolve.c:671
#13 0xbb274941 in fuse_resolve (state=0xb9608830) at fuse-resolve.c:635
#14 0xbb274a06 in fuse_resolve_all (state=0xb9608830) at fuse-resolve.c:667
#15 0xbb274a8e in fuse_resolve_continue (state=0xb9608830) at fuse-resolve.c:687
#16 0xbb2731f4 in fuse_resolve_entry_cbk (frame=0xb9609688, 
    cookie=0xb96140a0, this=0xbb193030, op_ret=0, op_errno=0, 
    inode=0xb9611880, buf=0xb961e558, xattr=0xbb18a1a0, 
    postparent=0xb961e628) at fuse-resolve.c:81
#17 0xb9bbd0c1 in io_stats_lookup_cbk (frame=0xb96140a0, 
    cookie=0xb9614150, this=0xbb1e0030, op_ret=0, op_errno=0, 
    inode=0xb9611880, buf=0xb961e558, xdata=0xbb18a1a0, 
    postparent=0xb961e628) at io-stats.c:1512
#18 0xb9bd33ff in mdc_lookup_cbk (frame=0xb9614150, cookie=0xb9614410,
    this=0xbb1df030, op_ret=0, op_errno=0, 
    inode=0xb9611880, stbuf=0xb961e558, dict=0xbb18a1a0, 
    postparent=0xb961e628) at md-cache.c:816
#19 0xb9be2b10 in ioc_lookup_cbk (frame=0xb9614410, cookie=0xb96144c0,
    this=0xbb1de030, op_ret=0, op_errno=0, 
    inode=0xb9611880, stbuf=0xb961e558, xdata=0xbb18a1a0, 
    postparent=0xb961e628) at io-cache.c:260
#20 0xbb227fb5 in load_file_size (frame=0xb96144c0, cookie=0xb9aa8200,
    this=0xbb1db030, op_ret=0, op_errno=0, 
    dict=0xbb18a470, xdata=0x0) at crypt.c:3830

In frame 20:
    case GF_FOP_LOOKUP:
	    STACK_UNWIND_STRICT(lookup,
				frame,
				op_ret,
				op_errno,
				op_ret >= 0 ? local->inode : NULL,
				op_ret >= 0 ? &local->buf : NULL,
				local->xdata,
				op_ret >= 0 ? &local->postbuf : NULL);
 
Here is the problem: local->inode is no longer 0xb9611880,
which means local got corrupted:

(gdb) print local->inode
$2 = (inode_t *) 0x1db030de

I now suspect local has been freed, but I cannot find where crypt.c
does this. There is a local = mem_get0(this->local_pool)
in crypt_alloc_local, but where is that structure freed? There is
no mem_put() call in the crypt xlator.
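
One way to answer the "where is it freed?" question is to make the
release path record its caller. The sketch below shows the idea on a toy
one-slot pool. It is not the GlusterFS mem-pool API: dbg_get0(),
dbg_put_at(), DBG_PUT() and dbg_is_live() are hypothetical names used
only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct dbg_obj {
    int           live;        /* 1 while handed out, 0 after release */
    const char   *put_file;    /* source file of the release call */
    int           put_line;    /* source line of the release call */
    unsigned char payload[32]; /* stands in for the real structure */
};

/* One-slot pool in static storage, so inspecting the object after
 * release is well-defined. */
static struct dbg_obj dbg_slot;

/* Hand out the slot, zero-filled and marked live. */
static struct dbg_obj *dbg_get0(void)
{
    memset(&dbg_slot, 0, sizeof(dbg_slot));
    dbg_slot.put_file = NULL;
    dbg_slot.live = 1;
    return &dbg_slot;
}

/* Release the object and remember who released it. */
static void dbg_put_at(struct dbg_obj *o, const char *file, int line)
{
    assert(o != NULL && o->live);   /* also catches a double release */
    o->live = 0;
    o->put_file = file;
    o->put_line = line;
}

/* Wrapper that captures the call site automatically. */
#define DBG_PUT(o) dbg_put_at((o), __FILE__, __LINE__)

/* Check to run before each use of the object. */
static int dbg_is_live(const struct dbg_obj *o)
{
    return o != NULL && o->live;
}
```

When a check such as assert(dbg_is_live(local)) fires, put_file and
put_line can be printed from gdb to see which call released the object,
even when that call is outside the file being read.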


-- 
Emmanuel Dreyfus
manu at netbsd.org

