[Gluster-devel] crypt xlator bug

Pranith Kumar Karampuri pkarampu at redhat.com
Fri Apr 3 06:56:09 UTC 2015


On 04/01/2015 03:27 PM, Emmanuel Dreyfus wrote:
> Hi
>
> crypt.t was recently broken in the NetBSD regression run. glusterfs
> returns a node with an invalid file type to FUSE, and that breaks the test.
>
> After running a git bisect, I found the offending commit after which
> this behavior appeared:
>      8a2e2b88fc21dc7879f838d18cd0413dd88023b7
>      mem-pool: invalidate memory on GF_FREE to aid debugging
>
> This means the bug has always been there, but this debugging aid
> made it reproduce reliably.
>
> With the help of an assertion, I can detect when inode->ia_type gets
> a corrupted value. It gives me this backtrace where in frame 4,
> inode = 0xb9611880 and inode->ia_type = 12475 (which is wrong).
> inode value comes from FUSE state->loc->inode and we get it from
> frame 20 which is in crypt.c:
>
> #4  0xb9bd2adf in mdc_inode_iatt_get (this=0xbb1df030,
>      inode=0xb9611880, iatt=0xbf7fdfa0) at md-cache.c:471
> #5  0xb9bd34e1 in mdc_lookup (frame=0xb9aa82b0, this=0xbb1df030,
>      loc=0xb9608840, xdata=0x0) at md-cache.c:847
> #6  0xb9bc216e in io_stats_lookup (frame=0xb9aa8200, this=0xbb1e0030,
>      loc=0xb9608840, xdata=0x0) at io-stats.c:1934
> #7  0xbb76755f in default_lookup (frame=0xb9aa8200, this=0xbb1d0030,
>      loc=0xb9608840, xdata=0x0) at defaults.c:2138
> #8  0xb9ba69cd in meta_lookup (frame=0xb9aa8200, this=0xbb1d0030,
>      loc=0xb9608840, xdata=0x0) at meta.c:49
> #9  0xbb277365 in fuse_lookup_resume (state=0xb9608830) at fuse-bridge.c:607
> #10 0xbb276e07 in fuse_fop_resume (state=0xb9608830) at fuse-bridge.c:569
> #11 0xbb274969 in fuse_resolve_done (state=0xb9608830) at fuse-resolve.c:644
> #12 0xbb274a29 in fuse_resolve_all (state=0xb9608830) at fuse-resolve.c:671
> #13 0xbb274941 in fuse_resolve (state=0xb9608830) at fuse-resolve.c:635
> #14 0xbb274a06 in fuse_resolve_all (state=0xb9608830) at fuse-resolve.c:667
> #15 0xbb274a8e in fuse_resolve_continue (state=0xb9608830) at fuse-resolve.c:687
> #16 0xbb2731f4 in fuse_resolve_entry_cbk (frame=0xb9609688,
>      cookie=0xb96140a0, this=0xbb193030, op_ret=0, op_errno=0,
>      inode=0xb9611880, buf=0xb961e558, xattr=0xbb18a1a0,
>      postparent=0xb961e628) at fuse-resolve.c:81
> #17 0xb9bbd0c1 in io_stats_lookup_cbk (frame=0xb96140a0,
>      cookie=0xb9614150, this=0xbb1e0030, op_ret=0, op_errno=0,
>      inode=0xb9611880, buf=0xb961e558, xdata=0xbb18a1a0,
>      postparent=0xb961e628) at io-stats.c:1512
> #18 0xb9bd33ff in mdc_lookup_cbk (frame=0xb9614150, cookie=0xb9614410,
>      this=0xbb1df030, op_ret=0, op_errno=0,
>      inode=0xb9611880, stbuf=0xb961e558, dict=0xbb18a1a0,
>       postparent=0xb961e628) at md-cache.c:816
> #19 0xb9be2b10 in ioc_lookup_cbk (frame=0xb9614410, cookie=0xb96144c0,
>      this=0xbb1de030, op_ret=0, op_errno=0,
>      inode=0xb9611880, stbuf=0xb961e558, xdata=0xbb18a1a0,
>      postparent=0xb961e628) at io-cache.c:260
> #20 0xbb227fb5 in load_file_size (frame=0xb96144c0, cookie=0xb9aa8200,
>      this=0xbb1db030, op_ret=0, op_errno=0,
>      dict=0xbb18a470, xdata=0x0) at crypt.c:3830
>
> In frame 20:
>      case GF_FOP_LOOKUP:
> 	    STACK_UNWIND_STRICT(lookup,
> 				frame,
> 				op_ret,
> 				op_errno,
> 				op_ret >= 0 ? local->inode : NULL,
> 				op_ret >= 0 ? &local->buf : NULL,
> 				local->xdata,
> 				op_ret >= 0 ? &local->postbuf : NULL);
>   
> Here is the problem, local->inode is not the 0xb9611880 value anymore,
> which means local got corrupted:
>
> (gdb) print local->inode
> $2 = (inode_t *) 0x1db030de
>
> I now suspect local has been freed, but I do not find where in crypt.c
> this operation is done. There is a local = mem_get0(this->local_pool)
> in crypt_alloc_local, but where is that structure freed? There is
> no mem_put() call in crypt xlator.
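On why a debugging aid can turn a latent bug into a reliable failure,
here is a minimal sketch of poisoning memory on release. It is not the
mem-pool code from that commit: demo_pool, demo_get0, demo_put,
demo_looks_poisoned and the 0xDB fill byte are all hypothetical names and
values chosen for illustration. The slots live in a fixed array so that
reading through the stale pointer stays well defined in the demo.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fill byte and pool size; not taken from GlusterFS. */
#define DEMO_POISON 0xDB
#define DEMO_SLOTS 4

/* Simplified stand-in for a per-call "local" structure. */
struct demo_local {
    unsigned int ia_type;
    unsigned char in_use;
};

struct demo_pool {
    struct demo_local slots[DEMO_SLOTS];
};

/* Hand out a zeroed slot, or NULL when the pool is exhausted. */
static struct demo_local *
demo_get0(struct demo_pool *pool)
{
    for (size_t i = 0; i < DEMO_SLOTS; i++) {
        if (!pool->slots[i].in_use) {
            memset(&pool->slots[i], 0, sizeof(pool->slots[i]));
            pool->slots[i].in_use = 1;
            return &pool->slots[i];
        }
    }
    return NULL;
}

/* Release a slot and overwrite it, so stale readers see garbage at once. */
static void
demo_put(struct demo_local *local)
{
    memset(local, DEMO_POISON, sizeof(*local));
    local->in_use = 0;
}

/* Return 1 if every byte of ia_type carries the poison pattern. */
static int
demo_looks_poisoned(const struct demo_local *local)
{
    const unsigned char *bytes = (const unsigned char *)&local->ia_type;

    for (size_t i = 0; i < sizeof(local->ia_type); i++) {
        if (bytes[i] != DEMO_POISON)
            return 0;
    }
    return 1;
}
```

Without the memset in demo_put, a caller that keeps using the pointer
after release would usually still read the old, valid-looking value, and
the bug would only show up when the slot happens to be reused.
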
I joined this thread after seeing Raghavendra Talur's patch, which fixed
the issue; that it did so seemed extremely odd to me. I just checked this
mail from you. local->inode in crypt need not be the same as
state->loc->inode, because inode_link in fuse_resolve_entry_cbk returns
the address of an already linked inode with the same gfid if one exists.
I see hardlink-related commands in crypt.t, so this may be a lookup on
the extra link that resolves to the older inode that is already linked.
It is still some kind of memory problem, but it may have nothing to do
with crypt. Could you let me know the details of the setup where you saw
this issue? I can take a look.
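
To illustrate the linking behaviour described above, here is a minimal
sketch. It is not the real inode_link or inode table: demo_inode,
demo_table and demo_inode_link are hypothetical, simplified stand-ins
that only model the one property that matters here, namely that linking
a second inode with an already known gfid hands back the first one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sizes; a real gfid is a 16-byte UUID. */
#define DEMO_GFID_LEN 16
#define DEMO_TABLE_MAX 8

struct demo_inode {
    unsigned char gfid[DEMO_GFID_LEN];
    int linked;
};

struct demo_table {
    struct demo_inode *slots[DEMO_TABLE_MAX];
    size_t count;
};

/*
 * Link candidate under gfid. If an inode with the same gfid is already
 * linked, return that older inode and leave candidate untouched.
 * Callers must continue with the returned pointer, not with candidate.
 */
static struct demo_inode *
demo_inode_link(struct demo_table *table, struct demo_inode *candidate,
                const unsigned char *gfid)
{
    for (size_t i = 0; i < table->count; i++) {
        if (memcmp(table->slots[i]->gfid, gfid, DEMO_GFID_LEN) == 0)
            return table->slots[i];
    }
    if (table->count == DEMO_TABLE_MAX)
        return NULL; /* table full in this toy model */

    memcpy(candidate->gfid, gfid, DEMO_GFID_LEN);
    candidate->linked = 1;
    table->slots[table->count++] = candidate;
    return candidate;
}
```

With two hard links to one file, the second lookup allocates a fresh
inode, but the pointer that comes back from linking is the first one, so
the two addresses differing is expected and not by itself a sign of
corruption.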

Pranith
>
>


