[Gluster-devel] spurious regression errors getting worse

Niels de Vos ndevos at redhat.com
Thu Nov 12 09:49:37 UTC 2015


On Wed, Nov 11, 2015 at 06:24:39PM -0500, Dan Lambright wrote:
> 
> 
> ----- Original Message -----
> > From: "Dan Lambright" <dlambrig at redhat.com>
> > To: "Niels de Vos" <ndevos at redhat.com>
> > Cc: "Gluster Devel" <gluster-devel at gluster.org>
> > Sent: Monday, November 9, 2015 4:18:31 PM
> > Subject: Re: [Gluster-devel] spurious regression errors getting worse
> > 
> > 
> > 
> > ----- Original Message -----
> > > From: "Niels de Vos" <ndevos at redhat.com>
> > > To: "Dan Lambright" <dlambrig at redhat.com>
> > > Cc: "Gluster Devel" <gluster-devel at gluster.org>
> > > Sent: Monday, November 9, 2015 4:09:08 PM
> > > Subject: Re: [Gluster-devel] spurious regression errors getting worse
> > > 
> > > On Thu, Nov 05, 2015 at 09:17:28PM -0500, Dan Lambright wrote:
> > > > > It seems to have become more difficult in the last week to pass
> > > > > regression tests.
> > > > 
> > > > I've started recording the tests that seem to be failing the most:
> > > > 
> > > > bug-1221481-allow-fops-on-dir-split-brain.t
> > > > bug-1238706-daemons-stop-on-peer-cleanup.t
> > > > ./tests/bugs/quota/bug-1235182.t
> > > > ./tests/bugs/distribute/bug-1066798.t
> > > > ./tests/bugs/snapshot/bug-1166197.t
> > > > 
> > > > In some cases regression must be run a half dozen times before finally
> > > > passing.
> > > > 
> > > > > Could the owners of those tests please look into these?
> > > 
> > > https://build.gluster.org/job/rackspace-netbsd7-regression-triggered/11617/consoleFull
> > > failed on
> > > 
> > >     [16:18:49] ./tests/basic/tier/fops-during-migration-pause.t ..
> > >     not ok 19
> > >     not ok 20
> > >     Failed 2/20 subtests
> > >     [16:18:49]
> > > 
> > > Please have a look. Thanks,
> 
> https://build.gluster.org/job/rackspace-regression-2GB-triggered/15785/consoleFull
> 
> failed on 
> 
> ./tests/bugs/fuse/many-groups-for-acl.t: 1 new core files
> 
> [root at rhs-cli-11 glusterfs]# git blame ./tests/bugs/fuse/many-groups-for-acl.t|grep Niels|wc -l
> 113

Heh, yes, I'm pretty sure I wrote that test! But I cannot remember it
ever failing :-/

  (gdb) f 0
  #0  0x00007fba6cd76432 in client_submit_request (this=0x7fba68006fc0,
      req=0x7fba6579aa70, frame=0x7fba5c0058cc, prog=0x7fba6cfb53c0
      <clnt3_3_fop_prog>, procnum=41, cbkfn=0x7fba6cd9206d
      <client3_3_release_cbk>, iobref=0x0, rsphdr=0x0, rsphdr_count=0,
      rsp_payload=0x0, rsp_payload_count=0, rsp_iobref=0x0,
      xdrproc=0x7fba79801075 <xdr_gfs3_release_req>) at
      /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/protocol/client/src/client.c:324
  324	                frame->root->ngrps = ngroups;
  (gdb) l
  319	                gf_msg_debug (this->name, 0, "rpc_clnt_submit failed");
  320	        }
  321	
  322	        if (!conf->send_gids) {
  323	                /* restore previous values */
  324	                frame->root->ngrps = ngroups;
  325	                if (ngroups <= SMALL_GROUP_COUNT)
  326	                        frame->root->groups_small[0] = gid;
  327	        }
  328	
  (gdb) p *frame->root
  Cannot access memory at address 0x64185df000000000


After looking at this in more detail, the flow is like this:

  client_submit_request()
    |
    '- rpc_clnt_submit() // on line 314
         |
         '- cbkfn() // = client3_3_release_cbk
              |
              :- STACK_DESTROY (frame->root);
         .----'
    .----'
    |
    :- frame->root->ngrps = ngroups; // on line 324
    '

So, this is a real use-after-free! Yay for regression tests :-)
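
For anyone who wants to see the pattern in isolation, here is a minimal
sketch of one common way to make such a flow safe: the submitter holds its
own reference across the submit call, so a callback that runs before the
submit returns cannot free the structure underneath it. This is not
necessarily what the patch does. All names below (demo_stack,
demo_submit, demo_release_cbk, demo_run) are hypothetical stand-ins and
not GlusterFS types or functions, and setting ngrps to 1 before the submit
is an assumption made for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the call stack that frame->root points to. */
struct demo_stack {
    int refs;   /* reference count; the stack is freed when it reaches 0 */
    int ngrps;  /* number of auxiliary groups, as in the gdb listing */
};

static int demo_destroyed = 0;  /* how many stacks have been freed */

static void demo_stack_unref(struct demo_stack *stack)
{
    assert(stack->refs > 0);
    if (--stack->refs == 0) {
        free(stack);
        demo_destroyed++;
    }
}

typedef void (*demo_cbk_t)(struct demo_stack *stack);

/* Plays the role of the release callback: it drops the caller's reference,
 * which is where STACK_DESTROY would free everything in the real flow. */
static void demo_release_cbk(struct demo_stack *stack)
{
    demo_stack_unref(stack);
}

/* Plays the role of the submit call: the callback may run before it returns. */
static int demo_submit(struct demo_stack *stack, demo_cbk_t cbk)
{
    cbk(stack);
    return 0;
}

static int demo_submit_request(struct demo_stack *stack, int send_gids)
{
    int ngroups = stack->ngrps;
    int ret;

    if (!send_gids)
        stack->ngrps = 1;  /* assumption: temporarily hide the extra groups */

    stack->refs++;  /* hold our own reference across the submit */
    ret = demo_submit(stack, demo_release_cbk);

    if (!send_gids)
        stack->ngrps = ngroups;  /* safe: our reference keeps the stack alive */

    demo_stack_unref(stack);  /* last reference gone, stack is freed here */
    return ret;
}

/* Returns 0 when the stack was freed exactly once, after the restore. */
int demo_run(void)
{
    struct demo_stack *stack = calloc(1, sizeof(*stack));
    int ret;

    if (!stack)
        return -1;
    stack->refs = 1;
    stack->ngrps = 200;
    demo_destroyed = 0;

    ret = demo_submit_request(stack, 0);
    return (ret == 0 && demo_destroyed == 1) ? 0 : 1;
}
```

Without the extra reference taken before demo_submit(), the callback would
free the stack and the restore of ngrps would write to freed memory, which
is the same shape as the crash in frame 0 above. Another option is to not
touch the structure at all once the submit has been called.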

Bug: https://bugzilla.redhat.com/1281285
Patch: http://review.gluster.org/12575

Niels