[Gluster-devel] spurious regression errors getting worse
Niels de Vos
ndevos at redhat.com
Thu Nov 12 09:49:37 UTC 2015
On Wed, Nov 11, 2015 at 06:24:39PM -0500, Dan Lambright wrote:
>
>
> ----- Original Message -----
> > From: "Dan Lambright" <dlambrig at redhat.com>
> > To: "Niels de Vos" <ndevos at redhat.com>
> > Cc: "Gluster Devel" <gluster-devel at gluster.org>
> > Sent: Monday, November 9, 2015 4:18:31 PM
> > Subject: Re: [Gluster-devel] spurious regression errors getting worse
> >
> >
> >
> > ----- Original Message -----
> > > From: "Niels de Vos" <ndevos at redhat.com>
> > > To: "Dan Lambright" <dlambrig at redhat.com>
> > > Cc: "Gluster Devel" <gluster-devel at gluster.org>
> > > Sent: Monday, November 9, 2015 4:09:08 PM
> > > Subject: Re: [Gluster-devel] spurious regression errors getting worse
> > >
> > > On Thu, Nov 05, 2015 at 09:17:28PM -0500, Dan Lambright wrote:
> > > > It seems to have become more difficult in the last week to pass
> > > > regression
> > > > tests.
> > > >
> > > > I've started recording the tests that seem to be failing the most:
> > > >
> > > > bug-1221481-allow-fops-on-dir-split-brain.t
> > > > bug-1238706-daemons-stop-on-peer-cleanup.t
> > > > ./tests/bugs/quota/bug-1235182.t
> > > > ./tests/bugs/distribute/bug-1066798.t
> > > > ./tests/bugs/snapshot/bug-1166197.t
> > > >
> > > > In some cases regression must be run a half dozen times before finally
> > > > passing.
> > > >
> > > > Could the owners of those tests please look into these?
> > >
> > > https://build.gluster.org/job/rackspace-netbsd7-regression-triggered/11617/consoleFull
> > > failed on
> > >
> > > [16:18:49] ./tests/basic/tier/fops-during-migration-pause.t ..
> > > not ok 19
> > > not ok 20
> > > Failed 2/20 subtests
> > > [16:18:49]
> > >
> > > Please have a look. Thanks,
>
> https://build.gluster.org/job/rackspace-regression-2GB-triggered/15785/consoleFull
>
> failed on
>
> ./tests/bugs/fuse/many-groups-for-acl.t: 1 new core files
>
> [root@rhs-cli-11 glusterfs]# git blame ./tests/bugs/fuse/many-groups-for-acl.t|grep Niels|wc -l
> 113
Heh, yes, I'm pretty sure I wrote that test! But I cannot remember it
ever failing :-/

(gdb) f 0
#0  0x00007fba6cd76432 in client_submit_request (this=0x7fba68006fc0,
    req=0x7fba6579aa70, frame=0x7fba5c0058cc,
    prog=0x7fba6cfb53c0 <clnt3_3_fop_prog>, procnum=41,
    cbkfn=0x7fba6cd9206d <client3_3_release_cbk>, iobref=0x0, rsphdr=0x0,
    rsphdr_count=0, rsp_payload=0x0, rsp_payload_count=0, rsp_iobref=0x0,
    xdrproc=0x7fba79801075 <xdr_gfs3_release_req>)
    at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/protocol/client/src/client.c:324
324             frame->root->ngrps = ngroups;
(gdb) l
319                     gf_msg_debug (this->name, 0, "rpc_clnt_submit failed");
320             }
321
322             if (!conf->send_gids) {
323                     /* restore previous values */
324                     frame->root->ngrps = ngroups;
325                     if (ngroups <= SMALL_GROUP_COUNT)
326                             frame->root->groups_small[0] = gid;
327             }
328
(gdb) p *frame->root
Cannot access memory at address 0x64185df000000000

After looking at this in more detail, the flow is like this:

  client_submit_request()
    |
    '- rpc_clnt_submit()                    // on line 314
         |
         '- cbkfn()                         // = client3_3_release_cbk
              |
              :- STACK_DESTROY (frame->root);
         .----'
    .----'
    |
    :- frame->root->ngrps = ngroups;        // on line 324
    '

So, this is a real use-after-free! Yay for regression tests :-)
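
To make the pattern concrete, here is a minimal sketch of one possible way
to avoid this class of bug. It is not the change in the patch linked below,
and every name in it (call_frame, creds, submit, release_cbk,
submit_request) is a simplified, hypothetical stand-in rather than the real
GlusterFS definition. The idea, under those assumptions: instead of
modifying frame->root before the submit and restoring it afterwards, build
the values to send in a local copy, and treat the frame as gone as soon as
the submit call has been made.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins; NOT the real GlusterFS types. */
struct creds      { int ngrps; unsigned int gid0; };
struct call_stack { struct creds creds; };
struct call_frame { struct call_stack *root; };

typedef void (*cbk_fn) (struct call_frame *frame);

static struct creds last_sent;        /* what went "on the wire" */
static int          frames_destroyed; /* frames freed by the callback */

static struct call_frame *
make_frame (int ngrps, unsigned int gid0)
{
        struct call_frame *frame = malloc (sizeof (*frame));
        if (!frame)
                abort ();
        frame->root = malloc (sizeof (*frame->root));
        if (!frame->root)
                abort ();
        frame->root->creds.ngrps = ngrps;
        frame->root->creds.gid0  = gid0;
        return frame;
}

/* Like a release callback: it destroys the whole call stack. */
static void
release_cbk (struct call_frame *frame)
{
        free (frame->root);
        free (frame);
        frames_destroyed++;
}

/* Stand-in for a submit that may run the callback before it returns. */
static int
submit (struct call_frame *frame, const struct creds *wire, cbk_fn cbk)
{
        last_sent = *wire;
        cbk (frame);            /* the frame may be freed right here */
        return 0;
}

static int
submit_request (struct call_frame *frame, int send_gids)
{
        assert (frame != NULL && frame->root != NULL);

        /* Work on a local copy, so nothing needs restoring afterwards. */
        struct creds wire = frame->root->creds;
        if (!send_gids)
                wire.ngrps = 1; /* assumption: send only the primary gid */

        int ret = submit (frame, &wire, release_cbk);

        /* Do not dereference frame or frame->root past this point. */
        return ret;
}
```

With this shape there is no "restore previous values" step left to run
after the callback, so it does not matter whether the callback destroyed
the frame. Running the crashing test under valgrind or a build with
AddressSanitizer should also report this kind of access directly, instead
of relying on a core file.
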
Bug: https://bugzilla.redhat.com/1281285
Patch: http://review.gluster.org/12575
Niels