[Gluster-devel] Cores generated with ./tests/geo-rep/georep-basic-dr-tarssh.t

Kotresh Hiremath Ravishankar khiremat at redhat.com
Thu Mar 3 10:09:11 UTC 2016


Hi Soumya,

I tested the lastes patch [2] on master where your previous patch [1] in merged.
I see crashes at different places.

1. If there are code paths that are holding rpc object without taking ref on it, all those
   code path will crash on invoking rpc submit on that object as rpc object would have freed
   by last unref on DISCONNECT event. I see this kind of use-case in chagnelog rpc code. 
   Need to check on other users of rpc.
   
2. And also we need to take care of reconnect timers that are being set and are re-tried to
   connect back on expiration. In those cases also, we might crash as rpc object would have freed.
   

[1] http://review.gluster.org/#/c/13507/
[2] http://review.gluster.org/#/c/13587/

Thanks and Regards,
Kotresh H R

----- Original Message -----
> From: "Soumya Koduri" <skoduri at redhat.com>
> To: "Raghavendra G" <raghavendra at gluster.com>, "Kotresh Hiremath Ravishankar" <khiremat at redhat.com>
> Cc: "Gluster Devel" <gluster-devel at gluster.org>
> Sent: Thursday, March 3, 2016 12:24:00 PM
> Subject: Re: [Gluster-devel] Cores generated with ./tests/geo-rep/georep-basic-dr-tarssh.t
> 
> Thanks a lot Kotresh.
> 
> On 03/03/2016 08:47 AM, Raghavendra G wrote:
> > Hi Soumya,
> >
> > Can you send a fix to this regression on upstream master too? This patch
> > is merged there.
> >
> I have submitted below patch.
> 	http://review.gluster.org/#/c/13587/
> 
> Kindly review the same.
> 
> Thanks,
> Soumya
> 
> > regards,
> > Raghavendra
> >
> > On Tue, Mar 1, 2016 at 10:34 PM, Kotresh Hiremath Ravishankar
> > <khiremat at redhat.com <mailto:khiremat at redhat.com>> wrote:
> >
> >     Hi Soumya,
> >
> >     I analysed the issue and found out that crash has happened because
> >     of the patch [1].
> >
> >     The patch doesn't set transport object to NULL in 'rpc_clnt_disable'
> >     but instead does it on
> >     'rpc_clnt_trigger_destroy'. So if there are pending rpc invocations
> >     on the rpc object that
> >     is disabled (those instances are possible as happening now in
> >     changelog), it will trigger a
> >     CONNECT notify again with 'mydata' that is freed causing a crash.
> >     This happens because
> >     'rpc_clnt_submit' reconnects if rpc is not connected.
> >
> >       rpc_clnt_submit (...) {
> >         ...
> >                      if (conn->connected == 0) {
> >                              ret = rpc_transport_connect (conn->trans,
> >
> >       conn->config.remote_port);
> >                      }
> >         ...
> >       }
> >
> >     Without your patch, conn->trans was set NULL and hence CONNECT fails
> >     not resulting with
> >     CONNECT notify call. And also the cleanup happens in failure path.
> >
> >     So the memory leak can happen, if there is no try for rpc invocation
> >     after DISCONNECT.
> >     It will be cleaned up otherwise.
> >
> >
> >     [1] http://review.gluster.org/#/c/13507/
> >
> >     Thanks and Regards,
> >     Kotresh H R
> >
> >     ----- Original Message -----
> >      > From: "Kotresh Hiremath Ravishankar" <khiremat at redhat.com
> >     <mailto:khiremat at redhat.com>>
> >      > To: "Soumya Koduri" <skoduri at redhat.com <mailto:skoduri at redhat.com>>
> >      > Cc: avishwan at redhat.com <mailto:avishwan at redhat.com>, "Gluster
> >     Devel" <gluster-devel at gluster.org <mailto:gluster-devel at gluster.org>>
> >      > Sent: Monday, February 29, 2016 4:15:22 PM
> >      > Subject: Re: Cores generated with
> >     ./tests/geo-rep/georep-basic-dr-tarssh.t
> >      >
> >      > Hi Soumya,
> >      >
> >      > I just tested that it is reproducible only with your patch both
> >     in master and
> >      > 3.76 branch.
> >      > The geo-rep test cases are marked bad in master. So it's not hit
> >     in master.
> >      > rpc is introduced
> >      > in changelog xlator to communicate to applications via
> >     libgfchangelog.
> >      > Venky/Me will check
> >      > why is the crash happening and will update.
> >      >
> >      >
> >      > Thanks and Regards,
> >      > Kotresh H R
> >      >
> >      > ----- Original Message -----
> >      > > From: "Soumya Koduri" <skoduri at redhat.com
> >     <mailto:skoduri at redhat.com>>
> >      > > To: avishwan at redhat.com <mailto:avishwan at redhat.com>, "kotresh"
> >     <khiremat at redhat.com <mailto:khiremat at redhat.com>>
> >      > > Cc: "Gluster Devel" <gluster-devel at gluster.org
> >     <mailto:gluster-devel at gluster.org>>
> >      > > Sent: Monday, February 29, 2016 2:10:51 PM
> >      > > Subject: Cores generated with
> >     ./tests/geo-rep/georep-basic-dr-tarssh.t
> >      > >
> >      > > Hi Aravinda/Kotresh,
> >      > >
> >      > > With [1], I consistently see cores generated with the test
> >      > > './tests/geo-rep/georep-basic-dr-tarssh.t' in release-3.7
> >     branch. From
> >      > > the cores, looks like we are trying to dereference a freed
> >      > > changelog_rpc_clnt_t(crpc) object in changelog_rpc_notify().
> >     Strangely
> >      > > this was not reported in master branch.
> >      > >
> >      > > I tried debugging but couldn't find any possible suspects. I
> >     request you
> >      > > to take a look and let me know if [1] caused any regression.
> >      > >
> >      > > Thanks,
> >      > > Soumya
> >      > >
> >      > > [1] http://review.gluster.org/#/c/13507/
> >      > >
> >      >
> >     _______________________________________________
> >     Gluster-devel mailing list
> >     Gluster-devel at gluster.org <mailto:Gluster-devel at gluster.org>
> >     http://www.gluster.org/mailman/listinfo/gluster-devel
> >
> >
> >
> >
> > --
> > Raghavendra G
> 


More information about the Gluster-devel mailing list