[Gluster-devel] Cores generated with ./tests/geo-rep/georep-basic-dr-tarssh.t
Kotresh Hiremath Ravishankar
khiremat at redhat.com
Thu Mar 3 11:28:31 UTC 2016
[Replying on top of my own reply]
Hi,
I have submitted the below patch [1] to avoid the issue of 'rpc_clnt_submit'
getting reconnected. But it won't take care of memory leak problem you were
trying to fix. That we have to carefully go through all cases and fix it.
Please have a look at it.
http://review.gluster.org/#/c/13592/
Thanks and Regards,
Kotresh H R
----- Original Message -----
> From: "Kotresh Hiremath Ravishankar" <khiremat at redhat.com>
> To: "Soumya Koduri" <skoduri at redhat.com>
> Cc: "Raghavendra G" <raghavendra at gluster.com>, "Gluster Devel" <gluster-devel at gluster.org>
> Sent: Thursday, March 3, 2016 3:39:11 PM
> Subject: Re: [Gluster-devel] Cores generated with ./tests/geo-rep/georep-basic-dr-tarssh.t
>
> Hi Soumya,
>
> I tested the lastes patch [2] on master where your previous patch [1] in
> merged.
> I see crashes at different places.
>
> 1. If there are code paths that are holding rpc object without taking ref on
> it, all those
> code path will crash on invoking rpc submit on that object as rpc object
> would have freed
> by last unref on DISCONNECT event. I see this kind of use-case in
> chagnelog rpc code.
> Need to check on other users of rpc.
>
> 2. And also we need to take care of reconnect timers that are being set and
> are re-tried to
> connect back on expiration. In those cases also, we might crash as rpc
> object would have freed.
>
>
> [1] http://review.gluster.org/#/c/13507/
> [2] http://review.gluster.org/#/c/13587/
>
> Thanks and Regards,
> Kotresh H R
>
> ----- Original Message -----
> > From: "Soumya Koduri" <skoduri at redhat.com>
> > To: "Raghavendra G" <raghavendra at gluster.com>, "Kotresh Hiremath
> > Ravishankar" <khiremat at redhat.com>
> > Cc: "Gluster Devel" <gluster-devel at gluster.org>
> > Sent: Thursday, March 3, 2016 12:24:00 PM
> > Subject: Re: [Gluster-devel] Cores generated with
> > ./tests/geo-rep/georep-basic-dr-tarssh.t
> >
> > Thanks a lot Kotresh.
> >
> > On 03/03/2016 08:47 AM, Raghavendra G wrote:
> > > Hi Soumya,
> > >
> > > Can you send a fix to this regression on upstream master too? This patch
> > > is merged there.
> > >
> > I have submitted below patch.
> > http://review.gluster.org/#/c/13587/
> >
> > Kindly review the same.
> >
> > Thanks,
> > Soumya
> >
> > > regards,
> > > Raghavendra
> > >
> > > On Tue, Mar 1, 2016 at 10:34 PM, Kotresh Hiremath Ravishankar
> > > <khiremat at redhat.com <mailto:khiremat at redhat.com>> wrote:
> > >
> > > Hi Soumya,
> > >
> > > I analysed the issue and found out that crash has happened because
> > > of the patch [1].
> > >
> > > The patch doesn't set transport object to NULL in 'rpc_clnt_disable'
> > > but instead does it on
> > > 'rpc_clnt_trigger_destroy'. So if there are pending rpc invocations
> > > on the rpc object that
> > > is disabled (those instances are possible as happening now in
> > > changelog), it will trigger a
> > > CONNECT notify again with 'mydata' that is freed causing a crash.
> > > This happens because
> > > 'rpc_clnt_submit' reconnects if rpc is not connected.
> > >
> > > rpc_clnt_submit (...) {
> > > ...
> > > if (conn->connected == 0) {
> > > ret = rpc_transport_connect (conn->trans,
> > >
> > > conn->config.remote_port);
> > > }
> > > ...
> > > }
> > >
> > > Without your patch, conn->trans was set NULL and hence CONNECT fails
> > > not resulting with
> > > CONNECT notify call. And also the cleanup happens in failure path.
> > >
> > > So the memory leak can happen, if there is no try for rpc invocation
> > > after DISCONNECT.
> > > It will be cleaned up otherwise.
> > >
> > >
> > > [1] http://review.gluster.org/#/c/13507/
> > >
> > > Thanks and Regards,
> > > Kotresh H R
> > >
> > > ----- Original Message -----
> > > > From: "Kotresh Hiremath Ravishankar" <khiremat at redhat.com
> > > <mailto:khiremat at redhat.com>>
> > > > To: "Soumya Koduri" <skoduri at redhat.com
> > > > <mailto:skoduri at redhat.com>>
> > > > Cc: avishwan at redhat.com <mailto:avishwan at redhat.com>, "Gluster
> > > Devel" <gluster-devel at gluster.org <mailto:gluster-devel at gluster.org>>
> > > > Sent: Monday, February 29, 2016 4:15:22 PM
> > > > Subject: Re: Cores generated with
> > > ./tests/geo-rep/georep-basic-dr-tarssh.t
> > > >
> > > > Hi Soumya,
> > > >
> > > > I just tested that it is reproducible only with your patch both
> > > in master and
> > > > 3.76 branch.
> > > > The geo-rep test cases are marked bad in master. So it's not hit
> > > in master.
> > > > rpc is introduced
> > > > in changelog xlator to communicate to applications via
> > > libgfchangelog.
> > > > Venky/Me will check
> > > > why is the crash happening and will update.
> > > >
> > > >
> > > > Thanks and Regards,
> > > > Kotresh H R
> > > >
> > > > ----- Original Message -----
> > > > > From: "Soumya Koduri" <skoduri at redhat.com
> > > <mailto:skoduri at redhat.com>>
> > > > > To: avishwan at redhat.com <mailto:avishwan at redhat.com>, "kotresh"
> > > <khiremat at redhat.com <mailto:khiremat at redhat.com>>
> > > > > Cc: "Gluster Devel" <gluster-devel at gluster.org
> > > <mailto:gluster-devel at gluster.org>>
> > > > > Sent: Monday, February 29, 2016 2:10:51 PM
> > > > > Subject: Cores generated with
> > > ./tests/geo-rep/georep-basic-dr-tarssh.t
> > > > >
> > > > > Hi Aravinda/Kotresh,
> > > > >
> > > > > With [1], I consistently see cores generated with the test
> > > > > './tests/geo-rep/georep-basic-dr-tarssh.t' in release-3.7
> > > branch. From
> > > > > the cores, looks like we are trying to dereference a freed
> > > > > changelog_rpc_clnt_t(crpc) object in changelog_rpc_notify().
> > > Strangely
> > > > > this was not reported in master branch.
> > > > >
> > > > > I tried debugging but couldn't find any possible suspects. I
> > > request you
> > > > > to take a look and let me know if [1] caused any regression.
> > > > >
> > > > > Thanks,
> > > > > Soumya
> > > > >
> > > > > [1] http://review.gluster.org/#/c/13507/
> > > > >
> > > >
> > > _______________________________________________
> > > Gluster-devel mailing list
> > > Gluster-devel at gluster.org <mailto:Gluster-devel at gluster.org>
> > > http://www.gluster.org/mailman/listinfo/gluster-devel
> > >
> > >
> > >
> > >
> > > --
> > > Raghavendra G
> >
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel
>
More information about the Gluster-devel
mailing list