[Gluster-devel] Cores generated with ./tests/geo-rep/georep-basic-dr-tarssh.t
Kotresh Hiremath Ravishankar
khiremat at redhat.com
Thu Mar 3 12:56:18 UTC 2016
Hi,
Yes, with this patch we need not set conn->trans to NULL in rpc_clnt_disable
Thanks and Regards,
Kotresh H R
----- Original Message -----
> From: "Soumya Koduri" <skoduri at redhat.com>
> To: "Kotresh Hiremath Ravishankar" <khiremat at redhat.com>, "Raghavendra G" <raghavendra at gluster.com>
> Cc: "Gluster Devel" <gluster-devel at gluster.org>
> Sent: Thursday, March 3, 2016 5:06:00 PM
> Subject: Re: [Gluster-devel] Cores generated with ./tests/geo-rep/georep-basic-dr-tarssh.t
>
>
>
> On 03/03/2016 04:58 PM, Kotresh Hiremath Ravishankar wrote:
> > [Replying on top of my own reply]
> >
> > Hi,
> >
> > I have submitted the below patch [1] to avoid the issue of
> > 'rpc_clnt_submit'
> > getting reconnected. But it won't take care of memory leak problem you were
> > trying to fix. That we have to carefully go through all cases and fix it.
> > Please have a look at it.
> >
> Looks good. IIUC, with this patch, we need not set conn->trans to NULL
> in 'rpc_clnt_disable()'. Right? If yes, then it takes care of memleak as
> the transport object shall then get freed as part of
> 'rpc_clnt_trigger_destroy'.
>
>
> > http://review.gluster.org/#/c/13592/
> >
> > Thanks and Regards,
> > Kotresh H R
> >
> > ----- Original Message -----
> >> From: "Kotresh Hiremath Ravishankar" <khiremat at redhat.com>
> >> To: "Soumya Koduri" <skoduri at redhat.com>
> >> Cc: "Raghavendra G" <raghavendra at gluster.com>, "Gluster Devel"
> >> <gluster-devel at gluster.org>
> >> Sent: Thursday, March 3, 2016 3:39:11 PM
> >> Subject: Re: [Gluster-devel] Cores generated with
> >> ./tests/geo-rep/georep-basic-dr-tarssh.t
> >>
> >> Hi Soumya,
> >>
> >> I tested the lastes patch [2] on master where your previous patch [1] in
> >> merged.
> >> I see crashes at different places.
> >>
> >> 1. If there are code paths that are holding rpc object without taking ref
> >> on
> >> it, all those
> >> code path will crash on invoking rpc submit on that object as rpc
> >> object
> >> would have freed
> >> by last unref on DISCONNECT event. I see this kind of use-case in
> >> chagnelog rpc code.
> >> Need to check on other users of rpc.
> Agree. We should fix all such code-paths. Since this seem to be an
> intricate fix, shall we take these patches only in master branch and not
> in 3.7 release for now till we fix all such paths as we encounter?
>
> >>
> >> 2. And also we need to take care of reconnect timers that are being set
> >> and
> >> are re-tried to
> >> connect back on expiration. In those cases also, we might crash as rpc
> >> object would have freed.
> Your patch addresses this..right?
>
> Thanks,
> Soumya
>
> >>
> >>
> >> [1] http://review.gluster.org/#/c/13507/
> >> [2] http://review.gluster.org/#/c/13587/
> >>
> >> Thanks and Regards,
> >> Kotresh H R
> >>
> >> ----- Original Message -----
> >>> From: "Soumya Koduri" <skoduri at redhat.com>
> >>> To: "Raghavendra G" <raghavendra at gluster.com>, "Kotresh Hiremath
> >>> Ravishankar" <khiremat at redhat.com>
> >>> Cc: "Gluster Devel" <gluster-devel at gluster.org>
> >>> Sent: Thursday, March 3, 2016 12:24:00 PM
> >>> Subject: Re: [Gluster-devel] Cores generated with
> >>> ./tests/geo-rep/georep-basic-dr-tarssh.t
> >>>
> >>> Thanks a lot Kotresh.
> >>>
> >>> On 03/03/2016 08:47 AM, Raghavendra G wrote:
> >>>> Hi Soumya,
> >>>>
> >>>> Can you send a fix to this regression on upstream master too? This patch
> >>>> is merged there.
> >>>>
> >>> I have submitted below patch.
> >>> http://review.gluster.org/#/c/13587/
> >>>
> >>> Kindly review the same.
> >>>
> >>> Thanks,
> >>> Soumya
> >>>
> >>>> regards,
> >>>> Raghavendra
> >>>>
> >>>> On Tue, Mar 1, 2016 at 10:34 PM, Kotresh Hiremath Ravishankar
> >>>> <khiremat at redhat.com <mailto:khiremat at redhat.com>> wrote:
> >>>>
> >>>> Hi Soumya,
> >>>>
> >>>> I analysed the issue and found out that crash has happened because
> >>>> of the patch [1].
> >>>>
> >>>> The patch doesn't set transport object to NULL in
> >>>> 'rpc_clnt_disable'
> >>>> but instead does it on
> >>>> 'rpc_clnt_trigger_destroy'. So if there are pending rpc invocations
> >>>> on the rpc object that
> >>>> is disabled (those instances are possible as happening now in
> >>>> changelog), it will trigger a
> >>>> CONNECT notify again with 'mydata' that is freed causing a crash.
> >>>> This happens because
> >>>> 'rpc_clnt_submit' reconnects if rpc is not connected.
> >>>>
> >>>> rpc_clnt_submit (...) {
> >>>> ...
> >>>> if (conn->connected == 0) {
> >>>> ret = rpc_transport_connect (conn->trans,
> >>>>
> >>>> conn->config.remote_port);
> >>>> }
> >>>> ...
> >>>> }
> >>>>
> >>>> Without your patch, conn->trans was set NULL and hence CONNECT
> >>>> fails
> >>>> not resulting with
> >>>> CONNECT notify call. And also the cleanup happens in failure path.
> >>>>
> >>>> So the memory leak can happen, if there is no try for rpc
> >>>> invocation
> >>>> after DISCONNECT.
> >>>> It will be cleaned up otherwise.
> >>>>
> >>>>
> >>>> [1] http://review.gluster.org/#/c/13507/
> >>>>
> >>>> Thanks and Regards,
> >>>> Kotresh H R
> >>>>
> >>>> ----- Original Message -----
> >>>> > From: "Kotresh Hiremath Ravishankar" <khiremat at redhat.com
> >>>> <mailto:khiremat at redhat.com>>
> >>>> > To: "Soumya Koduri" <skoduri at redhat.com
> >>>> > <mailto:skoduri at redhat.com>>
> >>>> > Cc: avishwan at redhat.com <mailto:avishwan at redhat.com>, "Gluster
> >>>> Devel" <gluster-devel at gluster.org
> >>>> <mailto:gluster-devel at gluster.org>>
> >>>> > Sent: Monday, February 29, 2016 4:15:22 PM
> >>>> > Subject: Re: Cores generated with
> >>>> ./tests/geo-rep/georep-basic-dr-tarssh.t
> >>>> >
> >>>> > Hi Soumya,
> >>>> >
> >>>> > I just tested that it is reproducible only with your patch both
> >>>> in master and
> >>>> > 3.76 branch.
> >>>> > The geo-rep test cases are marked bad in master. So it's not hit
> >>>> in master.
> >>>> > rpc is introduced
> >>>> > in changelog xlator to communicate to applications via
> >>>> libgfchangelog.
> >>>> > Venky/Me will check
> >>>> > why is the crash happening and will update.
> >>>> >
> >>>> >
> >>>> > Thanks and Regards,
> >>>> > Kotresh H R
> >>>> >
> >>>> > ----- Original Message -----
> >>>> > > From: "Soumya Koduri" <skoduri at redhat.com
> >>>> <mailto:skoduri at redhat.com>>
> >>>> > > To: avishwan at redhat.com <mailto:avishwan at redhat.com>,
> >>>> > > "kotresh"
> >>>> <khiremat at redhat.com <mailto:khiremat at redhat.com>>
> >>>> > > Cc: "Gluster Devel" <gluster-devel at gluster.org
> >>>> <mailto:gluster-devel at gluster.org>>
> >>>> > > Sent: Monday, February 29, 2016 2:10:51 PM
> >>>> > > Subject: Cores generated with
> >>>> ./tests/geo-rep/georep-basic-dr-tarssh.t
> >>>> > >
> >>>> > > Hi Aravinda/Kotresh,
> >>>> > >
> >>>> > > With [1], I consistently see cores generated with the test
> >>>> > > './tests/geo-rep/georep-basic-dr-tarssh.t' in release-3.7
> >>>> branch. From
> >>>> > > the cores, looks like we are trying to dereference a freed
> >>>> > > changelog_rpc_clnt_t(crpc) object in changelog_rpc_notify().
> >>>> Strangely
> >>>> > > this was not reported in master branch.
> >>>> > >
> >>>> > > I tried debugging but couldn't find any possible suspects. I
> >>>> request you
> >>>> > > to take a look and let me know if [1] caused any regression.
> >>>> > >
> >>>> > > Thanks,
> >>>> > > Soumya
> >>>> > >
> >>>> > > [1] http://review.gluster.org/#/c/13507/
> >>>> > >
> >>>> >
> >>>> _______________________________________________
> >>>> Gluster-devel mailing list
> >>>> Gluster-devel at gluster.org <mailto:Gluster-devel at gluster.org>
> >>>> http://www.gluster.org/mailman/listinfo/gluster-devel
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> Raghavendra G
> >>>
> >> _______________________________________________
> >> Gluster-devel mailing list
> >> Gluster-devel at gluster.org
> >> http://www.gluster.org/mailman/listinfo/gluster-devel
> >>
>
More information about the Gluster-devel
mailing list