[Gluster-devel] Cores generated with ./tests/geo-rep/georep-basic-dr-tarssh.t
Raghavendra Gowdappa
rgowdapp at redhat.com
Tue Jun 28 09:58:15 UTC 2016
Thanks for the reminder. Will take that up.
----- Original Message -----
> From: "Soumya Koduri" <skoduri at redhat.com>
> To: "Venky Shankar" <vshankar at redhat.com>, "Raghavendra G" <raghavendra at gluster.com>
> Cc: "Gluster Devel" <gluster-devel at gluster.org>
> Sent: Tuesday, June 28, 2016 3:24:13 PM
> Subject: Re: [Gluster-devel] Cores generated with ./tests/geo-rep/georep-basic-dr-tarssh.t
>
> Hi Raghavendra/Venky,
>
> Gentle reminder. Oleksander (post-factum) confirmed that their
> production setup has been running with this patch included since
> sometime and had no issues. Shall we consider merging this patch if
> there are no review comments?
>
> Thanks,
> Soumya
>
> On 03/09/2016 09:16 PM, Kotresh Hiremath Ravishankar wrote:
> > Hi All,
> >
> > The following patch is sent to address changelog rpc mem-leak issues.
> > The fix is intricate and needs lot of testing before taking in. Please
> > review the same.
> >
> > http://review.gluster.org/#/c/13658/1
> >
> > Thanks and Regards,
> > Kotresh H R
> >
> >
> > ----- Original Message -----
> >> From: "Venky Shankar" <vshankar at redhat.com>
> >> To: "Raghavendra G" <raghavendra at gluster.com>
> >> Cc: "Kotresh Hiremath Ravishankar" <khiremat at redhat.com>, "Gluster Devel"
> >> <gluster-devel at gluster.org>
> >> Sent: Monday, March 7, 2016 3:52:13 PM
> >> Subject: Re: [Gluster-devel] Cores generated with
> >> ./tests/geo-rep/georep-basic-dr-tarssh.t
> >>
> >> On Fri, Mar 04, 2016 at 02:02:32PM +0530, Raghavendra G wrote:
> >>> On Thu, Mar 3, 2016 at 6:26 PM, Kotresh Hiremath Ravishankar <
> >>> khiremat at redhat.com> wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> Yes, with this patch we need not set conn->trans to NULL in
> >>>> rpc_clnt_disable
> >>>>
> >>>
> >>> While [1] fixes the crash, things can be improved in the way how
> >>> changelog
> >>> is using rpc.
> >>>
> >>> 1. In the current code, there is an rpc_clnt object leak during
> >>> disconnect
> >>> event.
> >>
> >> Just for the record (as I couldn't find this information while building up
> >> rpc infrastructure in changelog):
> >>
> >> Unref rpc-clnt object after calling rpc_clnt_disable() in notify() upon
> >> RPC_CLNT_DISCONNECT. Free up 'mydata' in notify() upon RPC_CLNT_DESTROY.
> >>
> >>> 2. Also, freed "mydata" of changelog is still associated with rpc_clnt
> >>> object (corollary of 1), though change log might not get any events with
> >>> "mydata" (as connection is dead).
> >>>
> >>> I've discussed with Kotresh about changes needed, offline. So, following
> >>> are the action items.
> >>> 1. Soumya's patch [2] is valid and is needed for 3.7 branch too.
> >>> 2. [2] can be accepted. However, someone might want to re-use an rpc
> >>> object
> >>> after disabling it, like introducing a new api rpc_clnt_enable_again
> >>> (though no of such use-cases is very less). But [2] doesn't allow it. The
> >>> point is as long as rpc-clnt object is alive, transport object is alive
> >>> (though disconnected) and we can re-use it. So, I would prefer not to
> >>> accept it.
> >>> 3. Kotresh will work on new changes to make sure changelog makes correct
> >>> use of rpc-clnt.
> >>>
> >>> [1] http://review.gluster.org/#/c/13592
> >>> [2] http://review.gluster.org/#/c/1359
> >>>
> >>> regards,
> >>> Raghavendra.
> >>>
> >>>
> >>>> Thanks and Regards,
> >>>> Kotresh H R
> >>>>
> >>>> ----- Original Message -----
> >>>>> From: "Soumya Koduri" <skoduri at redhat.com>
> >>>>> To: "Kotresh Hiremath Ravishankar" <khiremat at redhat.com>, "Raghavendra
> >>>> G" <raghavendra at gluster.com>
> >>>>> Cc: "Gluster Devel" <gluster-devel at gluster.org>
> >>>>> Sent: Thursday, March 3, 2016 5:06:00 PM
> >>>>> Subject: Re: [Gluster-devel] Cores generated with
> >>>> ./tests/geo-rep/georep-basic-dr-tarssh.t
> >>>>>
> >>>>>
> >>>>>
> >>>>> On 03/03/2016 04:58 PM, Kotresh Hiremath Ravishankar wrote:
> >>>>>> [Replying on top of my own reply]
> >>>>>>
> >>>>>> Hi,
> >>>>>>
> >>>>>> I have submitted the below patch [1] to avoid the issue of
> >>>>>> 'rpc_clnt_submit'
> >>>>>> getting reconnected. But it won't take care of memory leak problem
> >>>>>> you
> >>>> were
> >>>>>> trying to fix. That we have to carefully go through all cases and fix
> >>>> it.
> >>>>>> Please have a look at it.
> >>>>>>
> >>>>> Looks good. IIUC, with this patch, we need not set conn->trans to NULL
> >>>>> in 'rpc_clnt_disable()'. Right? If yes, then it takes care of memleak
> >>>>> as
> >>>>> the transport object shall then get freed as part of
> >>>>> 'rpc_clnt_trigger_destroy'.
> >>>>>
> >>>>>
> >>>>>> http://review.gluster.org/#/c/13592/
> >>>>>>
> >>>>>> Thanks and Regards,
> >>>>>> Kotresh H R
> >>>>>>
> >>>>>> ----- Original Message -----
> >>>>>>> From: "Kotresh Hiremath Ravishankar" <khiremat at redhat.com>
> >>>>>>> To: "Soumya Koduri" <skoduri at redhat.com>
> >>>>>>> Cc: "Raghavendra G" <raghavendra at gluster.com>, "Gluster Devel"
> >>>>>>> <gluster-devel at gluster.org>
> >>>>>>> Sent: Thursday, March 3, 2016 3:39:11 PM
> >>>>>>> Subject: Re: [Gluster-devel] Cores generated with
> >>>>>>> ./tests/geo-rep/georep-basic-dr-tarssh.t
> >>>>>>>
> >>>>>>> Hi Soumya,
> >>>>>>>
> >>>>>>> I tested the lastes patch [2] on master where your previous patch
> >>>>>>> [1]
> >>>> in
> >>>>>>> merged.
> >>>>>>> I see crashes at different places.
> >>>>>>>
> >>>>>>> 1. If there are code paths that are holding rpc object without
> >>>>>>> taking
> >>>> ref
> >>>>>>> on
> >>>>>>> it, all those
> >>>>>>> code path will crash on invoking rpc submit on that object as
> >>>>>>> rpc
> >>>>>>> object
> >>>>>>> would have freed
> >>>>>>> by last unref on DISCONNECT event. I see this kind of use-case
> >>>>>>> in
> >>>>>>> chagnelog rpc code.
> >>>>>>> Need to check on other users of rpc.
> >>>>> Agree. We should fix all such code-paths. Since this seem to be an
> >>>>> intricate fix, shall we take these patches only in master branch and
> >>>>> not
> >>>>> in 3.7 release for now till we fix all such paths as we encounter?
> >>>>>
> >>>>>>>
> >>>>>>> 2. And also we need to take care of reconnect timers that are being
> >>>> set
> >>>>>>> and
> >>>>>>> are re-tried to
> >>>>>>> connect back on expiration. In those cases also, we might crash
> >>>> as rpc
> >>>>>>> object would have freed.
> >>>>> Your patch addresses this..right?
> >>>>>
> >>>>> Thanks,
> >>>>> Soumya
> >>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> [1] http://review.gluster.org/#/c/13507/
> >>>>>>> [2] http://review.gluster.org/#/c/13587/
> >>>>>>>
> >>>>>>> Thanks and Regards,
> >>>>>>> Kotresh H R
> >>>>>>>
> >>>>>>> ----- Original Message -----
> >>>>>>>> From: "Soumya Koduri" <skoduri at redhat.com>
> >>>>>>>> To: "Raghavendra G" <raghavendra at gluster.com>, "Kotresh Hiremath
> >>>>>>>> Ravishankar" <khiremat at redhat.com>
> >>>>>>>> Cc: "Gluster Devel" <gluster-devel at gluster.org>
> >>>>>>>> Sent: Thursday, March 3, 2016 12:24:00 PM
> >>>>>>>> Subject: Re: [Gluster-devel] Cores generated with
> >>>>>>>> ./tests/geo-rep/georep-basic-dr-tarssh.t
> >>>>>>>>
> >>>>>>>> Thanks a lot Kotresh.
> >>>>>>>>
> >>>>>>>> On 03/03/2016 08:47 AM, Raghavendra G wrote:
> >>>>>>>>> Hi Soumya,
> >>>>>>>>>
> >>>>>>>>> Can you send a fix to this regression on upstream master too? This
> >>>> patch
> >>>>>>>>> is merged there.
> >>>>>>>>>
> >>>>>>>> I have submitted below patch.
> >>>>>>>> http://review.gluster.org/#/c/13587/
> >>>>>>>>
> >>>>>>>> Kindly review the same.
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>> Soumya
> >>>>>>>>
> >>>>>>>>> regards,
> >>>>>>>>> Raghavendra
> >>>>>>>>>
> >>>>>>>>> On Tue, Mar 1, 2016 at 10:34 PM, Kotresh Hiremath Ravishankar
> >>>>>>>>> <khiremat at redhat.com <mailto:khiremat at redhat.com>> wrote:
> >>>>>>>>>
> >>>>>>>>> Hi Soumya,
> >>>>>>>>>
> >>>>>>>>> I analysed the issue and found out that crash has happened
> >>>> because
> >>>>>>>>> of the patch [1].
> >>>>>>>>>
> >>>>>>>>> The patch doesn't set transport object to NULL in
> >>>>>>>>> 'rpc_clnt_disable'
> >>>>>>>>> but instead does it on
> >>>>>>>>> 'rpc_clnt_trigger_destroy'. So if there are pending rpc
> >>>> invocations
> >>>>>>>>> on the rpc object that
> >>>>>>>>> is disabled (those instances are possible as happening now in
> >>>>>>>>> changelog), it will trigger a
> >>>>>>>>> CONNECT notify again with 'mydata' that is freed causing a
> >>>> crash.
> >>>>>>>>> This happens because
> >>>>>>>>> 'rpc_clnt_submit' reconnects if rpc is not connected.
> >>>>>>>>>
> >>>>>>>>> rpc_clnt_submit (...) {
> >>>>>>>>> ...
> >>>>>>>>> if (conn->connected == 0) {
> >>>>>>>>> ret = rpc_transport_connect
> >>>> (conn->trans,
> >>>>>>>>>
> >>>>>>>>> conn->config.remote_port);
> >>>>>>>>> }
> >>>>>>>>> ...
> >>>>>>>>> }
> >>>>>>>>>
> >>>>>>>>> Without your patch, conn->trans was set NULL and hence
> >>>>>>>>> CONNECT
> >>>>>>>>> fails
> >>>>>>>>> not resulting with
> >>>>>>>>> CONNECT notify call. And also the cleanup happens in failure
> >>>> path.
> >>>>>>>>>
> >>>>>>>>> So the memory leak can happen, if there is no try for rpc
> >>>>>>>>> invocation
> >>>>>>>>> after DISCONNECT.
> >>>>>>>>> It will be cleaned up otherwise.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> [1] http://review.gluster.org/#/c/13507/
> >>>>>>>>>
> >>>>>>>>> Thanks and Regards,
> >>>>>>>>> Kotresh H R
> >>>>>>>>>
> >>>>>>>>> ----- Original Message -----
> >>>>>>>>> > From: "Kotresh Hiremath Ravishankar" <khiremat at redhat.com
> >>>>>>>>> <mailto:khiremat at redhat.com>>
> >>>>>>>>> > To: "Soumya Koduri" <skoduri at redhat.com
> >>>>>>>>> > <mailto:skoduri at redhat.com>>
> >>>>>>>>> > Cc: avishwan at redhat.com <mailto:avishwan at redhat.com>,
> >>>> "Gluster
> >>>>>>>>> Devel" <gluster-devel at gluster.org
> >>>>>>>>> <mailto:gluster-devel at gluster.org>>
> >>>>>>>>> > Sent: Monday, February 29, 2016 4:15:22 PM
> >>>>>>>>> > Subject: Re: Cores generated with
> >>>>>>>>> ./tests/geo-rep/georep-basic-dr-tarssh.t
> >>>>>>>>> >
> >>>>>>>>> > Hi Soumya,
> >>>>>>>>> >
> >>>>>>>>> > I just tested that it is reproducible only with your patch
> >>>> both
> >>>>>>>>> in master and
> >>>>>>>>> > 3.76 branch.
> >>>>>>>>> > The geo-rep test cases are marked bad in master. So it's
> >>>> not hit
> >>>>>>>>> in master.
> >>>>>>>>> > rpc is introduced
> >>>>>>>>> > in changelog xlator to communicate to applications via
> >>>>>>>>> libgfchangelog.
> >>>>>>>>> > Venky/Me will check
> >>>>>>>>> > why is the crash happening and will update.
> >>>>>>>>> >
> >>>>>>>>> >
> >>>>>>>>> > Thanks and Regards,
> >>>>>>>>> > Kotresh H R
> >>>>>>>>> >
> >>>>>>>>> > ----- Original Message -----
> >>>>>>>>> > > From: "Soumya Koduri" <skoduri at redhat.com
> >>>>>>>>> <mailto:skoduri at redhat.com>>
> >>>>>>>>> > > To: avishwan at redhat.com <mailto:avishwan at redhat.com>,
> >>>>>>>>> > > "kotresh"
> >>>>>>>>> <khiremat at redhat.com <mailto:khiremat at redhat.com>>
> >>>>>>>>> > > Cc: "Gluster Devel" <gluster-devel at gluster.org
> >>>>>>>>> <mailto:gluster-devel at gluster.org>>
> >>>>>>>>> > > Sent: Monday, February 29, 2016 2:10:51 PM
> >>>>>>>>> > > Subject: Cores generated with
> >>>>>>>>> ./tests/geo-rep/georep-basic-dr-tarssh.t
> >>>>>>>>> > >
> >>>>>>>>> > > Hi Aravinda/Kotresh,
> >>>>>>>>> > >
> >>>>>>>>> > > With [1], I consistently see cores generated with the
> >>>>>>>>> > > test
> >>>>>>>>> > > './tests/geo-rep/georep-basic-dr-tarssh.t' in
> >>>>>>>>> > > release-3.7
> >>>>>>>>> branch. From
> >>>>>>>>> > > the cores, looks like we are trying to dereference a
> >>>>>>>>> > > freed
> >>>>>>>>> > > changelog_rpc_clnt_t(crpc) object in
> >>>> changelog_rpc_notify().
> >>>>>>>>> Strangely
> >>>>>>>>> > > this was not reported in master branch.
> >>>>>>>>> > >
> >>>>>>>>> > > I tried debugging but couldn't find any possible
> >>>> suspects. I
> >>>>>>>>> request you
> >>>>>>>>> > > to take a look and let me know if [1] caused any
> >>>> regression.
> >>>>>>>>> > >
> >>>>>>>>> > > Thanks,
> >>>>>>>>> > > Soumya
> >>>>>>>>> > >
> >>>>>>>>> > > [1] http://review.gluster.org/#/c/13507/
> >>>>>>>>> > >
> >>>>>>>>> >
> >>>>>>>>> _______________________________________________
> >>>>>>>>> Gluster-devel mailing list
> >>>>>>>>> Gluster-devel at gluster.org <mailto:Gluster-devel at gluster.org>
> >>>>>>>>> http://www.gluster.org/mailman/listinfo/gluster-devel
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> --
> >>>>>>>>> Raghavendra G
> >>>>>>>>
> >>>>>>> _______________________________________________
> >>>>>>> Gluster-devel mailing list
> >>>>>>> Gluster-devel at gluster.org
> >>>>>>> http://www.gluster.org/mailman/listinfo/gluster-devel
> >>>>>>>
> >>>>>
> >>>> _______________________________________________
> >>>> Gluster-devel mailing list
> >>>> Gluster-devel at gluster.org
> >>>> http://www.gluster.org/mailman/listinfo/gluster-devel
> >>>>
> >>>
> >>>
> >>>
> >>> --
> >>> Raghavendra G
> >>
> >>> _______________________________________________
> >>> Gluster-devel mailing list
> >>> Gluster-devel at gluster.org
> >>> http://www.gluster.org/mailman/listinfo/gluster-devel
> >>
> >>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel
>
More information about the Gluster-devel
mailing list