[Gluster-devel] Cores generated with ./tests/geo-rep/georep-basic-dr-tarssh.t
Kotresh Hiremath Ravishankar
khiremat at redhat.com
Wed Mar 9 15:46:30 UTC 2016
Hi All,
The following patch is sent to address changelog rpc mem-leak issues.
The fix is intricate and needs lot of testing before taking in. Please
review the same.
http://review.gluster.org/#/c/13658/1
Thanks and Regards,
Kotresh H R
----- Original Message -----
> From: "Venky Shankar" <vshankar at redhat.com>
> To: "Raghavendra G" <raghavendra at gluster.com>
> Cc: "Kotresh Hiremath Ravishankar" <khiremat at redhat.com>, "Gluster Devel" <gluster-devel at gluster.org>
> Sent: Monday, March 7, 2016 3:52:13 PM
> Subject: Re: [Gluster-devel] Cores generated with ./tests/geo-rep/georep-basic-dr-tarssh.t
>
> On Fri, Mar 04, 2016 at 02:02:32PM +0530, Raghavendra G wrote:
> > On Thu, Mar 3, 2016 at 6:26 PM, Kotresh Hiremath Ravishankar <
> > khiremat at redhat.com> wrote:
> >
> > > Hi,
> > >
> > > Yes, with this patch we need not set conn->trans to NULL in
> > > rpc_clnt_disable
> > >
> >
> > While [1] fixes the crash, things can be improved in the way how changelog
> > is using rpc.
> >
> > 1. In the current code, there is an rpc_clnt object leak during disconnect
> > event.
>
> Just for the record (as I couldn't find this information while building up
> rpc infrastructure in changelog):
>
> Unref rpc-clnt object after calling rpc_clnt_disable() in notify() upon
> RPC_CLNT_DISCONNECT. Free up 'mydata' in notify() upon RPC_CLNT_DESTROY.
>
> > 2. Also, freed "mydata" of changelog is still associated with rpc_clnt
> > object (corollary of 1), though change log might not get any events with
> > "mydata" (as connection is dead).
> >
> > I've discussed with Kotresh about changes needed, offline. So, following
> > are the action items.
> > 1. Soumya's patch [2] is valid and is needed for 3.7 branch too.
> > 2. [2] can be accepted. However, someone might want to re-use an rpc object
> > after disabling it, like introducing a new api rpc_clnt_enable_again
> > (though no of such use-cases is very less). But [2] doesn't allow it. The
> > point is as long as rpc-clnt object is alive, transport object is alive
> > (though disconnected) and we can re-use it. So, I would prefer not to
> > accept it.
> > 3. Kotresh will work on new changes to make sure changelog makes correct
> > use of rpc-clnt.
> >
> > [1] http://review.gluster.org/#/c/13592
> > [2] http://review.gluster.org/#/c/1359
> >
> > regards,
> > Raghavendra.
> >
> >
> > > Thanks and Regards,
> > > Kotresh H R
> > >
> > > ----- Original Message -----
> > > > From: "Soumya Koduri" <skoduri at redhat.com>
> > > > To: "Kotresh Hiremath Ravishankar" <khiremat at redhat.com>, "Raghavendra
> > > G" <raghavendra at gluster.com>
> > > > Cc: "Gluster Devel" <gluster-devel at gluster.org>
> > > > Sent: Thursday, March 3, 2016 5:06:00 PM
> > > > Subject: Re: [Gluster-devel] Cores generated with
> > > ./tests/geo-rep/georep-basic-dr-tarssh.t
> > > >
> > > >
> > > >
> > > > On 03/03/2016 04:58 PM, Kotresh Hiremath Ravishankar wrote:
> > > > > [Replying on top of my own reply]
> > > > >
> > > > > Hi,
> > > > >
> > > > > I have submitted the below patch [1] to avoid the issue of
> > > > > 'rpc_clnt_submit'
> > > > > getting reconnected. But it won't take care of memory leak problem
> > > > > you
> > > were
> > > > > trying to fix. That we have to carefully go through all cases and fix
> > > it.
> > > > > Please have a look at it.
> > > > >
> > > > Looks good. IIUC, with this patch, we need not set conn->trans to NULL
> > > > in 'rpc_clnt_disable()'. Right? If yes, then it takes care of memleak
> > > > as
> > > > the transport object shall then get freed as part of
> > > > 'rpc_clnt_trigger_destroy'.
> > > >
> > > >
> > > > > http://review.gluster.org/#/c/13592/
> > > > >
> > > > > Thanks and Regards,
> > > > > Kotresh H R
> > > > >
> > > > > ----- Original Message -----
> > > > >> From: "Kotresh Hiremath Ravishankar" <khiremat at redhat.com>
> > > > >> To: "Soumya Koduri" <skoduri at redhat.com>
> > > > >> Cc: "Raghavendra G" <raghavendra at gluster.com>, "Gluster Devel"
> > > > >> <gluster-devel at gluster.org>
> > > > >> Sent: Thursday, March 3, 2016 3:39:11 PM
> > > > >> Subject: Re: [Gluster-devel] Cores generated with
> > > > >> ./tests/geo-rep/georep-basic-dr-tarssh.t
> > > > >>
> > > > >> Hi Soumya,
> > > > >>
> > > > >> I tested the lastes patch [2] on master where your previous patch
> > > > >> [1]
> > > in
> > > > >> merged.
> > > > >> I see crashes at different places.
> > > > >>
> > > > >> 1. If there are code paths that are holding rpc object without
> > > > >> taking
> > > ref
> > > > >> on
> > > > >> it, all those
> > > > >> code path will crash on invoking rpc submit on that object as
> > > > >> rpc
> > > > >> object
> > > > >> would have freed
> > > > >> by last unref on DISCONNECT event. I see this kind of use-case
> > > > >> in
> > > > >> chagnelog rpc code.
> > > > >> Need to check on other users of rpc.
> > > > Agree. We should fix all such code-paths. Since this seem to be an
> > > > intricate fix, shall we take these patches only in master branch and
> > > > not
> > > > in 3.7 release for now till we fix all such paths as we encounter?
> > > >
> > > > >>
> > > > >> 2. And also we need to take care of reconnect timers that are being
> > > set
> > > > >> and
> > > > >> are re-tried to
> > > > >> connect back on expiration. In those cases also, we might crash
> > > as rpc
> > > > >> object would have freed.
> > > > Your patch addresses this..right?
> > > >
> > > > Thanks,
> > > > Soumya
> > > >
> > > > >>
> > > > >>
> > > > >> [1] http://review.gluster.org/#/c/13507/
> > > > >> [2] http://review.gluster.org/#/c/13587/
> > > > >>
> > > > >> Thanks and Regards,
> > > > >> Kotresh H R
> > > > >>
> > > > >> ----- Original Message -----
> > > > >>> From: "Soumya Koduri" <skoduri at redhat.com>
> > > > >>> To: "Raghavendra G" <raghavendra at gluster.com>, "Kotresh Hiremath
> > > > >>> Ravishankar" <khiremat at redhat.com>
> > > > >>> Cc: "Gluster Devel" <gluster-devel at gluster.org>
> > > > >>> Sent: Thursday, March 3, 2016 12:24:00 PM
> > > > >>> Subject: Re: [Gluster-devel] Cores generated with
> > > > >>> ./tests/geo-rep/georep-basic-dr-tarssh.t
> > > > >>>
> > > > >>> Thanks a lot Kotresh.
> > > > >>>
> > > > >>> On 03/03/2016 08:47 AM, Raghavendra G wrote:
> > > > >>>> Hi Soumya,
> > > > >>>>
> > > > >>>> Can you send a fix to this regression on upstream master too? This
> > > patch
> > > > >>>> is merged there.
> > > > >>>>
> > > > >>> I have submitted below patch.
> > > > >>> http://review.gluster.org/#/c/13587/
> > > > >>>
> > > > >>> Kindly review the same.
> > > > >>>
> > > > >>> Thanks,
> > > > >>> Soumya
> > > > >>>
> > > > >>>> regards,
> > > > >>>> Raghavendra
> > > > >>>>
> > > > >>>> On Tue, Mar 1, 2016 at 10:34 PM, Kotresh Hiremath Ravishankar
> > > > >>>> <khiremat at redhat.com <mailto:khiremat at redhat.com>> wrote:
> > > > >>>>
> > > > >>>> Hi Soumya,
> > > > >>>>
> > > > >>>> I analysed the issue and found out that crash has happened
> > > because
> > > > >>>> of the patch [1].
> > > > >>>>
> > > > >>>> The patch doesn't set transport object to NULL in
> > > > >>>> 'rpc_clnt_disable'
> > > > >>>> but instead does it on
> > > > >>>> 'rpc_clnt_trigger_destroy'. So if there are pending rpc
> > > invocations
> > > > >>>> on the rpc object that
> > > > >>>> is disabled (those instances are possible as happening now in
> > > > >>>> changelog), it will trigger a
> > > > >>>> CONNECT notify again with 'mydata' that is freed causing a
> > > crash.
> > > > >>>> This happens because
> > > > >>>> 'rpc_clnt_submit' reconnects if rpc is not connected.
> > > > >>>>
> > > > >>>> rpc_clnt_submit (...) {
> > > > >>>> ...
> > > > >>>> if (conn->connected == 0) {
> > > > >>>> ret = rpc_transport_connect
> > > (conn->trans,
> > > > >>>>
> > > > >>>> conn->config.remote_port);
> > > > >>>> }
> > > > >>>> ...
> > > > >>>> }
> > > > >>>>
> > > > >>>> Without your patch, conn->trans was set NULL and hence
> > > > >>>> CONNECT
> > > > >>>> fails
> > > > >>>> not resulting with
> > > > >>>> CONNECT notify call. And also the cleanup happens in failure
> > > path.
> > > > >>>>
> > > > >>>> So the memory leak can happen, if there is no try for rpc
> > > > >>>> invocation
> > > > >>>> after DISCONNECT.
> > > > >>>> It will be cleaned up otherwise.
> > > > >>>>
> > > > >>>>
> > > > >>>> [1] http://review.gluster.org/#/c/13507/
> > > > >>>>
> > > > >>>> Thanks and Regards,
> > > > >>>> Kotresh H R
> > > > >>>>
> > > > >>>> ----- Original Message -----
> > > > >>>> > From: "Kotresh Hiremath Ravishankar" <khiremat at redhat.com
> > > > >>>> <mailto:khiremat at redhat.com>>
> > > > >>>> > To: "Soumya Koduri" <skoduri at redhat.com
> > > > >>>> > <mailto:skoduri at redhat.com>>
> > > > >>>> > Cc: avishwan at redhat.com <mailto:avishwan at redhat.com>,
> > > "Gluster
> > > > >>>> Devel" <gluster-devel at gluster.org
> > > > >>>> <mailto:gluster-devel at gluster.org>>
> > > > >>>> > Sent: Monday, February 29, 2016 4:15:22 PM
> > > > >>>> > Subject: Re: Cores generated with
> > > > >>>> ./tests/geo-rep/georep-basic-dr-tarssh.t
> > > > >>>> >
> > > > >>>> > Hi Soumya,
> > > > >>>> >
> > > > >>>> > I just tested that it is reproducible only with your patch
> > > both
> > > > >>>> in master and
> > > > >>>> > 3.76 branch.
> > > > >>>> > The geo-rep test cases are marked bad in master. So it's
> > > not hit
> > > > >>>> in master.
> > > > >>>> > rpc is introduced
> > > > >>>> > in changelog xlator to communicate to applications via
> > > > >>>> libgfchangelog.
> > > > >>>> > Venky/Me will check
> > > > >>>> > why is the crash happening and will update.
> > > > >>>> >
> > > > >>>> >
> > > > >>>> > Thanks and Regards,
> > > > >>>> > Kotresh H R
> > > > >>>> >
> > > > >>>> > ----- Original Message -----
> > > > >>>> > > From: "Soumya Koduri" <skoduri at redhat.com
> > > > >>>> <mailto:skoduri at redhat.com>>
> > > > >>>> > > To: avishwan at redhat.com <mailto:avishwan at redhat.com>,
> > > > >>>> > > "kotresh"
> > > > >>>> <khiremat at redhat.com <mailto:khiremat at redhat.com>>
> > > > >>>> > > Cc: "Gluster Devel" <gluster-devel at gluster.org
> > > > >>>> <mailto:gluster-devel at gluster.org>>
> > > > >>>> > > Sent: Monday, February 29, 2016 2:10:51 PM
> > > > >>>> > > Subject: Cores generated with
> > > > >>>> ./tests/geo-rep/georep-basic-dr-tarssh.t
> > > > >>>> > >
> > > > >>>> > > Hi Aravinda/Kotresh,
> > > > >>>> > >
> > > > >>>> > > With [1], I consistently see cores generated with the
> > > > >>>> > > test
> > > > >>>> > > './tests/geo-rep/georep-basic-dr-tarssh.t' in
> > > > >>>> > > release-3.7
> > > > >>>> branch. From
> > > > >>>> > > the cores, looks like we are trying to dereference a
> > > > >>>> > > freed
> > > > >>>> > > changelog_rpc_clnt_t(crpc) object in
> > > changelog_rpc_notify().
> > > > >>>> Strangely
> > > > >>>> > > this was not reported in master branch.
> > > > >>>> > >
> > > > >>>> > > I tried debugging but couldn't find any possible
> > > suspects. I
> > > > >>>> request you
> > > > >>>> > > to take a look and let me know if [1] caused any
> > > regression.
> > > > >>>> > >
> > > > >>>> > > Thanks,
> > > > >>>> > > Soumya
> > > > >>>> > >
> > > > >>>> > > [1] http://review.gluster.org/#/c/13507/
> > > > >>>> > >
> > > > >>>> >
> > > > >>>> _______________________________________________
> > > > >>>> Gluster-devel mailing list
> > > > >>>> Gluster-devel at gluster.org <mailto:Gluster-devel at gluster.org>
> > > > >>>> http://www.gluster.org/mailman/listinfo/gluster-devel
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>> --
> > > > >>>> Raghavendra G
> > > > >>>
> > > > >> _______________________________________________
> > > > >> Gluster-devel mailing list
> > > > >> Gluster-devel at gluster.org
> > > > >> http://www.gluster.org/mailman/listinfo/gluster-devel
> > > > >>
> > > >
> > > _______________________________________________
> > > Gluster-devel mailing list
> > > Gluster-devel at gluster.org
> > > http://www.gluster.org/mailman/listinfo/gluster-devel
> > >
> >
> >
> >
> > --
> > Raghavendra G
>
> > _______________________________________________
> > Gluster-devel mailing list
> > Gluster-devel at gluster.org
> > http://www.gluster.org/mailman/listinfo/gluster-devel
>
>
More information about the Gluster-devel
mailing list