[Gluster-devel] Cores generated with ./tests/geo-rep/georep-basic-dr-tarssh.t
Raghavendra Gowdappa
rgowdapp at redhat.com
Fri Mar 4 13:52:51 UTC 2016
>
> Hi Raghavendra,
>
> >
> The link mentioned seems to be invalid.
My bad. Sorry about that.
> Just to be clear, we have 3 patches in question here -
>
> [1] Original patch (merged in master but not in release-3.7)
> http://review.gluster.org/13456
>
> There are two patches proposed to fix the regression
> [2] http://review.gluster.org/#/c/13587/
> [3] http://review.gluster.org/#/c/13592/
>
> Since the patch by Kotresh [3] completely fixes the regression, we have
> decided to abandon [2]. In addition, since these fixes look very
> intricate, till we are sure that the code is stable, we thought it may
> be better not to back-port these patches to stable release branch.
>
> Please confirm if you are implying to revive [2] and back-port all
> these 3 patches to release-3.7 branch?
[3] is needed for master.
[1] and [3] are needed for release-3.7 branch.
[2] can stay abandoned.
>
> Thanks,
> Soumya
>
> > 3. Kotresh will work on new changes to make sure changelog makes
> > correct use of rpc-clnt.
> >
> > [1] http://review.gluster.org/#/c/13592
> > [2] http://review.gluster.org/#/c/1359
> >
> > regards,
> > Raghavendra.
> >
> >
> > Thanks and Regards,
> > Kotresh H R
> >
> > ----- Original Message -----
> > > From: "Soumya Koduri" <skoduri at redhat.com
> > > <mailto:skoduri at redhat.com>>
> > > To: "Kotresh Hiremath Ravishankar" <khiremat at redhat.com
> > > <mailto:khiremat at redhat.com>>, "Raghavendra
> > G" <raghavendra at gluster.com <mailto:raghavendra at gluster.com>>
> > > Cc: "Gluster Devel" <gluster-devel at gluster.org
> > > <mailto:gluster-devel at gluster.org>>
> > > Sent: Thursday, March 3, 2016 5:06:00 PM
> > > Subject: Re: [Gluster-devel] Cores generated with
> > ./tests/geo-rep/georep-basic-dr-tarssh.t
> > >
> > >
> > >
> > > On 03/03/2016 04:58 PM, Kotresh Hiremath Ravishankar wrote:
> > > > [Replying on top of my own reply]
> > > >
> > > > Hi,
> > > >
> > > > I have submitted the below patch [1] to avoid the issue of
> > > > 'rpc_clnt_submit'
> > > > getting reconnected. But it won't take care of memory leak
> > problem you were
> > > > trying to fix. That we have to carefully go through all
> > cases and fix it.
> > > > Please have a look at it.
> > > >
> > > Looks good. IIUC, with this patch, we need not set
> > conn->trans to NULL
> > > in 'rpc_clnt_disable()'. Right? If yes, then it takes care of
> > memleak as
> > > the transport object shall then get freed as part of
> > > 'rpc_clnt_trigger_destroy'.
> > >
> > >
> > > > http://review.gluster.org/#/c/13592/
> > > >
> > > > Thanks and Regards,
> > > > Kotresh H R
> > > >
> > > > ----- Original Message -----
> > > >> From: "Kotresh Hiremath Ravishankar" <khiremat at redhat.com
> > <mailto:khiremat at redhat.com>>
> > > >> To: "Soumya Koduri" <skoduri at redhat.com
> > <mailto:skoduri at redhat.com>>
> > > >> Cc: "Raghavendra G" <raghavendra at gluster.com
> > <mailto:raghavendra at gluster.com>>, "Gluster Devel"
> > > >> <gluster-devel at gluster.org
> > > >> <mailto:gluster-devel at gluster.org>>
> > > >> Sent: Thursday, March 3, 2016 3:39:11 PM
> > > >> Subject: Re: [Gluster-devel] Cores generated with
> > > >> ./tests/geo-rep/georep-basic-dr-tarssh.t
> > > >>
> > > >> Hi Soumya,
> > > >>
> > > >> I tested the lastes patch [2] on master where your
> > previous patch [1] in
> > > >> merged.
> > > >> I see crashes at different places.
> > > >>
> > > >> 1. If there are code paths that are holding rpc object
> > without taking ref
> > > >> on
> > > >> it, all those
> > > >> code path will crash on invoking rpc submit on that
> > object as rpc
> > > >> object
> > > >> would have freed
> > > >> by last unref on DISCONNECT event. I see this kind of
> > use-case in
> > > >> chagnelog rpc code.
> > > >> Need to check on other users of rpc.
> > > Agree. We should fix all such code-paths. Since this seem to
> > be an
> > > intricate fix, shall we take these patches only in master
> > branch and not
> > > in 3.7 release for now till we fix all such paths as we
> > encounter?
> > >
> > > >>
> > > >> 2. And also we need to take care of reconnect timers that
> > are being set
> > > >> and
> > > >> are re-tried to
> > > >> connect back on expiration. In those cases also, we
> > might crash as rpc
> > > >> object would have freed.
> > > Your patch addresses this..right?
> > >
> > > Thanks,
> > > Soumya
> > >
> > > >>
> > > >>
> > > >> [1] http://review.gluster.org/#/c/13507/
> > > >> [2] http://review.gluster.org/#/c/13587/
> > > >>
> > > >> Thanks and Regards,
> > > >> Kotresh H R
> > > >>
> > > >> ----- Original Message -----
> > > >>> From: "Soumya Koduri" <skoduri at redhat.com
> > <mailto:skoduri at redhat.com>>
> > > >>> To: "Raghavendra G" <raghavendra at gluster.com
> > <mailto:raghavendra at gluster.com>>, "Kotresh Hiremath
> > > >>> Ravishankar" <khiremat at redhat.com
> > <mailto:khiremat at redhat.com>>
> > > >>> Cc: "Gluster Devel" <gluster-devel at gluster.org
> > <mailto:gluster-devel at gluster.org>>
> > > >>> Sent: Thursday, March 3, 2016 12:24:00 PM
> > > >>> Subject: Re: [Gluster-devel] Cores generated with
> > > >>> ./tests/geo-rep/georep-basic-dr-tarssh.t
> > > >>>
> > > >>> Thanks a lot Kotresh.
> > > >>>
> > > >>> On 03/03/2016 08:47 AM, Raghavendra G wrote:
> > > >>>> Hi Soumya,
> > > >>>>
> > > >>>> Can you send a fix to this regression on upstream master
> > too? This patch
> > > >>>> is merged there.
> > > >>>>
> > > >>> I have submitted below patch.
> > > >>> http://review.gluster.org/#/c/13587/
> > > >>>
> > > >>> Kindly review the same.
> > > >>>
> > > >>> Thanks,
> > > >>> Soumya
> > > >>>
> > > >>>> regards,
> > > >>>> Raghavendra
> > > >>>>
> > > >>>> On Tue, Mar 1, 2016 at 10:34 PM, Kotresh Hiremath
> > Ravishankar
> > > >>>> <khiremat at redhat.com <mailto:khiremat at redhat.com>
> > <mailto:khiremat at redhat.com <mailto:khiremat at redhat.com>>> wrote:
> > > >>>>
> > > >>>> Hi Soumya,
> > > >>>>
> > > >>>> I analysed the issue and found out that crash has
> > happened because
> > > >>>> of the patch [1].
> > > >>>>
> > > >>>> The patch doesn't set transport object to NULL in
> > > >>>> 'rpc_clnt_disable'
> > > >>>> but instead does it on
> > > >>>> 'rpc_clnt_trigger_destroy'. So if there are pending
> > rpc invocations
> > > >>>> on the rpc object that
> > > >>>> is disabled (those instances are possible as
> > happening now in
> > > >>>> changelog), it will trigger a
> > > >>>> CONNECT notify again with 'mydata' that is freed
> > causing a crash.
> > > >>>> This happens because
> > > >>>> 'rpc_clnt_submit' reconnects if rpc is not connected.
> > > >>>>
> > > >>>> rpc_clnt_submit (...) {
> > > >>>> ...
> > > >>>> if (conn->connected == 0) {
> > > >>>> ret =
> > rpc_transport_connect (conn->trans,
> > > >>>>
> > > >>>> conn->config.remote_port);
> > > >>>> }
> > > >>>> ...
> > > >>>> }
> > > >>>>
> > > >>>> Without your patch, conn->trans was set NULL and
> > hence CONNECT
> > > >>>> fails
> > > >>>> not resulting with
> > > >>>> CONNECT notify call. And also the cleanup happens
> > in failure path.
> > > >>>>
> > > >>>> So the memory leak can happen, if there is no try
> > for rpc
> > > >>>> invocation
> > > >>>> after DISCONNECT.
> > > >>>> It will be cleaned up otherwise.
> > > >>>>
> > > >>>>
> > > >>>> [1] http://review.gluster.org/#/c/13507/
> > > >>>>
> > > >>>> Thanks and Regards,
> > > >>>> Kotresh H R
> > > >>>>
> > > >>>> ----- Original Message -----
> > > >>>> > From: "Kotresh Hiremath Ravishankar"
> > <khiremat at redhat.com <mailto:khiremat at redhat.com>
> > > >>>> <mailto:khiremat at redhat.com
> > <mailto:khiremat at redhat.com>>>
> > > >>>> > To: "Soumya Koduri" <skoduri at redhat.com
> > <mailto:skoduri at redhat.com>
> > > >>>> > <mailto:skoduri at redhat.com
> > <mailto:skoduri at redhat.com>>>
> > > >>>> > Cc: avishwan at redhat.com
> > <mailto:avishwan at redhat.com> <mailto:avishwan at redhat.com
> > <mailto:avishwan at redhat.com>>, "Gluster
> > > >>>> Devel" <gluster-devel at gluster.org
> > <mailto:gluster-devel at gluster.org>
> > > >>>> <mailto:gluster-devel at gluster.org
> > <mailto:gluster-devel at gluster.org>>>
> > > >>>> > Sent: Monday, February 29, 2016 4:15:22 PM
> > > >>>> > Subject: Re: Cores generated with
> > > >>>> ./tests/geo-rep/georep-basic-dr-tarssh.t
> > > >>>> >
> > > >>>> > Hi Soumya,
> > > >>>> >
> > > >>>> > I just tested that it is reproducible only with
> > your patch both
> > > >>>> in master and
> > > >>>> > 3.76 branch.
> > > >>>> > The geo-rep test cases are marked bad in master.
> > So it's not hit
> > > >>>> in master.
> > > >>>> > rpc is introduced
> > > >>>> > in changelog xlator to communicate to
> > applications via
> > > >>>> libgfchangelog.
> > > >>>> > Venky/Me will check
> > > >>>> > why is the crash happening and will update.
> > > >>>> >
> > > >>>> >
> > > >>>> > Thanks and Regards,
> > > >>>> > Kotresh H R
> > > >>>> >
> > > >>>> > ----- Original Message -----
> > > >>>> > > From: "Soumya Koduri" <skoduri at redhat.com
> > <mailto:skoduri at redhat.com>
> > > >>>> <mailto:skoduri at redhat.com
> > <mailto:skoduri at redhat.com>>>
> > > >>>> > > To: avishwan at redhat.com
> > <mailto:avishwan at redhat.com> <mailto:avishwan at redhat.com
> > <mailto:avishwan at redhat.com>>,
> > > >>>> > > "kotresh"
> > > >>>> <khiremat at redhat.com <mailto:khiremat at redhat.com>
> > <mailto:khiremat at redhat.com <mailto:khiremat at redhat.com>>>
> > > >>>> > > Cc: "Gluster Devel" <gluster-devel at gluster.org
> > <mailto:gluster-devel at gluster.org>
> > > >>>> <mailto:gluster-devel at gluster.org
> > <mailto:gluster-devel at gluster.org>>>
> > > >>>> > > Sent: Monday, February 29, 2016 2:10:51 PM
> > > >>>> > > Subject: Cores generated with
> > > >>>> ./tests/geo-rep/georep-basic-dr-tarssh.t
> > > >>>> > >
> > > >>>> > > Hi Aravinda/Kotresh,
> > > >>>> > >
> > > >>>> > > With [1], I consistently see cores generated
> > with the test
> > > >>>> > > './tests/geo-rep/georep-basic-dr-tarssh.t' in
> > release-3.7
> > > >>>> branch. From
> > > >>>> > > the cores, looks like we are trying to
> > dereference a freed
> > > >>>> > > changelog_rpc_clnt_t(crpc) object in
> > changelog_rpc_notify().
> > > >>>> Strangely
> > > >>>> > > this was not reported in master branch.
> > > >>>> > >
> > > >>>> > > I tried debugging but couldn't find any
> > possible suspects. I
> > > >>>> request you
> > > >>>> > > to take a look and let me know if [1] caused
> > any regression.
> > > >>>> > >
> > > >>>> > > Thanks,
> > > >>>> > > Soumya
> > > >>>> > >
> > > >>>> > > [1] http://review.gluster.org/#/c/13507/
> > > >>>> > >
> > > >>>> >
> > > >>>> _______________________________________________
> > > >>>> Gluster-devel mailing list
> > > >>>> Gluster-devel at gluster.org
> > <mailto:Gluster-devel at gluster.org>
> > <mailto:Gluster-devel at gluster.org
> > <mailto:Gluster-devel at gluster.org>>
> > > >>>> http://www.gluster.org/mailman/listinfo/gluster-devel
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>> --
> > > >>>> Raghavendra G
> > > >>>
> > > >> _______________________________________________
> > > >> Gluster-devel mailing list
> > > >> Gluster-devel at gluster.org <mailto:Gluster-devel at gluster.org>
> > > >> http://www.gluster.org/mailman/listinfo/gluster-devel
> > > >>
> > >
> > _______________________________________________
> > Gluster-devel mailing list
> > Gluster-devel at gluster.org <mailto:Gluster-devel at gluster.org>
> > http://www.gluster.org/mailman/listinfo/gluster-devel
> >
> >
> >
> >
> > --
> > Raghavendra G
> >
> >
> >
> >
> > --
> > Raghavendra G
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel
>
More information about the Gluster-devel
mailing list