[Gluster-devel] Cores generated with ./tests/geo-rep/georep-basic-dr-tarssh.t

Raghavendra Gowdappa rgowdapp at redhat.com
Fri Mar 4 13:52:51 UTC 2016


> 
> Hi Raghavendra,
> 
> >
> The link mentioned seems to be invalid. 

My bad. Sorry about that.

> Just to be clear, we have 3 patches in question here -
> 
> [1] Original patch (merged in master but not in release-3.7)
> http://review.gluster.org/13456
> 
> There are two patches proposed to fix the regression
> [2] http://review.gluster.org/#/c/13587/
> [3] http://review.gluster.org/#/c/13592/
> 
> Since the patch by Kotresh [3] completely fixes the regression, we have
> decided to abandon [2]. In addition, since these fixes look very
> intricate, till we are sure that the code is stable, we thought it may
> be better not to back-port these patches to stable release branch.
> 
> Please confirm if you are implying to revive [2] and  back-port all
> these 3 patches to release-3.7 branch?

[3] is needed for master.
[1] and [3] are needed for release-3.7 branch.

[2] can stay abandoned.

> 
> Thanks,
> Soumya
> 
> >     3. Kotresh will work on new changes to make sure changelog makes
> >     correct use of rpc-clnt.
> >
> >     [1] http://review.gluster.org/#/c/13592
> >     [2] http://review.gluster.org/#/c/1359
> >
> >     regards,
> >     Raghavendra.
> >
> >
> >         Thanks and Regards,
> >         Kotresh H R
> >
> >         ----- Original Message -----
> >         > From: "Soumya Koduri" <skoduri at redhat.com
> >         > <mailto:skoduri at redhat.com>>
> >         > To: "Kotresh Hiremath Ravishankar" <khiremat at redhat.com
> >         > <mailto:khiremat at redhat.com>>, "Raghavendra
> >         G" <raghavendra at gluster.com <mailto:raghavendra at gluster.com>>
> >         > Cc: "Gluster Devel" <gluster-devel at gluster.org
> >         > <mailto:gluster-devel at gluster.org>>
> >          > Sent: Thursday, March 3, 2016 5:06:00 PM
> >          > Subject: Re: [Gluster-devel] Cores generated with
> >         ./tests/geo-rep/georep-basic-dr-tarssh.t
> >          >
> >          >
> >          >
> >          > On 03/03/2016 04:58 PM, Kotresh Hiremath Ravishankar wrote:
> >          > > [Replying on top of my own reply]
> >          > >
> >          > > Hi,
> >          > >
> >          > > I have submitted the below patch [1] to avoid the issue of
> >          > > 'rpc_clnt_submit'
> >          > > getting reconnected. But it won't take care of memory leak
> >         problem you were
> >          > > trying to fix. That we have to carefully go through all
> >         cases and fix it.
> >          > > Please have a look at it.
> >          > >
> >          > Looks good. IIUC, with this patch, we need not set
> >         conn->trans to NULL
> >          > in 'rpc_clnt_disable()'. Right? If yes, then it takes care of
> >         memleak as
> >          > the transport object shall then get freed as part of
> >          > 'rpc_clnt_trigger_destroy'.
> >          >
> >          >
> >          > > http://review.gluster.org/#/c/13592/
> >          > >
> >          > > Thanks and Regards,
> >          > > Kotresh H R
> >          > >
> >          > > ----- Original Message -----
> >          > >> From: "Kotresh Hiremath Ravishankar" <khiremat at redhat.com
> >         <mailto:khiremat at redhat.com>>
> >          > >> To: "Soumya Koduri" <skoduri at redhat.com
> >         <mailto:skoduri at redhat.com>>
> >          > >> Cc: "Raghavendra G" <raghavendra at gluster.com
> >         <mailto:raghavendra at gluster.com>>, "Gluster Devel"
> >          > >> <gluster-devel at gluster.org
> >          > >> <mailto:gluster-devel at gluster.org>>
> >          > >> Sent: Thursday, March 3, 2016 3:39:11 PM
> >          > >> Subject: Re: [Gluster-devel] Cores generated with
> >          > >> ./tests/geo-rep/georep-basic-dr-tarssh.t
> >          > >>
> >          > >> Hi Soumya,
> >          > >>
> >          > >> I tested the lastes patch [2] on master where your
> >         previous patch [1] in
> >          > >> merged.
> >          > >> I see crashes at different places.
> >          > >>
> >          > >> 1. If there are code paths that are holding rpc object
> >         without taking ref
> >          > >> on
> >          > >> it, all those
> >          > >>     code path will crash on invoking rpc submit on that
> >         object as rpc
> >          > >>     object
> >          > >>     would have freed
> >          > >>     by last unref on DISCONNECT event. I see this kind of
> >         use-case in
> >          > >>     chagnelog rpc code.
> >          > >>     Need to check on other users of rpc.
> >          > Agree. We should fix all such code-paths. Since this seem to
> >         be an
> >          > intricate fix, shall we take these patches only in master
> >         branch and not
> >          > in 3.7 release for now till we fix all such paths as we
> >         encounter?
> >          >
> >          > >>
> >          > >> 2. And also we need to take care of reconnect timers that
> >         are being set
> >          > >> and
> >          > >> are re-tried to
> >          > >>     connect back on expiration. In those cases also, we
> >         might crash as rpc
> >          > >>     object would have freed.
> >          > Your patch addresses this..right?
> >          >
> >          > Thanks,
> >          > Soumya
> >          >
> >          > >>
> >          > >>
> >          > >> [1] http://review.gluster.org/#/c/13507/
> >          > >> [2] http://review.gluster.org/#/c/13587/
> >          > >>
> >          > >> Thanks and Regards,
> >          > >> Kotresh H R
> >          > >>
> >          > >> ----- Original Message -----
> >          > >>> From: "Soumya Koduri" <skoduri at redhat.com
> >         <mailto:skoduri at redhat.com>>
> >          > >>> To: "Raghavendra G" <raghavendra at gluster.com
> >         <mailto:raghavendra at gluster.com>>, "Kotresh Hiremath
> >          > >>> Ravishankar" <khiremat at redhat.com
> >         <mailto:khiremat at redhat.com>>
> >          > >>> Cc: "Gluster Devel" <gluster-devel at gluster.org
> >         <mailto:gluster-devel at gluster.org>>
> >          > >>> Sent: Thursday, March 3, 2016 12:24:00 PM
> >          > >>> Subject: Re: [Gluster-devel] Cores generated with
> >          > >>> ./tests/geo-rep/georep-basic-dr-tarssh.t
> >          > >>>
> >          > >>> Thanks a lot Kotresh.
> >          > >>>
> >          > >>> On 03/03/2016 08:47 AM, Raghavendra G wrote:
> >          > >>>> Hi Soumya,
> >          > >>>>
> >          > >>>> Can you send a fix to this regression on upstream master
> >         too? This patch
> >          > >>>> is merged there.
> >          > >>>>
> >          > >>> I have submitted below patch.
> >          > >>> http://review.gluster.org/#/c/13587/
> >          > >>>
> >          > >>> Kindly review the same.
> >          > >>>
> >          > >>> Thanks,
> >          > >>> Soumya
> >          > >>>
> >          > >>>> regards,
> >          > >>>> Raghavendra
> >          > >>>>
> >          > >>>> On Tue, Mar 1, 2016 at 10:34 PM, Kotresh Hiremath
> >         Ravishankar
> >          > >>>> <khiremat at redhat.com <mailto:khiremat at redhat.com>
> >         <mailto:khiremat at redhat.com <mailto:khiremat at redhat.com>>> wrote:
> >          > >>>>
> >          > >>>>      Hi Soumya,
> >          > >>>>
> >          > >>>>      I analysed the issue and found out that crash has
> >         happened because
> >          > >>>>      of the patch [1].
> >          > >>>>
> >          > >>>>      The patch doesn't set transport object to NULL in
> >          > >>>>      'rpc_clnt_disable'
> >          > >>>>      but instead does it on
> >          > >>>>      'rpc_clnt_trigger_destroy'. So if there are pending
> >         rpc invocations
> >          > >>>>      on the rpc object that
> >          > >>>>      is disabled (those instances are possible as
> >         happening now in
> >          > >>>>      changelog), it will trigger a
> >          > >>>>      CONNECT notify again with 'mydata' that is freed
> >         causing a crash.
> >          > >>>>      This happens because
> >          > >>>>      'rpc_clnt_submit' reconnects if rpc is not connected.
> >          > >>>>
> >          > >>>>        rpc_clnt_submit (...) {
> >          > >>>>          ...
> >          > >>>>                       if (conn->connected == 0) {
> >          > >>>>                               ret =
> >         rpc_transport_connect (conn->trans,
> >          > >>>>
> >          > >>>>        conn->config.remote_port);
> >          > >>>>                       }
> >          > >>>>          ...
> >          > >>>>        }
> >          > >>>>
> >          > >>>>      Without your patch, conn->trans was set NULL and
> >         hence CONNECT
> >          > >>>>      fails
> >          > >>>>      not resulting with
> >          > >>>>      CONNECT notify call. And also the cleanup happens
> >         in failure path.
> >          > >>>>
> >          > >>>>      So the memory leak can happen, if there is no try
> >         for rpc
> >          > >>>>      invocation
> >          > >>>>      after DISCONNECT.
> >          > >>>>      It will be cleaned up otherwise.
> >          > >>>>
> >          > >>>>
> >          > >>>>      [1] http://review.gluster.org/#/c/13507/
> >          > >>>>
> >          > >>>>      Thanks and Regards,
> >          > >>>>      Kotresh H R
> >          > >>>>
> >          > >>>>      ----- Original Message -----
> >          > >>>>       > From: "Kotresh Hiremath Ravishankar"
> >         <khiremat at redhat.com <mailto:khiremat at redhat.com>
> >          > >>>>      <mailto:khiremat at redhat.com
> >         <mailto:khiremat at redhat.com>>>
> >          > >>>>       > To: "Soumya Koduri" <skoduri at redhat.com
> >         <mailto:skoduri at redhat.com>
> >          > >>>>       > <mailto:skoduri at redhat.com
> >         <mailto:skoduri at redhat.com>>>
> >          > >>>>       > Cc: avishwan at redhat.com
> >         <mailto:avishwan at redhat.com> <mailto:avishwan at redhat.com
> >         <mailto:avishwan at redhat.com>>, "Gluster
> >          > >>>>      Devel" <gluster-devel at gluster.org
> >         <mailto:gluster-devel at gluster.org>
> >          > >>>>      <mailto:gluster-devel at gluster.org
> >         <mailto:gluster-devel at gluster.org>>>
> >          > >>>>       > Sent: Monday, February 29, 2016 4:15:22 PM
> >          > >>>>       > Subject: Re: Cores generated with
> >          > >>>>      ./tests/geo-rep/georep-basic-dr-tarssh.t
> >          > >>>>       >
> >          > >>>>       > Hi Soumya,
> >          > >>>>       >
> >          > >>>>       > I just tested that it is reproducible only with
> >         your patch both
> >          > >>>>      in master and
> >          > >>>>       > 3.76 branch.
> >          > >>>>       > The geo-rep test cases are marked bad in master.
> >         So it's not hit
> >          > >>>>      in master.
> >          > >>>>       > rpc is introduced
> >          > >>>>       > in changelog xlator to communicate to
> >         applications via
> >          > >>>>      libgfchangelog.
> >          > >>>>       > Venky/Me will check
> >          > >>>>       > why is the crash happening and will update.
> >          > >>>>       >
> >          > >>>>       >
> >          > >>>>       > Thanks and Regards,
> >          > >>>>       > Kotresh H R
> >          > >>>>       >
> >          > >>>>       > ----- Original Message -----
> >          > >>>>       > > From: "Soumya Koduri" <skoduri at redhat.com
> >         <mailto:skoduri at redhat.com>
> >          > >>>>      <mailto:skoduri at redhat.com
> >         <mailto:skoduri at redhat.com>>>
> >          > >>>>       > > To: avishwan at redhat.com
> >         <mailto:avishwan at redhat.com> <mailto:avishwan at redhat.com
> >         <mailto:avishwan at redhat.com>>,
> >          > >>>>       > > "kotresh"
> >          > >>>>      <khiremat at redhat.com <mailto:khiremat at redhat.com>
> >         <mailto:khiremat at redhat.com <mailto:khiremat at redhat.com>>>
> >          > >>>>       > > Cc: "Gluster Devel" <gluster-devel at gluster.org
> >         <mailto:gluster-devel at gluster.org>
> >          > >>>>      <mailto:gluster-devel at gluster.org
> >         <mailto:gluster-devel at gluster.org>>>
> >          > >>>>       > > Sent: Monday, February 29, 2016 2:10:51 PM
> >          > >>>>       > > Subject: Cores generated with
> >          > >>>>      ./tests/geo-rep/georep-basic-dr-tarssh.t
> >          > >>>>       > >
> >          > >>>>       > > Hi Aravinda/Kotresh,
> >          > >>>>       > >
> >          > >>>>       > > With [1], I consistently see cores generated
> >         with the test
> >          > >>>>       > > './tests/geo-rep/georep-basic-dr-tarssh.t' in
> >         release-3.7
> >          > >>>>      branch. From
> >          > >>>>       > > the cores, looks like we are trying to
> >         dereference a freed
> >          > >>>>       > > changelog_rpc_clnt_t(crpc) object in
> >         changelog_rpc_notify().
> >          > >>>>      Strangely
> >          > >>>>       > > this was not reported in master branch.
> >          > >>>>       > >
> >          > >>>>       > > I tried debugging but couldn't find any
> >         possible suspects. I
> >          > >>>>      request you
> >          > >>>>       > > to take a look and let me know if [1] caused
> >         any regression.
> >          > >>>>       > >
> >          > >>>>       > > Thanks,
> >          > >>>>       > > Soumya
> >          > >>>>       > >
> >          > >>>>       > > [1] http://review.gluster.org/#/c/13507/
> >          > >>>>       > >
> >          > >>>>       >
> >          > >>>>      _______________________________________________
> >          > >>>>      Gluster-devel mailing list
> >          > >>>> Gluster-devel at gluster.org
> >         <mailto:Gluster-devel at gluster.org>
> >         <mailto:Gluster-devel at gluster.org
> >         <mailto:Gluster-devel at gluster.org>>
> >          > >>>> http://www.gluster.org/mailman/listinfo/gluster-devel
> >          > >>>>
> >          > >>>>
> >          > >>>>
> >          > >>>>
> >          > >>>> --
> >          > >>>> Raghavendra G
> >          > >>>
> >          > >> _______________________________________________
> >          > >> Gluster-devel mailing list
> >          > >> Gluster-devel at gluster.org <mailto:Gluster-devel at gluster.org>
> >          > >> http://www.gluster.org/mailman/listinfo/gluster-devel
> >          > >>
> >          >
> >         _______________________________________________
> >         Gluster-devel mailing list
> >         Gluster-devel at gluster.org <mailto:Gluster-devel at gluster.org>
> >         http://www.gluster.org/mailman/listinfo/gluster-devel
> >
> >
> >
> >
> >     --
> >     Raghavendra G
> >
> >
> >
> >
> > --
> > Raghavendra G
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel
> 


More information about the Gluster-devel mailing list