[Gluster-Maintainers] Lock down period merge process

Wed Sep 26 14:44:29 UTC 2018

This was discussed in the maintainers meeting (see notes [1]), and the
conclusion is as follows,

- Merge lock down would be across the code base, and not component
specific. As component level decision goes into more 'good faith'
category and requires more tooling to avoid the same.

- Merge lock down would get closer to when repeated failures are
noticed, than as it stands now (looking for failures across) as we
strengthen the code base

In all testing health maintained at always GREEN is where we want to
reach over time and take a step back to correct any anomalies when we
detect the same to retain the said health.

Shyam

[1] Maintainer meeting notes:
https://lists.gluster.org/pipermail/maintainers/2018-September/005054.html
(see Round table section)
On 09/03/2018 01:47 AM, Pranith Kumar Karampuri wrote:
> 
> 
> On Wed, Aug 22, 2018 at 5:54 PM Shyam Ranganathan <srangana at redhat.com
> <mailto:srangana at redhat.com>> wrote:
> 
>     On 08/18/2018 12:45 AM, Pranith Kumar Karampuri wrote:
>     >
>     >
>     > On Tue, Aug 14, 2018 at 5:29 PM Shyam Ranganathan
>     <srangana at redhat.com <mailto:srangana at redhat.com>
>     > <mailto:srangana at redhat.com <mailto:srangana at redhat.com>>> wrote:
>     >
>     >     On 08/09/2018 01:24 AM, Pranith Kumar Karampuri wrote:
>     >     >
>     >     >
>     >     > On Thu, Aug 9, 2018 at 1:25 AM Shyam Ranganathan
>     >     <srangana at redhat.com <mailto:srangana at redhat.com>
>     <mailto:srangana at redhat.com <mailto:srangana at redhat.com>>
>     >     > <mailto:srangana at redhat.com <mailto:srangana at redhat.com>
>     <mailto:srangana at redhat.com <mailto:srangana at redhat.com>>>> wrote:
>     >     >
>     >     >     Maintainers,
>     >     >
>     >     >     The following thread talks about a merge during a merge
>     >     lockdown, with
>     >     >     differing view points (this mail is not to discuss the view
>     >     points).
>     >     >
>     >     >     The root of the problem is that we leave the current process
>     >     to good
>     >     >     faith. If we have a simple rule that we will not merge
>     >     anything during a
>     >     >     lock down period, this confusion and any future
>     repetitions of
>     >     the same
>     >     >     would not occur.
>     >     >
>     >     >     I propose that we follow the simpler rule, and would
>     like to hear
>     >     >     thoughts around this.
>     >     >
>     >     >     This also means that in the future, we may not need to
>     remove
>     >     commit
>     >     >     access for other maintainers, as we do *not* follow a good
>     >     faith policy,
>     >     >     and instead depend on being able to revert and announce
>     on the
>     >     threads
>     >     >     why we do so.
>     >     >
>     >     >
>     >     > I think it is a good opportunity to establish guidelines and
>     >     process so
>     >     > that we don't end up in this state in future where one needs
>     to lock
>     >     > down the branch to make it stable. From that p.o.v.
>     discussion on this
>     >     > thread about establishing a process for lock down probably
>     doesn't add
>     >     > much value. My personal opinion for this instance at least
>     is that
>     >     it is
>     >     > good that it was locked down. I tend to forget things and not
>     >     having the
>     >     > access to commit helped follow the process automatically :-).
>     >
>     >     The intention is that master and release branches are always
>     maintained
>     >     in good working order. This involves, tests and related checks
>     passing
>     >     *always*.
>     >
>     >     When this situation is breached, correcting it immediately is
>     better
>     >     than letting it build up, as that would entail longer times
>     and more
>     >     people to fix things up.
>     >
>     >     In an ideal world, if nightly runs fail, the next thing done
>     would be to
>     >     examine patches that were added between the 2 runs, and see if
>     they are
>     >     the cause for failure, and back them out.
>     >
>     >     Hence calling to backout patches is something that would
>     happen more
>     >     regularly in the future if things are breaking.
>     >
>     >
>     > I'm with you till here.
>     >  
>     >
>     >
>     >     Lock down may happen if 2 consecutive nightly builds fail, so
>     as to
>     >     rectify the situation ASAP, and then move onto other work.
>     >
>     >     In short, what I wanted to say is that preventing lock downs
>     in the
>     >     future, is not a state we aspire for.
>     >
>     >
>     > What are the problems you foresee in aspiring for preventing lock
>     downs
>     > for everyone?
> 
>     Any project will have test failures, when things are put together and
>     exercised in different environments.
> 
>     For us, these environments, at present, are nightly regression on
>     centOS7, Brick-mux enabled regression, lcov, clang and in the future we
>     hope to increase this to ASAN, performance baselines, memory leak
>     checks, etc.
> 
>     The whole intent of adding such tests and checks to the test pipeline,
>     is to ensure that regressions to the good state, are caught early and
>     regularly.
> 
>     When these tests and checks actually show up errors/issues, is when we
>     need to pay attention to the same and focus first on getting us back on
>     track.
> 
>     At the above juncture, is when we need the lockdown or commit blackout
>     (or whatever we want to call it). The intent is to ensure that we do not
>     add more patches and further destabilize the branch, but stabilize it
>     first and then get other changes in later.
> 
>     There is a certain probability with which the above event will happen,
>     and that can be reduced, if we are more stringent in our development
>     practices, and ensuring code health is maintained (both by more checks
>     in smoke and more tests per patch or even otherwise, and also by keener
>     reviews and other such means).
> 
>     We also need to be proactive to, monitoring and fixing, failures in the
>     tests, so that we can address them quickly, rather than someone calling
>     out a lockdown.
> 
>     Now, is your question that we should avoid the above state altogether?
>     Or, how to retain commit access for all, but still have such states as
>     above, where only patches that help stabilization are merged?
> 
>     For the former, I do not have an answer, we can reduce the event
>     probability as stated above, but it will never disappear (which means
>     all those tests are pretty much of no value and needs to improve).
> 
> 
> Depends on how frequently it will happen. Let us agree on some tolerance
> level and get the maintainer/developer to address that particular test
> before any other patches in that component can be allowed to be merged.
> So per component lock-down instead of whole project lock-down.
>  
> 
> 
>     For the latter, it circles back to what I stated in earlier responses,
>     on good faith versus process, and we would like to stick to a process
>     and keep the commit list short, rather than use good faith to avoid any
>     inadvertent "I missed the mail" cases.
> 
>     >  
>     >
>     >     Lock downs may/will happen, it is
>     >     done to get the branches stable quicker, than spend long times
>     trying to
>     >     find what caused the instability in the first place.
>     >
>     >
>     >
>     >  
>     >
>     >
>     >     >  
>     >     >
>     >     >
>     >     >     Please note, if there are extraneous situations (say
>     reported
>     >     security
>     >     >     vulnerabilities that need fixes ASAP) we may need to
>     loosen up the
>     >     >     stringency, as that would take precedence over the lock
>     down.
>     >     These
>     >     >     exceptions though, can be called out and hence treated
>     as such.
>     >     >
>     >     >     Thoughts?
>     >     >
>     >     >
>     >     > This is again my personal opinion. We don't need to merge
>     patches
>     >     in any
>     >     > branch unless we need to make an immediate release with that
>     >     patch. For
>     >     > example if there is a security issue reported we *must* make
>     a release
>     >     > with the fix immediately so it makes sense to merge it and
>     do the
>     >     release.
>     >
>     >     Agree, keeps the rule simple during lock down and not open to
>     >     interpretations.
>     >
>     >     >  
>     >     >
>     >     >
>     >     >     Shyam
>     >     >
>     >     >     PS: Added Yaniv to the CC as he reported the deviance
>     >     >
>     >     >     -------- Forwarded Message --------
>     >     >     Subject:        Re: [Gluster-devel] Release 5: Master branch
>     >     health
>     >     >     report
>     >     >     (Week of 30th July)
>     >     >     Date:   Tue, 7 Aug 2018 23:22:09 +0300
>     >     >     From:   Yaniv Kaul <ykaul at redhat.com
>     <mailto:ykaul at redhat.com> <mailto:ykaul at redhat.com
>     <mailto:ykaul at redhat.com>>
>     >     <mailto:ykaul at redhat.com <mailto:ykaul at redhat.com>
>     <mailto:ykaul at redhat.com <mailto:ykaul at redhat.com>>>>
>     >     >     To:     Shyam Ranganathan <srangana at redhat.com
>     <mailto:srangana at redhat.com>
>     >     <mailto:srangana at redhat.com <mailto:srangana at redhat.com>>
>     >     >     <mailto:srangana at redhat.com <mailto:srangana at redhat.com>
>     <mailto:srangana at redhat.com <mailto:srangana at redhat.com>>>>
>     >     >     CC:     GlusterFS Maintainers <maintainers at gluster.org
>     <mailto:maintainers at gluster.org>
>     >     <mailto:maintainers at gluster.org <mailto:maintainers at gluster.org>>
>     >     >     <mailto:maintainers at gluster.org
>     <mailto:maintainers at gluster.org>
>     >     <mailto:maintainers at gluster.org
>     <mailto:maintainers at gluster.org>>>>, Gluster Devel
>     >     >     <gluster-devel at gluster.org
>     <mailto:gluster-devel at gluster.org> <mailto:gluster-devel at gluster.org
>     <mailto:gluster-devel at gluster.org>>
>     >     <mailto:gluster-devel at gluster.org
>     <mailto:gluster-devel at gluster.org> <mailto:gluster-devel at gluster.org
>     <mailto:gluster-devel at gluster.org>>>>,
>     >     >     Nigel Babu <nigelb at redhat.com <mailto:nigelb at redhat.com>
>     <mailto:nigelb at redhat.com <mailto:nigelb at redhat.com>>
>     >     <mailto:nigelb at redhat.com <mailto:nigelb at redhat.com>
>     <mailto:nigelb at redhat.com <mailto:nigelb at redhat.com>>>>
>     >     >
>     >     >
>     >     >
>     >     >
>     >     >
>     >     >     On Tue, Aug 7, 2018, 10:46 PM Shyam Ranganathan
>     >     <srangana at redhat.com <mailto:srangana at redhat.com>
>     <mailto:srangana at redhat.com <mailto:srangana at redhat.com>>
>     >     >     <mailto:srangana at redhat.com <mailto:srangana at redhat.com>
>     <mailto:srangana at redhat.com <mailto:srangana at redhat.com>>>
>     >     >     <mailto:srangana at redhat.com <mailto:srangana at redhat.com>
>     <mailto:srangana at redhat.com <mailto:srangana at redhat.com>>
>     >     <mailto:srangana at redhat.com <mailto:srangana at redhat.com>
>     <mailto:srangana at redhat.com <mailto:srangana at redhat.com>>>>> wrote:
>     >     >
>     >     >         On 08/07/2018 02:58 PM, Yaniv Kaul wrote:
>     >     >         >     The intention is to stabilize master and not add
>     >     more patches
>     >     >         that my
>     >     >         >     destabilize it.
>     >     >         >
>     >     >         >
>     >     >         > https://review.gluster.org/#/c/20603/ has been merged.
>     >     >         > As far as I can see, it has nothing to do with
>     >     stabilization and
>     >     >         should
>     >     >         > be reverted.
>     >     >
>     >     >         Posted this on the gerrit review as well:
>     >     >
>     >     >         <snip>
>     >     >         4.1 does not have nightly tests, those run on master
>     only.
>     >     >
>     >     >
>     >     >     That should change of course. We cannot strive for stability
>     >     otherwise,
>     >     >     AFAIK. 
>     >     >
>     >     >
>     >     >         Stability of master does not (will not), in the near
>     term
>     >     guarantee
>     >     >         stability of release branches, unless patches that
>     impact code
>     >     >     already
>     >     >         on release branches, get fixes on master and are
>     back ported.
>     >     >
>     >     >         Release branches get fixes back ported (as is normal),
>     >     this fix
>     >     >     and its
>     >     >         merge should not impact current master stability in any
>     >     way, and
>     >     >     neither
>     >     >         stability of 4.1 branch.
>     >     >         </snip>
>     >     >
>     >     >         The current hold is on master, not on release
>     branches. I
>     >     agree that
>     >     >         merging further code changes on release branches
>     (for example
>     >     >     geo-rep
>     >     >         issues that are backported (see [1]), as there are tests
>     >     that fail
>     >     >         regularly on master), may further destabilize the
>     release
>     >     >     branch. This
>     >     >         patch is not one of those.
>     >     >
>     >     >
>     >     >     Two issues I have with the merge:
>     >     >     1. It just makes comparing master branch to release branch
>     >     harder. For
>     >     >     example, to understand if there's a test that fails on
>     master but
>     >     >     succeeds on release branch, or vice versa. 
>     >     >     2. It means we are not focused on stabilizing master branch.
>     >     >     Y.
>     >     >
>     >     >
>     >     >         Merging patches on release branches are allowed by
>     release
>     >     >     owners only,
>     >     >         and usual practice is keeping the backlog low
>     (merging weekly)
>     >     >     in these
>     >     >         cases as per the dashboard [1].
>     >     >
>     >     >         Allowing for the above 2 reasons this patch was found,
>     >     >         - Not on master
>     >     >         - Not stabilizing or destabilizing the release branch
>     >     >         and hence was merged.
>     >     >
>     >     >         If maintainers disagree I can revert the same.
>     >     >
>     >     >         Shyam
>     >     >
>     >     >         [1] Release 4.1 dashboard:
>     >     >
>     >     >   
>     >   
>       https://review.gluster.org/#/projects/glusterfs,dashboards/dashboard:4-1-dashboard
>     >     >
>     >     >     _______________________________________________
>     >     >     maintainers mailing list
>     >     >     maintainers at gluster.org <mailto:maintainers at gluster.org>
>     <mailto:maintainers at gluster.org <mailto:maintainers at gluster.org>>
>     >     <mailto:maintainers at gluster.org
>     <mailto:maintainers at gluster.org> <mailto:maintainers at gluster.org
>     <mailto:maintainers at gluster.org>>>
>     >     >     https://lists.gluster.org/mailman/listinfo/maintainers
>     >     >
>     >     >
>     >     >
>     >     > --
>     >     > Pranith
>     >
>     >
>     >
>     > --
>     > Pranith
> 
> 
> 
> -- 
> Pranith