[Gluster-Maintainers] Lock down period merge process

Wed Oct 3 13:32:34 UTC 2018

On 10/03/2018 05:36 AM, Pranith Kumar Karampuri wrote:
> 
> 
> On Thu, Sep 27, 2018 at 8:18 PM Shyam Ranganathan
> <srangana at redhat.com> wrote:
> 
>     On 09/27/2018 10:05 AM, Atin Mukherjee wrote:
>     >         Now does this mean we block commit rights for component Y till
>     >         we have the root cause?
>     >
>     >
>     >     It was a way of making it someone's priority. If you have another
>     >     way to make it someone's priority that is better than this, please
>     >     suggest and we can have a discussion around it and agree on it :-).
>     >
>     >
>     > This is what I can think of:
>     >
>     > 1. Component peers/maintainers take a first triage of the test
>     > failure. Do the initial debugging and (a) point to the component
>     > which needs further debugging or (b) seek help on the gluster-devel
>     > ML for additional insight for identifying the problem and narrowing
>     > it down to a component.
>     > 2. If it’s (1 a) then we already know the component and the owner. If
>     > it’s (1 b) at this juncture, it’s all maintainers' responsibility to
>     > ensure the email is well understood and, based on the available
>     > details, the ownership is picked up by the respective maintainers.
>     > Multiple maintainers might also have to be involved, and this is
>     > why I focus on this as a group effort rather than an individual one.
> 
>     In my thinking, acting as a group here is better than making it a
>     sub-group's or individual's responsibility, which Atin has (IMO) put
>     forth well. Thus, keeping merge rights away from all (of course some
>     still need to have them) and getting the situation addressed is better.
> 
> 
> In my experience, it has been rather difficult for developers without
> domain expertise to solve the problem (at least on the components I am
> maintaining), so the reality is that not everyone may be able to solve
> the issues on all the components where the problem is observed. Maybe
> you mean we need more participation when you say we need to act as a
> group, so with that assumption one way to make that happen is to change
> the workflow around 'recheck centos'. In my thinking, the tooling
> shouldn't lead to less participation on gluster-devel, where developers
> can just do 'recheck centos' until the test passes and be done. So maybe
> tooling should encourage participation. Maybe something like 'recheck
> centos <link-to-mail-where-they-reported-it-on-gluster-devel>'. This is
> just an idea, thoughts are welcome.

I agree: any recheck should state why it is being attempted, what the
failures were, and why they are deemed spurious or otherwise warrant a
recheck.

A way of enforcing this does not exist yet, and is possibly a discussion
orthogonal to the one here.
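
To make the 'recheck centos <link>' suggestion concrete, a check along
these lines could reject a bare recheck. This is only a sketch: the
function name, the rejection messages, and the archive URL prefix are
assumptions for illustration, not existing Gluster tooling.

```python
import re

# Assumed prefix for gluster-devel archive links; adjust to wherever
# failure reports are actually posted.
ARCHIVE_PREFIX = "https://lists.gluster.org/pipermail/gluster-devel/"

# 'recheck centos' optionally followed by exactly one whitespace-free token.
RECHECK_RE = re.compile(r"^recheck centos(?:\s+(?P<link>\S+))?\s*$")


def parse_recheck(comment):
    """Return (accepted, detail) for a review comment (hypothetical helper)."""
    match = RECHECK_RE.match(comment.strip())
    if match is None:
        return (False, "not a recheck request")
    link = match.group("link")
    if link is None:
        return (False, "recheck needs a link to the gluster-devel report")
    if not link.startswith(ARCHIVE_PREFIX):
        return (False, "link does not point at the gluster-devel archive")
    # Accepted: hand the link back so it can be recorded with the rerun.
    return (True, link)
```

With something like this, a plain 'recheck centos' would be refused with
a reason, while one carrying a link to the report would go through.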

Recheck stringency (and I would add that even the retry of a test that
fails once should be removed) will aid in getting to less frequent
breakage in nightly, as more effort is put into correcting the tests or
fixing the code around them.

Once we have distributed tests running, such that overall regression
time is reduced, we can possibly tackle removing retries for tests, and
then move to a more stringent recheck process/tooling. The reason is
that we now run to completion, which takes quite a bit of time, so
removing retries is not practical at this juncture, but we should get
there (soon?).

> -- 
> Pranith