[Gluster-devel] [Gluster-Maintainers] Proposal: move glusterfs development to github workflow, completely

Tue Oct 22 11:55:05 UTC 2019

On Tuesday 22 October 2019 at 13:17 +0530, Amar Tumballi wrote:
> Thanks for the email Misc. My reasons inline.
> 
> On Mon, Oct 21, 2019 at 4:44 PM Michael Scherer <mscherer at redhat.com> wrote:
> 
> > On Monday 14 October 2019 at 20:30 +0530, Amar Tumballi wrote:
> > > On Mon, 14 Oct, 2019, 5:37 PM Niels de Vos, <ndevos at redhat.com> wrote:
> > > 
> > > > On Mon, Oct 14, 2019 at 03:52:30PM +0530, Amar Tumballi wrote:
> > > > > Any thoughts on this?
> > > > >
> > > > > I tried a basic .travis.yml for the unified glusterfs repo I am
> > > > > maintaining, and it is good enough for getting most of the tests.
> > > > > Considering we are very close to the glusterfs-7.0 release, it is
> > > > > good to time this after the 7.0 release.
> > > > 
> > > > Is there a reason to move to Travis? GitHub does offer integration
> > > > with Jenkins, so we should be able to keep using our existing CI, I
> > > > think?
> > > > 
> > > 
> > > Yes, that's true. I tried Travis because I don't have a complete idea
> > > of the Jenkins infra, and trying Travis needed just basic permissions
> > > from me on the repo (it was tried on my personal repo).
> > 
> > Travis is limited to 1 builder per project with the free version.
> > Since the regression tests last 4h, I am not sure exactly what the
> > plan is there.
> > 
> > 
> 
> We can't regress from our current testing coverage when we migrate.
> So my take is that we should surely start by using the existing Jenkins
> from GitHub, and eventually see if there are any better options, or
> else at least remain with this CI.

Ansible did use Travis first, and we had a lot of issues after a while.
I think that if we want to have the same amount of CI as now, we would
need around 5 to 6 VMs at minimum for an average workload, and a bit
more at release time. Assuming we start to have more activity (and/or
more coverage), we would need to scale up to more VMs too.

Plus, given the nature of glusterfs, I am not sure we should use a CI
where we do not control the underlying kernel or infra. We still have
some issues with I/O on AWS that break tests, so I can't imagine how it
would be with a random kernel :/

> > Now, on the whole migration stuff, I do have a few questions:
> > 
> > - what will happen to the history of the project (aka the old
> > review.gluster.org server)? I would be in favor of dropping it if we
> > move out, but then we would lose all the information there (the review
> > content itself).
> > 
> > 
> 
> I would like to see it hosted somewhere (i.e., at the same URL
> preferably). But depending on sponsorship for the hosting charges, if
> we had to decide to shut the service down, my take is that we can make
> the DB content available for public download. Happy to provide a 'how
> to view patches' guide so one can set up Gerrit locally and see the
> details.

Wouldn't it be a problem to publish emails, etc. in a DB dump like this?
GDPR compliance comes to mind. (I do not think that's a problem, but I
would really prefer to have a lawyer's opinion first, since I would be
legally responsible in my country, based on my memories of law courses
at university.)

Plus, let's be honest with ourselves: nobody is going to go to the
hassle of setting up an old version of Gerrit just to read a review.

We could try to make an HTML mirror using wget, but I have no idea how
long it would take, nor how complete it would be in practice.

And yes, same URL is doable.

Then what about the other projects using Gerrit? Shouldn't they be moved
before glusterfs?

> > - what happens to existing proposed patches? Do they need to be
> > migrated one by one (and if so, who is going to script that part)?
> > 
> > 
> 
> I checked that we have < 50 patches active on the master branch, and
> other than Yaniv, no one has more than 5 patches active in the review
> queue. So I propose that people take up their own patches and post them
> to GitHub. For those who are not willing to do that extra work, or who
> are not active in the project now, I am happy to help migrate the patch
> to a PR.

How hard would it be to have GitHub and Gerrit side by side for a time?
I am wary of any huge move and prefer incremental steps.

> 
> 
> > - can we, while we are at it, force 2FA for the whole org on GitHub?
> > Before, I didn't push too hard because this wasn't critical, but if
> > there is a migration, that would be much more important.
> > 
> > 
> 
> Yes. I believe that is totally fine, specifically for those who are
> admins of the org, and those who can merge.

I think we can't enforce it per team, only for the whole org (I may be
wrong; it happens, but not that often).

> 
> > - what is the plan to enforce the various policies? (like the fact
> > that commits need to be signed, in a DCO-like fashion, or deciding who
> > can merge, who can give +2, and how we trigger builds only when
> > someone has said "this is verified")
> > 
> > 
> 
> About people, two options IMO:
> 1. Provide access to the same set of people who have access in Gerrit,
> or 2. Look at the activity list of the last year, and give access to
> those from the above list who have actually reviewed AND merged a patch.
> 
> About policies on how to trigger builds and merge, I prefer to use
> tools like mergify.io, which is used by many open source projects; our
> friends at the Ceph project use it too. That way, no human presses
> merge; patches are merged based on policy.
> 
> About what strings/commands to use for triggering builds (/run smoke,
> /run regression, etc.), I am happy to work with someone to get this
> done.

It kind of means that we either need to write a bot (and so host it, but
no big deal), or do it using GitHub Actions.

I didn't look at the second option, but doing it outside of any source
control is not great IMHO. In any case, I would like to have a working
solution before a move.
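To make the bot option a bit more concrete, here is a minimal sketch of the comment parsing such a bot would need, assuming commands are written alone on a line as "/run smoke" or "/run regression" and that only an allowed set of users may trigger builds. The job names in COMMAND_TO_JOB and the Comment type are hypothetical; fetching comments from GitHub or Gerrit and actually starting the Jenkins jobs are left out.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical mapping from comment commands to CI jobs; these are
# placeholders, not the real Jenkins job names.
COMMAND_TO_JOB = {
    "smoke": "glusterfs-smoke",
    "regression": "glusterfs-regression",
}

# A command must sit alone on its own line, e.g. "/run smoke".
COMMAND_RE = re.compile(r"^/run[ \t]+(\w+)[ \t]*$", re.MULTILINE)


@dataclass(frozen=True)
class Comment:
    author: str
    body: str


def jobs_to_trigger(comment: Comment, allowed: set[str]) -> list[str]:
    """Return the CI jobs requested in a review comment.

    Only users in `allowed` (e.g. those who may say "this is verified")
    can trigger builds; unknown commands and duplicates are ignored.
    """
    if comment.author not in allowed:
        return []
    jobs: list[str] = []
    for name in COMMAND_RE.findall(comment.body):
        job = COMMAND_TO_JOB.get(name)
        if job is not None and job not in jobs:
            jobs.append(job)
    return jobs


if __name__ == "__main__":
    maintainers = {"alice"}
    request = Comment("alice", "Looks good.\n/run smoke\n/run regression\n")
    print(jobs_to_trigger(request, maintainers))
    print(jobs_to_trigger(Comment("mallory", "/run regression"), maintainers))
```

Keeping the parsing independent of the review system is what would let the same logic run against Gerrit first and GitHub later.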

> > - can we also define some goals on why to migrate?
> > 
> 
> Sure, will list below.
> 
> 
> > The thread does not really explain why, except "that's what everybody
> > is doing". Based on previous migrations in different contexts, that's
> > usually not sufficient, and we get the exact same amount of
> > contributions no matter what (like static blog vs WordPress vs static
> > blog), except that someone (usually me) has to do lots of work.
> > 
> > 
> 
> I agree, and sorry about causing a lot of work for you :-/ None of this
> is intentional. We all thrive and look for better ways as they (and we)
> evolve. It is good to recheck whether we are using the right tools and
> the right processes at least every 2 years.

I am OK with doing the work; I am paid for that. It's just that if we do
the work, it seems important to make sure we do it for the right
reasons, and the temptation to change tools is usually strong when the
problem is partially elsewhere.

Now, the argument of future-proofing the infra in case sponsorship stops
is IMHO pretty important, and I would be glad to help with that.

> > So could someone give some measurable estimates of what is going to
> > be improved, along with a timeframe for the estimated improvement?
> > (e.g., in 6 months this will bring new developers, or this will make
> > patches get merged 10% faster). And just to be clear, if the goals
> > are not reached in the timeframe, I am going to hold people
> > accountable and complain next time someone proposes a migration.
> > 
> > 
> 
> This part of the email is very critical, for everyone. Because if we
> don't measure it, we haven't achieved anything.
> 
> The '*1. everyone uses it, so let's use it*' reason is critical, but
> not the only one. Let me add some reasoning on this topic before
> listing other perks I can think of.
>      If you look at the last 6 months of patch contributions that came
> directly from developers outside of Red Hat, it is ~5 people, for a
> total of 11 patches. In the meantime, there were 340+ patches posted by
> people belonging to Red Hat. I posted a few patches on behalf of
> different users who had posted them in Bugzilla, which counts as a
> reason too IMO. (For example: https://review.gluster.org/22678/
> <https://review.gluster.org/#/c/glusterfs/+/22678/>)
> 
> *2. Metrics / Insights:*
>     For the success of an open source project, its popularity is also a
> big factor, and that can only be seen with some sort of analytics and
> metrics. GitHub provides native insights into the project, which would
> be very helpful both for seeing ourselves and for showcasing our
> activity in public.
>     This particular reason goes beyond just our own repo: it helps by
> featuring us in many promotional emails and data published by GitHub
> (or by other analysts). For example, GlusterFS's activity in 2018 was
> higher than that of many projects featured in the published "Top 100
> active open source projects of 2018" list, but because we were running
> our whole show in isolation, we weren't considered.

> *3. Alternative options for users to reach developers.*
>     Today, there are 2 ways for a community user to reach developers or
> raise issues:
>      1. Report an issue in Bugzilla (for a new user, that means
> registering in one more tool),
>      2. Send an email to gluster-devel.
>      The way GitHub is structured, the repository itself can work as a
> forum, where users can raise issues and track progress by 'watching'.
> And as they can also submit patches in the same system, it will be an
> all-inclusive community.

So, does that mean that Bugzilla as a way to report bugs would also be
dropped?

GitHub issues are a bit primitive when it comes to filtering, and do not
provide support for private issues, which makes them unsuitable for
downstream (which is fine for me, we are speaking of upstream now) and
for security-sensitive issues.

In turn, this means we will have to write scripts to clean up the list
of issues and close old stuff. Nothing that can't be done, but that
would in turn require some format for issues and some label fiddling.
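As a rough idea of what such a cleanup script would have to decide, a minimal sketch in Python, assuming an issue counts as "old" when it has not been updated for a given number of days, and that a hypothetical "keep-open" label exempts it. The Issue type is a local stand-in, not a GitHub API object; fetching and closing issues through the GitHub API is deliberately left out.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone

# Hypothetical label that exempts an issue from automatic closing.
KEEP_LABEL = "keep-open"


@dataclass
class Issue:
    number: int
    updated_at: datetime
    labels: set[str] = field(default_factory=set)


def stale_issues(issues: list[Issue], now: datetime,
                 max_idle_days: int = 180) -> list[int]:
    """Return the numbers of issues idle for more than max_idle_days,
    skipping any issue that carries KEEP_LABEL."""
    cutoff = now - timedelta(days=max_idle_days)
    return sorted(
        issue.number
        for issue in issues
        if issue.updated_at < cutoff and KEEP_LABEL not in issue.labels
    )


if __name__ == "__main__":
    now = datetime(2019, 10, 22, tzinfo=timezone.utc)
    issues = [
        Issue(1, datetime(2019, 1, 1, tzinfo=timezone.utc)),
        Issue(2, datetime(2019, 10, 1, tzinfo=timezone.utc)),
        Issue(3, datetime(2018, 6, 1, tzinfo=timezone.utc), {"keep-open"}),
    ]
    print(stale_issues(issues, now))  # [1]
```

The label is exactly the kind of "label fiddling" mentioned above: without an agreed convention, the script has no way to tell a forgotten issue from a long-lived one.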

Now, we could have both GitHub and Bugzilla, but that seems confusing IMHO.

> *4. Review Suggestions*
>    New developers coming to Gerrit wouldn't know whom to add as a
> reviewer for their patch. This caused a lot of delay in
> merging/reviewing patches, because developers complained "I was not
> added as a reviewer".
> How would an external contributor know who is who? When a PR is opened,
> GitHub suggests reviewers based on the history of contributions and
> reviews, so the submitter can add at least one maintainer or peer as a
> reviewer. This would speed up the review process.

While I can see how that would be an improvement, I am not sure people
actually use it. For example, I see that Rust does this with a bot,
because that provides more control than GitHub's heuristic.

I am not sure that GitHub takes people's workload, or anything else,
into account.

Something that could be done to ease the move would be to first write a
bot (one that works on both GitHub and Gerrit), and once we have
migrated the workflow to the bot, we can move it from Gerrit to GitHub.

This makes the move incremental, and even lets us test it while having
some stuff on GitHub and some on Gerrit.

> *5. Tools and expanding ecosystem.*
>    I see that many tools are being developed around the GitHub workflow
> to make developers' lives simpler, and we can use them to make
> ourselves more efficient. For example, https://mergify.io etc.
> 
> 
> Goals of the migration (IMO):
> 1. We can measure month-to-month growth and unique visitor metrics.
> 2. Measure contributions from outside Red Hat, and see if they go up.
> 3. Check if this leads to more activity in GitHub issues, etc.
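For goal 2 to be measurable, it helps to write the baseline down. A minimal sketch using only the figures quoted above (11 patches from outside Red Hat vs 340+ from Red Hat over the last 6 months); since 340 is a lower bound, the result is an upper bound on the current outside share. The function name is illustrative only.

```python
def outside_share(outside_patches: int, inside_patches: int) -> float:
    """Fraction of all patches that came from outside Red Hat."""
    total = outside_patches + inside_patches
    if total == 0:
        raise ValueError("no patches in the measured window")
    return outside_patches / total


if __name__ == "__main__":
    # 11 outside patches vs 340+ Red Hat patches, same 6-month window.
    baseline = outside_share(11, 340)
    print(f"{baseline:.1%}")  # 3.1%
```

Re-running the same computation over the same length of window, 6 months after a migration, would give a number that can actually be compared against this baseline.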
-- 
Michael Scherer / He/Il/Er/Él
Sysadmin, Community Infrastructure