From bugzilla at redhat.com Sat Jun 1 21:01:10 2019
From: bugzilla at redhat.com (bugzilla at redhat.com)
Date: Sat, 01 Jun 2019 21:01:10 +0000
Subject: [Gluster-infra] [Bug 1716097] New: infra: create
suse-packing@lists.nfs-ganesha.org alias
Message-ID:
https://bugzilla.redhat.com/show_bug.cgi?id=1716097
Bug ID: 1716097
Summary: infra: create suse-packing at lists.nfs-ganesha.org alias
Product: GlusterFS
Version: mainline
Status: NEW
Component: project-infrastructure
Assignee: bugs at gluster.org
Reporter: kkeithle at redhat.com
CC: bugs at gluster.org, gluster-infra at gluster.org
Target Milestone: ---
Classification: Community
Description of problem:
Is there an OSAS ticketing system to use instead of this?
Anyway, forwarded to me.
Thanks
--
You are receiving this mail because:
You are on the CC list for the bug.
From ykaul at redhat.com Tue Jun 4 13:27:04 2019
From: ykaul at redhat.com (Yaniv Kaul)
Date: Tue, 4 Jun 2019 16:27:04 +0300
Subject: [Gluster-infra] [Gluster-devel] rebal-all-nodes-migrate.t
always fails now
In-Reply-To: <090785225412c2b5b269454f8812d0a165aea62d.camel@redhat.com>
References:
<94bd8147c5035da76c3ac3ae90a8a02ed000106a.camel@redhat.com>
<0ca34e42063ad77f323155c85a7bb3ba7a79931b.camel@redhat.com>
<090785225412c2b5b269454f8812d0a165aea62d.camel@redhat.com>
Message-ID:
What was the result of this investigation? I suspect I am seeing the same issue
on builder209[1].
Y.
[1] https://build.gluster.org/job/centos7-regression/6302/consoleFull
On Fri, Apr 5, 2019 at 5:40 PM Michael Scherer wrote:
> On Friday, 5 April 2019 at 16:55 +0530, Nithya Balachandran wrote:
> > On Fri, 5 Apr 2019 at 12:16, Michael Scherer
> > wrote:
> >
> > > On Thursday, 4 April 2019 at 18:24 +0200, Michael Scherer wrote:
> > > > On Thursday, 4 April 2019 at 19:10 +0300, Yaniv Kaul wrote:
> > > > > I'm not convinced this is solved. Just had what I believe is a
> > > > > similar
> > > > > failure:
> > > > >
> > > > > *00:12:02.532* A dependency job for rpc-statd.service failed.
> > > > > See
> > > > > 'journalctl -xe' for details.*00:12:02.532* mount.nfs:
> > > > > rpc.statd is
> > > > > not running but is required for remote locking.*00:12:02.532*
> > > > > mount.nfs: Either use '-o nolock' to keep locks local, or start
> > > > > statd.*00:12:02.532* mount.nfs: an incorrect mount option was
> > > > > specified
> > > > >
> > > > > (of course, it can always be my patch!)
> > > > >
> > > > > https://build.gluster.org/job/centos7-regression/5384/console
> > > >
> > > > same issue, different builder (206). I will check them all, as
> > > > the issue is more widespread than I expected (or it popped up
> > > > since last time I checked).
> > >
> > > Deepshika did notice that the issue came back on one server
> > > (builder202) after a reboot, so the rpcbind issue is not related to
> > > the network initscript one, and the RCA continues.
> > >
> > > We are looking for another workaround involving fiddling with the
> > > socket (until we find why it uses IPv6 at boot, but not after, when
> > > IPv6 is disabled).
> > >
> >
> > Could this be relevant?
> > https://access.redhat.com/solutions/2798411
>
> Good catch.
>
> So, we already do that; Nigel took care of it (after 2 days of
> research). But I didn't know the exact symptoms, and decided to double
> check just in case.
>
> And... there is no sysctl.conf in the initrd. Running "dracut -v -f" does
> not change anything.
>
> Running "dracut -v -f -H" takes care of that (and fixes the problem),
> but:
> - our ansible script already runs that
> - -H is hostonly, which is already the default on EL7 according to the
> docs.
>
> However, if dracut-config-generic is installed, dracut doesn't build a
> hostonly initrd, and so does not include the sysctl.conf file (which
> breaks rpcbind, which breaks the test suite).
>
> And for some reason, it is installed in the image in EC2 (likely by
> default), but not by default on the builders.
>
> So what happens is that after a kernel upgrade, dracut rebuilds a generic
> initrd instead of a hostonly one, which breaks things. And the kernel was
> likely upgraded recently (upgrades happen nightly, for some value of
> "night"), so we didn't see that earlier, nor on a fresh system.
>
>
> So now, we have several solutions:
> - be explicit about using hostonly in dracut, so this doesn't happen again
> (or not for this reason)
>
> - disable IPv6 in rpcbind in a cleaner way (to be tested)
>
> - get the test suite to work with IPv6
>
> In the long term, I also want to monitor the processes, but for that, I
> need a VPN between the nagios server and EC2, and that project got
> blocked by several issues (EC2 does not support ECDSA keys, which we use
> for ansible, so we have to fall back to RSA for fully automated
> deployment; OpenVPN requires certificates, so I need a newer python
> openssl to do what I want, and the one in RHEL 7 is too old; etc.).
>
> As the weekend approaches for me, I just rebuilt the initrd for the time
> being. I guess forcing hostonly is the safest fix for now, but that
> will be for Monday.
> --
> Michael Scherer
> Sysadmin, Community Infrastructure and Platform, OSAS
>
>
>
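The diagnosis and interim fix described above can be condensed into a few shell commands (a sketch based on the thread, to be run as root on an EL7 builder; the exact paths follow the usual EL7 conventions and are assumptions, not quotes from the discussion):

```shell
# Does the current initrd carry etc/sysctl.conf? A generic image (built when
# dracut-config-generic is installed) omits it, so the IPv6-disabling sysctl
# settings are lost at boot and rpcbind binds IPv6 sockets again.
lsinitrd "/boot/initramfs-$(uname -r).img" | grep etc/sysctl.conf

# Is the package that forces generic (non host-only) images present?
rpm -q dracut-config-generic

# Rebuild the initrd in host-only mode so the host's sysctl configuration
# is embedded again -- the interim fix the thread settles on.
dracut -v -f --hostonly
```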
From dkhandel at redhat.com Wed Jun 5 06:57:21 2019
From: dkhandel at redhat.com (Deepshikha Khandelwal)
Date: Wed, 5 Jun 2019 12:27:21 +0530
Subject: [Gluster-infra] [Gluster-devel] rebal-all-nodes-migrate.t
always fails now
In-Reply-To:
References:
<94bd8147c5035da76c3ac3ae90a8a02ed000106a.camel@redhat.com>
<0ca34e42063ad77f323155c85a7bb3ba7a79931b.camel@redhat.com>
<090785225412c2b5b269454f8812d0a165aea62d.camel@redhat.com>
Message-ID:
I recently added 3 builders (builder208, builder209, builder210) to the
regression pool. Networking on these new builders did not come up because,
on reboot, the configuration was looking for a non-existent ethernet card
(eth0) and hence failing. I'll reconnect them and update here once I fix
the issue today.
Sorry for the inconvenience.
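That failure mode can be spotted with a small check (a sketch; the directory is the standard EL7 initscripts location, and the helper name is made up):

```shell
# List ifcfg-* files whose DEVICE= names a NIC that does not exist on the
# host -- the situation that kept networking down on the new builders.
check_ifcfg() {
  dir=${1:-/etc/sysconfig/network-scripts}   # default EL7 location
  for cfg in "$dir"/ifcfg-*; do
    [ -e "$cfg" ] || continue
    dev=$(sed -n 's/^DEVICE=//p' "$cfg")
    if [ -n "$dev" ] && [ ! -e "/sys/class/net/$dev" ]; then
      echo "stale config: $cfg refers to missing device $dev"
    fi
  done
}
```

On an affected builder, `check_ifcfg` would flag `ifcfg-eth0` when the actual interface came up under a different name.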
On Tue, Jun 4, 2019 at 7:07 PM Yaniv Kaul wrote:
> What was the result of this investigation? I suspect I am seeing the same
> issue on builder209[1].
> Y.
>
> [1] https://build.gluster.org/job/centos7-regression/6302/consoleFull
>
> [earlier quoted thread trimmed; it repeats the messages above verbatim]
>
>> _______________________________________________
>
> Community Meeting Calendar:
>
> APAC Schedule -
> Every 2nd and 4th Tuesday at 11:30 AM IST
> Bridge: https://bluejeans.com/836554017
>
> NA/EMEA Schedule -
> Every 1st and 3rd Tuesday at 01:00 PM EDT
> Bridge: https://bluejeans.com/486278655
>
> Gluster-devel mailing list
> Gluster-devel at gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-devel
>
>
From amukherj at redhat.com Tue Jun 11 08:51:35 2019
From: amukherj at redhat.com (Atin Mukherjee)
Date: Tue, 11 Jun 2019 14:21:35 +0530
Subject: [Gluster-infra]
https://build.gluster.org/job/centos7-regression/6404/consoleFull - Problem
accessing //job/centos7-regression/6404/consoleFull. Reason: Not found
Message-ID:
https://bugzilla.redhat.com/show_bug.cgi?id=1719174
The patch which failed the regression is https://review.gluster.org/22851 .
From bugzilla at redhat.com Tue Jun 11 08:50:43 2019
From: bugzilla at redhat.com (bugzilla at redhat.com)
Date: Tue, 11 Jun 2019 08:50:43 +0000
Subject: [Gluster-infra] [Bug 1719174] New: broken regression link?
Message-ID:
https://bugzilla.redhat.com/show_bug.cgi?id=1719174
Bug ID: 1719174
Summary: broken regression link?
Product: GlusterFS
Version: mainline
Status: NEW
Component: project-infrastructure
Assignee: bugs at gluster.org
Reporter: amukherj at redhat.com
CC: bugs at gluster.org, gluster-infra at gluster.org
Target Milestone: ---
Classification: Community
Description of problem:
Regression job in one of my patches failed at
https://build.gluster.org/job/centos7-regression/6404/consoleFull with no
indication of which test failed. While accessing the link, the following pops up:
Problem accessing //job/centos7-regression/6404/consoleFull. Reason: Not found
Version-Release number of selected component (if applicable):
mainline
From bugzilla at redhat.com Tue Jun 11 16:40:46 2019
From: bugzilla at redhat.com (bugzilla at redhat.com)
Date: Tue, 11 Jun 2019 16:40:46 +0000
Subject: [Gluster-infra] [Bug 1719388] New: infra: download.gluster.org
/var/www/html/... is out of free space
Message-ID:
https://bugzilla.redhat.com/show_bug.cgi?id=1719388
Bug ID: 1719388
Summary: infra: download.gluster.org /var/www/html/... is out
of free space
Product: GlusterFS
Version: mainline
Status: NEW
Component: project-infrastructure
Assignee: bugs at gluster.org
Reporter: kkeithle at redhat.com
CC: bugs at gluster.org, gluster-infra at gluster.org
Target Milestone: ---
Classification: Community
From bugzilla at redhat.com Wed Jun 12 10:54:39 2019
From: bugzilla at redhat.com (bugzilla at redhat.com)
Date: Wed, 12 Jun 2019 10:54:39 +0000
Subject: [Gluster-infra] [Bug 1489325] Place to host gerritstats
In-Reply-To:
References:
Message-ID:
https://bugzilla.redhat.com/show_bug.cgi?id=1489325
M. Scherer changed:
What |Removed |Added
----------------------------------------------------------------------------
Flags|needinfo?(mscherer at redhat.com) |
--- Comment #5 from M. Scherer ---
I would prefer it in the cage, on a separate VM (I have already started the
playbook). As for the stats and end results, those questions should go to the
reporter, not me.
From bugzilla at redhat.com Wed Jun 12 10:57:36 2019
From: bugzilla at redhat.com (bugzilla at redhat.com)
Date: Wed, 12 Jun 2019 10:57:36 +0000
Subject: [Gluster-infra] [Bug 1713391] Access to wordpress instance of
gluster.org required for release management
In-Reply-To:
References:
Message-ID:
https://bugzilla.redhat.com/show_bug.cgi?id=1713391
M. Scherer changed:
What |Removed |Added
----------------------------------------------------------------------------
Comment #0 is private|1 |0
Flags|needinfo?(mscherer at redhat.com) |needinfo?(dkhandel at redhat.com)
--- Comment #3 from M. Scherer ---
Deepshika, what kind of info is needed?
From bugzilla at redhat.com Wed Jun 12 11:02:21 2019
From: bugzilla at redhat.com (bugzilla at redhat.com)
Date: Wed, 12 Jun 2019 11:02:21 +0000
Subject: [Gluster-infra] [Bug 1504713] Move planet build to be triggered by
Jenkins
In-Reply-To:
References:
Message-ID:
https://bugzilla.redhat.com/show_bug.cgi?id=1504713
M. Scherer changed:
What |Removed |Added
----------------------------------------------------------------------------
Flags|needinfo?(mscherer at redhat.com) |
--- Comment #3 from M. Scherer ---
Nope, nothing changed. That's kind of a lower priority, since the system works
well enough most of the time and has a rather low impact.
From bugzilla at redhat.com Wed Jun 12 11:38:34 2019
From: bugzilla at redhat.com (bugzilla at redhat.com)
Date: Wed, 12 Jun 2019 11:38:34 +0000
Subject: [Gluster-infra] [Bug 1713391] Access to wordpress instance of
gluster.org required for release management
In-Reply-To:
References:
Message-ID:
https://bugzilla.redhat.com/show_bug.cgi?id=1713391
Deepshikha khandelwal changed:
What |Removed |Added
----------------------------------------------------------------------------
Flags|needinfo?(dkhandel at redhat.com) |
--- Comment #4 from Deepshikha khandelwal ---
Misc, I have no idea how I can give access to the WordPress instance. It would
be great if there were some documentation on this.
From bugzilla at redhat.com Wed Jun 12 12:09:44 2019
From: bugzilla at redhat.com (bugzilla at redhat.com)
Date: Wed, 12 Jun 2019 12:09:44 +0000
Subject: [Gluster-infra] [Bug 1711950] Account in download.gluster.org to
upload the build packages
In-Reply-To:
References:
Message-ID:
https://bugzilla.redhat.com/show_bug.cgi?id=1711950
M. Scherer changed:
What |Removed |Added
----------------------------------------------------------------------------
Flags| |needinfo?(sacharya at redhat.com)
--- Comment #2 from M. Scherer ---
Ok, so before opening an account, I would like to discuss the plan for
automating that.
I feel uneasy about the fact that we are still doing everything manually
(especially after the nfs-ganesha issue that we found internally), and while I
personally have neither the resources nor the time to automate it (it was on
the TODO list, but after Nigel's departure and the migration to AWS, it was
pushed down the line), I would like to take this opportunity to first discuss
that, and then open the account.
In that order, because experience shows that the reverse order rarely results
in any action (curiously, folks listen to me more when they are waiting on me
for something, so I hope folks will excuse me for that obvious blackmail, but
it should be quick).
So, how long would it take to automate the release from Jenkins to
download.gluster.org, and who would be dedicated to it on the gluster side?
(Once we agree on a deadline, I will create an account that expires
automatically after that time, just to make sure we do not leave a gaping
hole open.)
From bugzilla at redhat.com Wed Jun 12 12:16:11 2019
From: bugzilla at redhat.com (bugzilla at redhat.com)
Date: Wed, 12 Jun 2019 12:16:11 +0000
Subject: [Gluster-infra] [Bug 1348072] Backups for Gerrit
In-Reply-To:
References:
Message-ID:
https://bugzilla.redhat.com/show_bug.cgi?id=1348072
M. Scherer changed:
What |Removed |Added
----------------------------------------------------------------------------
Flags|needinfo?(mscherer at redhat.com) |
--- Comment #6 from M. Scherer ---
I need to review the current status. I know we have database backups, but
backups are only one side of the coin; we also need to test the recovery end
to end. And testing the recovery requires being able to automate the
installation of Gerrit, something that was cautiously delayed due to the
criticality of the service (i.e., it is still not managed by ansible).
From bugzilla at redhat.com Wed Jun 12 12:17:18 2019
From: bugzilla at redhat.com (bugzilla at redhat.com)
Date: Wed, 12 Jun 2019 12:17:18 +0000
Subject: [Gluster-infra] [Bug 1489417] Gerrit shouldn't offer http or git
for code download
In-Reply-To:
References:
Message-ID:
https://bugzilla.redhat.com/show_bug.cgi?id=1489417
M. Scherer changed:
What |Removed |Added
----------------------------------------------------------------------------
Flags|needinfo?(mscherer at redhat.com) |
--- Comment #5 from M. Scherer ---
Dunno, I think Nigel had a specific plan for this, but that's not on my radar.
I would however keep it open so we do not forget, once more urgent stuff is
done (or once we get more resources, which would have the side effect of
fixing more urgent stuff).
From atumball at redhat.com Wed Jun 12 12:32:58 2019
From: atumball at redhat.com (Amar Tumballi Suryanarayan)
Date: Wed, 12 Jun 2019 18:02:58 +0530
Subject: [Gluster-infra] New workflow proposal for glusterfs repo
Message-ID:
A few bullet points:
* Run the smoke checks below sequentially, and only if they succeed, run the
others in parallel.
- Sequential:
-- clang-format check
-- compare-bugzilla-version-git-branch
-- bugzilla-post
-- comment-on-issue
-- fedora-smoke (mainly don't want warning).
- Parallel
-- all devrpm jobs
-- 32bit smoke
-- freebsd-smoke
-- smoke
-- strfmt_errors
-- python-lint, and shellcheck.
* Remove the Verified flag. There is no point in one more button users need
to click; anyway, CentOS regression is considered the 'Verification'.
* In the normal flow, let the CentOS regression, which currently runs after
the 'Verified' vote, be triggered on the first 'successful' +1 review vote.
* For patches pushed to the system just to 'validate' behavior, to run
sample tests, or WIP patches, continue to support the 'recheck centos'
comment message, so we can run without any vote. Let that not be the norm.
With this, I expect we can reduce smoke failures and use 90% fewer resources
for a patch that would fail smoke anyway (i.e., 95% of the smoke failures
would be caught in the first 10% of the resources and time).
We can also reduce the number of regression runs, as review becomes mandatory
before regression.
These are just suggestions, happy to discuss more on these.
-Amar
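The sequential-then-parallel gating proposed above can be illustrated with a toy shell sketch (the job names in the usage comment are placeholders, not the real Jenkins job scripts):

```shell
# Run gating jobs one by one; stop at the first failure, so the cheap checks
# (formatting, bugzilla checks) fail fast before anything expensive starts.
run_gates() {
  for job in "$@"; do
    "$job" || return 1
  done
}

# Run the remaining jobs concurrently; report failure if any one fails.
run_parallel() {
  pids=""
  for job in "$@"; do
    "$job" & pids="$pids $!"
  done
  rc=0
  for p in $pids; do
    wait "$p" || rc=1
  done
  return $rc
}

# Hypothetical usage: gates first, heavy jobs only once all gates pass.
# run_gates clang_format_check bugzilla_post comment_on_issue &&
#   run_parallel smoke freebsd_smoke python_lint shellcheck_job
```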
From amukherj at redhat.com Wed Jun 12 17:55:11 2019
From: amukherj at redhat.com (Atin Mukherjee)
Date: Wed, 12 Jun 2019 23:25:11 +0530
Subject: [Gluster-infra] New workflow proposal for glusterfs repo
In-Reply-To:
References:
Message-ID:
On Wed, 12 Jun 2019 at 18:04, Amar Tumballi Suryanarayan <
atumball at redhat.com> wrote:
>
> Few bullet points:
>
> * Run the smoke checks below sequentially, and only if they succeed, run
> the others in parallel.
> - Sequential:
> -- clang-format check
> -- compare-bugzilla-version-git-branch
> -- bugzilla-post
> -- comment-on-issue
> -- fedora-smoke (mainly don't want warning).
>
+1
> - Parallel
> -- all devrpm jobs
> -- 32bit smoke
> -- freebsd-smoke
> -- smoke
> -- strfmt_errors
> -- python-lint, and shellcheck.
>
I'm sure there must be a reason, but I would like to know why they need to
be parallel. Can't we run them sequentially too, to get resource-utilisation
benefits similar to the above? Or are these individual jobs time-consuming
enough that running them sequentially would make the overall smoke job take
much longer?
> * Remove the Verified flag. There is no point in one more button users need
> to click; anyway, CentOS regression is considered the 'Verification'.
>
> * In the normal flow, let the CentOS regression, which currently runs after
> the 'Verified' vote, be triggered on the first 'successful' +1 review vote.
>
Some reviewers/maintainers (including me) would like to see the regression
vote before putting a +1/+2 on most patches, unless they are straightforward
ones. So although this reduces the burden of one extra click on the patch
owner, it introduces the same burden on the reviewers who want to check the
regression vote. IMHO, I don't see much benefit in implementing this.
> * For patches pushed to the system just to 'validate' behavior, to run
> sample tests, or WIP patches, continue to support the 'recheck centos'
> comment message, so we can run without any vote. Let that not be the norm.
>
>
> With this, I expect we can reduce smoke failures and use 90% fewer
> resources for a patch that would fail smoke anyway (i.e., 95% of the smoke
> failures would be caught in the first 10% of the resources and time).
>
> We can also reduce the number of regression runs, as review becomes
> mandatory before regression.
>
> These are just suggestions, happy to discuss more on these.
>
> -Amar
>
>
>
> _______________________________________________
> Gluster-infra mailing list
> Gluster-infra at gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-infra
--
- Atin (atinm)
From rtalur at redhat.com Wed Jun 12 18:28:37 2019
From: rtalur at redhat.com (Raghavendra Talur)
Date: Wed, 12 Jun 2019 14:28:37 -0400
Subject: [Gluster-infra] New workflow proposal for glusterfs repo
In-Reply-To:
References:
Message-ID:
On Wed, Jun 12, 2019, 1:56 PM Atin Mukherjee wrote:
>
>
> On Wed, 12 Jun 2019 at 18:04, Amar Tumballi Suryanarayan <
> atumball at redhat.com> wrote:
>
>>
>> Few bullet points:
>>
>> * Run the smoke checks below sequentially, and only if they succeed, run
>> the others in parallel.
>> - Sequential:
>> -- clang-format check
>> -- compare-bugzilla-version-git-branch
>> -- bugzilla-post
>> -- comment-on-issue
>> -- fedora-smoke (mainly don't want warning).
>>
>
> +1
>
> - Parallel
>> -- all devrpm jobs
>> -- 32bit smoke
>> -- freebsd-smoke
>> -- smoke
>> -- strfmt_errors
>> -- python-lint, and shellcheck.
>>
>
> I'm sure there must be a reason, but I would like to know why they need to
> be parallel. Can't we run them sequentially too, to get resource-utilisation
> benefits similar to the above? Or are these individual jobs time-consuming
> enough that running them sequentially would make the overall smoke job take
> much longer?
>
>
>> * Remove the Verified flag. There is no point in one more button users
>> need to click; anyway, CentOS regression is considered the 'Verification'.
>>
>
The requirement of a Verified flag from the patch owner before regression
runs was added because we had only a few Jenkins machines and many patches
being uploaded.
>> * In the normal flow, let the CentOS regression, which currently runs
>> after the 'Verified' vote, be triggered on the first 'successful' +1
>> review vote.
>>
>
> Some reviewers/maintainers (including me) would like to see the regression
> vote before putting a +1/+2 on most patches, unless they are straightforward
> ones. So although this reduces the burden of one extra click on the patch
> owner, it introduces the same burden on the reviewers who want to check the
> regression vote. IMHO, I don't see much benefit in implementing this.
>
I agree with Atin here. The burden should be on machines before people.
Reviewers prefer to look at patches that have passed regression.
In heketi on GitHub, we have configured regression to run on all patches
submitted by the heketi developer group. If such a configuration is possible
in Gerrit+Jenkins, we should definitely do it that way.
For patches submitted by someone outside the developer group, a maintainer
should verify that the patch doesn't do anything harmful and then mark the
regression to run.
Talur
>
>> * For patches pushed to the system just to 'validate' behavior, to run
>> sample tests, or WIP patches, continue to support the 'recheck centos'
>> comment message, so we can run without any vote. Let that not be the
>> norm.
>>
>>
>> With this, I expect we can reduce smoke failures and use 90% fewer
>> resources for a patch that would fail smoke anyway (i.e., 95% of the smoke
>> failures would be caught in the first 10% of the resources and time).
>>
>> We can also reduce the number of regression runs, as review becomes
>> mandatory before regression.
>>
>> These are just suggestions, happy to discuss more on these.
>>
>> -Amar
>>
>>
>>
>> _______________________________________________
>> Gluster-infra mailing list
>> Gluster-infra at gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-infra
>
> --
> - Atin (atinm)
> _______________________________________________
> Gluster-infra mailing list
> Gluster-infra at gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-infra
From bugzilla at redhat.com Thu Jun 13 05:29:12 2019
From: bugzilla at redhat.com (bugzilla at redhat.com)
Date: Thu, 13 Jun 2019 05:29:12 +0000
Subject: [Gluster-infra] [Bug 1716097] infra: create
suse-packing@lists.nfs-ganesha.org alias
In-Reply-To:
References:
Message-ID:
https://bugzilla.redhat.com/show_bug.cgi?id=1716097
Marc Dequènes (Duck) changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |duck at redhat.com
--- Comment #2 from Marc Dequènes (Duck) ---
Quack,
I created the list with Kaleb as owner. Now Kaleb can set up the list of
admins and moderators, as well as the list description.
Please be careful to avoid manual modifications to the infra. While deploying,
I saw that suse-packaging@ had been manually added to the aliases file for
Postfix. We do regular updates of the infra to fix and improve various things,
and such manual changes will be overwritten and the expected feature lost.
\_o<
From bugzilla at redhat.com Thu Jun 13 11:29:02 2019
From: bugzilla at redhat.com (bugzilla at redhat.com)
Date: Thu, 13 Jun 2019 11:29:02 +0000
Subject: [Gluster-infra] [Bug 1711950] Account in download.gluster.org to
upload the build packages
In-Reply-To:
References:
Message-ID:
https://bugzilla.redhat.com/show_bug.cgi?id=1711950
Shwetha K Acharya changed:
What |Removed |Added
----------------------------------------------------------------------------
Flags|needinfo?(sacharya at redhat.com) |needinfo?(mscherer at redhat.com)
--- Comment #3 from Shwetha K Acharya ---
Hi Misc,
We have built the Debian packages for glusterfs 6.2 and are waiting for the
creation of accounts to upload the packages.
https://github.com/gluster/glusterfs/issues/683 is a GitHub issue asking about
the reasons for the delay. It would be helpful if we were unblocked soon. Can
you do the needful?
About automating the procedure, I will initiate a discussion with the team and
get back to you.
From bugzilla at redhat.com Thu Jun 13 11:58:51 2019
From: bugzilla at redhat.com (bugzilla at redhat.com)
Date: Thu, 13 Jun 2019 11:58:51 +0000
Subject: [Gluster-infra] [Bug 1711950] Account in download.gluster.org to
upload the build packages
In-Reply-To:
References:
Message-ID:
https://bugzilla.redhat.com/show_bug.cgi?id=1711950
M. Scherer changed:
What |Removed |Added
----------------------------------------------------------------------------
Flags|needinfo?(mscherer at redhat.com) |
--- Comment #4 from M. Scherer ---
Sure, give me a deadline and I will create the account. I mean, I do not even
need a precise one.
Would you agree on "we do it in 3 months"? In that case I will create the
account right now (with the expiration set accordingly).
(I need a public ssh key and a username.)
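An expiring account of this kind could be sketched as follows (hypothetical: the username, key file, and date are placeholders, not values from the ticket):

```shell
# Create an upload account that the system disables automatically on the
# agreed deadline, so a forgotten account does not stay open indefinitely.
useradd --create-home --expiredate 2019-09-13 pkg-upload

# Install the user's public ssh key (the key file path is a placeholder).
install -d -m 700 -o pkg-upload -g pkg-upload /home/pkg-upload/.ssh
install -m 600 -o pkg-upload -g pkg-upload uploader_key.pub \
    /home/pkg-upload/.ssh/authorized_keys

# Confirm the expiry is recorded on the account.
chage -l pkg-upload | grep 'Account expires'
```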
From bugzilla at redhat.com Thu Jun 13 13:08:57 2019
From: bugzilla at redhat.com (bugzilla at redhat.com)
Date: Thu, 13 Jun 2019 13:08:57 +0000
Subject: [Gluster-infra] [Bug 1711950] Account in download.gluster.org to
upload the build packages
In-Reply-To:
References:
Message-ID:
https://bugzilla.redhat.com/show_bug.cgi?id=1711950
Kaleb KEITHLEY changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |sankarshan at redhat.com
Flags| |needinfo?(sankarshan at redhat.com)
--- Comment #5 from Kaleb KEITHLEY ---
(In reply to M. Scherer from comment #2)
> Ok, so before opening an account, I would like to discuss the plan for
> automating that.
> I feel uneasy about the fact that we are still doing everything manually
> (especially after the nfs-ganesha issue that we found internally), and
> while I personally have neither the resources nor the time to automate it
> (it was on the TODO list, but after Nigel's departure and the migration to
> AWS, it was pushed down the line), I would like to take this opportunity
> to first discuss that, and then open the account.
>
> In that order, because experience shows that the reverse order rarely
> results in any action (curiously, folks listen to me more when they are
> waiting on me for something, so I hope folks will excuse me for that
> obvious blackmail, but it should be quick).
>
> So, how long would it take to automate the release from Jenkins to
> download.gluster.org, and who would be dedicated to it on the gluster
> side?
> (Once we agree on a deadline, I will create an account that expires
> automatically after that time, just to make sure we do not leave a gaping
> hole open.)
You, Nigel, and I had a discussion in Berlin over two years ago about this, and
Nigel was supposed to automate it in Jenkins.
Someone like Sankarshan will have to identify a resource for doing the work
now.
From bugzilla at redhat.com Thu Jun 13 13:17:00 2019
From: bugzilla at redhat.com (bugzilla at redhat.com)
Date: Thu, 13 Jun 2019 13:17:00 +0000
Subject: [Gluster-infra] [Bug 1711950] Account in download.gluster.org to
upload the build packages
In-Reply-To:
References:
Message-ID:
https://bugzilla.redhat.com/show_bug.cgi?id=1711950
M. Scherer changed:
What |Removed |Added
----------------------------------------------------------------------------
Flags|needinfo?(sankarshan at redhat |
|.com) |
--- Comment #6 from M. Scherer ---
Yup, but clearly, as long as someone was doing the job manually, this was set
as a lesser priority than a lot of other things (like fixing the fires all over
the place). The increasing backlog of tasks does not make me think we can do it
without someone taking ownership of it, and as you rightly point out, that's
something we have all wanted for more than 2 years :/
From bugzilla at redhat.com Thu Jun 13 14:06:59 2019
From: bugzilla at redhat.com (bugzilla at redhat.com)
Date: Thu, 13 Jun 2019 14:06:59 +0000
Subject: [Gluster-infra] [Bug 1719388] infra: download.gluster.org
/var/www/html/... is out of free space
In-Reply-To:
References:
Message-ID:
https://bugzilla.redhat.com/show_bug.cgi?id=1719388
M. Scherer changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |mscherer at redhat.com
--- Comment #1 from M. Scherer ---
Fixed; I have added 2G (only 5G were free without further changes).
I will now see if there is some missing cleanup, and why I got no Nagios alert
for that metric :/
From bugzilla at redhat.com Thu Jun 13 14:12:46 2019
From: bugzilla at redhat.com (bugzilla at redhat.com)
Date: Thu, 13 Jun 2019 14:12:46 +0000
Subject: [Gluster-infra] [Bug 1719388] infra: download.gluster.org
/var/www/html/... is out of free space
In-Reply-To:
References:
Message-ID:
https://bugzilla.redhat.com/show_bug.cgi?id=1719388
--- Comment #2 from M. Scherer ---
So:
https://download.gluster.org/pub/gluster/glusterfs/nightly/sources/ is taking
8G, and seems unused and no longer up to date.
From bugzilla at redhat.com Thu Jun 13 15:21:31 2019
From: bugzilla at redhat.com (bugzilla at redhat.com)
Date: Thu, 13 Jun 2019 15:21:31 +0000
Subject: [Gluster-infra] [Bug 1719388] infra: download.gluster.org
/var/www/html/... is out of free space
In-Reply-To:
References:
Message-ID:
https://bugzilla.redhat.com/show_bug.cgi?id=1719388
--- Comment #3 from M. Scherer ---
OK, so not only is monitoring down (not sure why; it worked when deployed), but
/ is full, because /var/log is taking 5G (why, I don't know; I guess a lot of
requests or something) with no compression, which filled the server.
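Log compression is normally handled by logrotate; as a minimal sketch, a policy like the following would rotate and compress the web server logs weekly (the path, retention count, and reload command here are assumptions for illustration, not the actual server configuration):

```conf
# /etc/logrotate.d/httpd (sketch; adjust path and retention to the host)
/var/log/httpd/*.log {
    weekly
    rotate 4          # keep four rotated generations
    compress          # gzip rotated logs
    delaycompress     # keep the most recent rotation uncompressed
    missingok
    notifempty
    sharedscripts
    postrotate
        /bin/systemctl reload httpd.service > /dev/null 2>&1 || true
    endscript
}
```

With `compress` in place, a month of rotated access logs typically shrinks by an order of magnitude, which would have avoided /var/log filling the root filesystem.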
From bugzilla at redhat.com Thu Jun 13 15:30:43 2019
From: bugzilla at redhat.com (bugzilla at redhat.com)
Date: Thu, 13 Jun 2019 15:30:43 +0000
Subject: [Gluster-infra] [Bug 1719388] infra: download.gluster.org
/var/www/html/... is out of free space
In-Reply-To:
References:
Message-ID:
https://bugzilla.redhat.com/show_bug.cgi?id=1719388
--- Comment #4 from M. Scherer ---
So, that was a missing package (an optional dependency) breaking the ACL on
Munin. Not sure how to clean that up properly, but alerting should work now.
On to cleaning stuff...
From ravishankar at redhat.com Fri Jun 14 04:22:24 2019
From: ravishankar at redhat.com (Ravishankar N)
Date: Fri, 14 Jun 2019 09:52:24 +0530
Subject: [Gluster-infra] review.gluster.org is not accessible.
Message-ID:
Hi,
I have raised https://bugzilla.redhat.com/show_bug.cgi?id=1720453. The
issue seems to be intermittent: I was not able to access it in either
Firefox or Chrome. Now Chrome works, but Firefox does not.
Regards,
Ravi
From dkhandel at redhat.com Fri Jun 14 06:45:40 2019
From: dkhandel at redhat.com (Deepshikha Khandelwal)
Date: Fri, 14 Jun 2019 12:15:40 +0530
Subject: [Gluster-infra] Gerrit is down
Message-ID:
Hello,
review.gluster.org has been down since this morning. We are looking into the
issue and will update once it is back.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From mscherer at redhat.com Fri Jun 14 08:10:42 2019
From: mscherer at redhat.com (Michael Scherer)
Date: Fri, 14 Jun 2019 10:10:42 +0200
Subject: [Gluster-infra] DNS issue on review.gluster.org, causing outage
Message-ID: <6177518854b017c8e7389f402bb61c11695677ee.camel@redhat.com>
Hi,
there is an ongoing issue with review.gluster.org: some people are
being directed to the wrong server.
A quick fix is to add:
8.43.85.171 review.gluster.org
to /etc/hosts (on Linux).
Adding an MX record yesterday (due to an RH IT request) resulted in
the domain name having two IP addresses, one pointing to supercolony (the
MX) and one to the Gerrit server. That is neither the intention, nor what
is supposed to happen, so I suspect a bug somewhere (or a
corner case, or me misreading the RFC).
Investigation is ongoing.
review.gluster.org may still work for some people (it worked, and still
works, for me), which is why this wasn't noticed while I tested;
apologies for that.
--
Michael Scherer
Sysadmin, Community Infrastructure
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 836 bytes
Desc: This is a digitally signed message part
URL:
From bugzilla at redhat.com Fri Jun 14 04:17:38 2019
From: bugzilla at redhat.com (bugzilla at redhat.com)
Date: Fri, 14 Jun 2019 04:17:38 +0000
Subject: [Gluster-infra] [Bug 1720453] New: Unable to access
review.gluster.org
Message-ID:
https://bugzilla.redhat.com/show_bug.cgi?id=1720453
Bug ID: 1720453
Summary: Unable to access review.gluster.org
Product: GlusterFS
Version: mainline
Status: NEW
Component: project-infrastructure
Assignee: bugs at gluster.org
Reporter: ravishankar at redhat.com
CC: bugs at gluster.org, gluster-infra at gluster.org
Target Milestone: ---
Classification: Community
Created attachment 1580537
--> https://bugzilla.redhat.com/attachment.cgi?id=1580537&action=edit
browser screenshot
Description of problem:
Not able to access the website, see attached screenshot.
From bugzilla at redhat.com Fri Jun 14 08:22:12 2019
From: bugzilla at redhat.com (bugzilla at redhat.com)
Date: Fri, 14 Jun 2019 08:22:12 +0000
Subject: [Gluster-infra] [Bug 1720453] Unable to access review.gluster.org
In-Reply-To:
References:
Message-ID:
https://bugzilla.redhat.com/show_bug.cgi?id=1720453
M. Scherer changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |mscherer at redhat.com
--- Comment #2 from M. Scherer ---
yeah, there is a DNS issue. I am on it and I suspect that I found the exact
issue. Postmortem will explain if that fix, right now, I am waiting on DNS
propagation.
From bugzilla at redhat.com Fri Jun 14 08:33:33 2019
From: bugzilla at redhat.com (bugzilla at redhat.com)
Date: Fri, 14 Jun 2019 08:33:33 +0000
Subject: [Gluster-infra] [Bug 1720453] Unable to access review.gluster.org
In-Reply-To:
References:
Message-ID:
https://bugzilla.redhat.com/show_bug.cgi?id=1720453
--- Comment #3 from M. Scherer ---
So, i think the root cause is fixed (at least from my perspective), so DNS
propagation should occurs quickly and fix it for others.
Writing the post mortme at the moment.
From bugzilla at redhat.com Fri Jun 14 08:56:21 2019
From: bugzilla at redhat.com (bugzilla at redhat.com)
Date: Fri, 14 Jun 2019 08:56:21 +0000
Subject: [Gluster-infra] [Bug 1489417] Gerrit shouldn't offer http or git
for code download
In-Reply-To:
References:
Message-ID:
https://bugzilla.redhat.com/show_bug.cgi?id=1489417
Amar Tumballi changed:
What |Removed |Added
----------------------------------------------------------------------------
Priority|unspecified |low
CC| |atumball at redhat.com
Severity|unspecified |low
From mscherer at redhat.com Fri Jun 14 09:50:53 2019
From: mscherer at redhat.com (Michael Scherer)
Date: Fri, 14 Jun 2019 11:50:53 +0200
Subject: [Gluster-infra] DNS issue on review.gluster.org, causing outage
In-Reply-To: <6177518854b017c8e7389f402bb61c11695677ee.camel@redhat.com>
References: <6177518854b017c8e7389f402bb61c11695677ee.camel@redhat.com>
Message-ID:
On Friday, 14 June 2019 at 10:10 +0200, Michael Scherer wrote:
> Hi,
>
> there is an ongoing issue with review.gluster.org: some people are
> being directed to the wrong server.
>
> A quick fix is to add:
> 8.43.85.171 review.gluster.org
>
> to /etc/hosts (on Linux).
>
> Adding an MX record yesterday (due to an RH IT request) resulted in
> the domain name having two IP addresses, one pointing to supercolony
> (the MX) and one to the Gerrit server. That is neither the intention,
> nor what is supposed to happen, so I suspect a bug somewhere (or a
> corner case, or me misreading the RFC).
>
> Investigation is ongoing.
>
> review.gluster.org may still work for some people (it worked, and
> still works, for me), which is why this wasn't noticed while I tested;
> apologies for that.
Ok so the issue should now be fixed. See
https://bugzilla.redhat.com/show_bug.cgi?id=1720453
(sorry forgot to send the email about the fix, too focused on the post
mortem)
--
Michael Scherer
Sysadmin, Community Infrastructure
From mscherer at redhat.com Fri Jun 14 09:55:10 2019
From: mscherer at redhat.com (Michael Scherer)
Date: Fri, 14 Jun 2019 11:55:10 +0200
Subject: [Gluster-infra] Post mortem for review.gluster.org DNS outage
Message-ID: <8060171842b824590e0e6d6a9fe8cfe5e376f7e3.camel@redhat.com>
Date: 14 June 2019
Participating:
- misc
- deepshika
Summary:
--------
People started to report HTTP issues on review.gluster.org while our
monitoring was silent (monitoring kept spamming me during the night
about the download server being almost full, following bug 1719388, so I know
it was working). A quick investigation showed this was due to the DNS record
returning 2 entries, which resulted in round-robin between the
wrong server and the right one.
Timeline (in UTC):
-------------------
- 2019-05-08: misc go on vacation
- 2019-05-24: RH IT contacts misc (and others), saying that mails with a
return address of "review at review.gluster.org" are clogging their SMTP
servers' queue. Folks receive the mails, the server says "this person no
longer works here" and tries to send a bounce; this doesn't work, and it
fills the queue of the MX. As a few people left RH in the last 6 months, some
of them likely receiving all notifications, this created a problem for IT.
Postfix is heavily I/O bound (all communication between the dozens of
daemons is done using queues on disk, synced for reliability), so a filling
queue badly impacts the operation of an MX, slowing it down, resulting in
bigger queues, etc.
- 2019-05-27: Deepshika and Duck try to fix this, not understanding why
the email is not working, or how it was supposed to work (spoiler: it
was never supposed to work). They conclude with "too weird, we need to wait".
- 2019-06-12: misc is back from vacation, sees his mails, prioritizes
them, and explains that review at review.gluster.org was never supposed to
work, hence why people found nothing; it was just the
default setting of Gerrit.
- 2019-06-13: misc decides to set up an MX for the review.gluster.org
domain to drop all incoming emails, solving the bounce issue for IT.
See the ansible repo commits for how that's done.
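The actual change lives in the ansible repo; as a hedged sketch, one common Postfix way to swallow all mail for a domain is a transport map entry using the discard(8) delivery agent (the file paths below are the Postfix defaults, not necessarily what the ansible role uses):

```conf
# /etc/postfix/transport -- route the whole domain to the discard agent
# (rebuild the map afterwards with: postmap /etc/postfix/transport)
review.gluster.org    discard:silently

# /etc/postfix/main.cf -- make Postfix accept and route mail for the domain
transport_maps = hash:/etc/postfix/transport
relay_domains  = review.gluster.org
```

The text after `discard:` is only logged as the pseudo next-hop; the mail itself is accepted and thrown away, so the bounces stop piling up in the MX queue.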
- 16:57 misc adds an MX record for review.gluster.org pointing to
supercolony's IP address, after adding the code to route the whole domain
to /dev/null, then waits a bit to see that nothing broke, and goes home
(assuming monitoring would scream during the evening if anything
happened). The diff for the DNS change is shown later[1].
- 23:00 seeing that monitoring didn't scream, misc decides to go to bed and
sleep.
- 2019-06-14: folks start to report the outage as people in India start
their day
- 04:17: bug 1720453 is opened
- 05:31: Deepshika correctly diagnoses the DNS issue, sees that it was
related to the last change, and tries to contact misc on Telegram
- 07:50: misc wakes up, sees his phone blinking, answers the messages
- 08:10: misc checks various things, reaches the same conclusion as
Deepshika, proposes a workaround
- 08:13: after squinting hard at the diff, misc finally finds
something that could be the cause
- 08:14: a commit is pushed (again, see the end)
- 08:15: the DNS record is verified, and seems to be fixed
- 08:50: coffee is poured into a mug in misc's flat, and this post
mortem is written
Impact:
-------
review.gluster.org was randomly reachable for some people for a few
hours. I suspect the cage wasn't affected due to DNS cache, but some
jobs might have been affected.
The gluster.org top domain might have been impacted too, but I am not
sure how (the MX was in place, DNS too, we do not use the bare
gluster.org anywhere, and I think there is some fallback and caching),
and nobody reported anything (and the monitoring also didn't scream).
Root cause:
-----------
The DNS entry was wrong: it returned two IP addresses when it should
have returned a single one. But the exact behavior was (IMHO) quite subtle,
as you will see.
The initial DNS diff was this:
--- a/prod/external-default/gluster.org
+++ b/prod/external-default/gluster.org
@@ -1,6 +1,6 @@
$TTL 300
@ IN SOA ns1.redhat.com. noc.redhat.com. (
- 2019040301 ; Serial
+ 2019061301 ; Serial
3600 ; Refresh
1800 ; Retry
604800 ; Expire
@@ -12,6 +12,7 @@ $TTL 300
IN NS ns3.redhat.com.
;
IN MX 10 mx2.gluster.org.
+review IN MX 10 mx2.gluster.org.
;build IN MX 10 mx1.gluster.org.
@@ -34,7 +35,6 @@ lists IN CNAME supercolony.rht
git IN CNAME gerrit.rht
patches IN CNAME gerrit.rht
-review IN CNAME gerrit.rht
gerrit IN CNAME gerrit.rht
gerrit-new.rht IN CNAME gerrit.rht
@@ -60,6 +60,8 @@ _kerberos-master._udp SRV 0 0 88
freeipa.gluster.org.
_kerberos-master._tcp SRV 0 0 88 freeipa.gluster.org.
postgresql.rht IN A 8.43.85.170
+review IN A 8.43.85.171
gerrit.rht IN A 8.43.85.171
; testVM for the switch to nftable
chrono.rht IN A 8.43.85.172
At first look, any sysadmin would likely say this seems correct:
converting review to an A record (because an MX and a CNAME can't coexist; I
couldn't push that due to the zone syntax check on commit) and adding an MX
record.
I assume the reader does not see what is wrong with this one (no more
than I did when I wrote it yesterday and checked my change), and
to be fair, what is wrong is not visible in the diff.
The fix was this (edited for readability):
--- a/prod/external-default/gluster.org
+++ b/prod/external-default/gluster.org
@@ -1,6 +1,6 @@
$TTL 300
@ IN SOA ns1.redhat.com. noc.redhat.com. (
- 2019061301 ; Serial
+ 2019061401 ; Serial
3600 ; Refresh
1800 ; Retry
604800 ; Expire
@@ -10,18 +10,19 @@ $TTL 300
IN NS ns1.redhat.com.
IN NS ns2.redhat.com.
IN NS ns3.redhat.com.
+ IN A 8.43.85.176
;
IN MX 10 mx2.gluster.org.
review IN MX 10 mx2.gluster.org.
- IN A 8.43.85.176
; RH DC
mx2 IN A 8.43.85.176
It turns out that, contrary to what I believed, the zone file format is
not a format where each line is fully separate and where order does not
matter (there is $ORIGIN, etc.).
When you add an entry and give no name in a record (the first word on the
line), it doesn't default to the domain name (that's the role of "@" or
$ORIGIN); it inherits the name from the previous record (see
https://en.wikipedia.org/wiki/Zone_file). So far this had the same effect
for the gluster.org zone file, because every record without an explicit name
(the first field) was at the start, and the first record is the domain name
itself.
But it all changed once I added the MX.
Because this went from (edited to remove spaces and comments, and to make the
issue obvious and visible):

        IN NS ns3.redhat.com.
        IN MX 10 mx2.gluster.org.
        IN A 8.43.85.176
mx2     IN A 8.43.85.176

to:

        IN NS ns3.redhat.com.
        IN MX 10 mx2.gluster.org.
review  IN MX 10 mx2.gluster.org.
        IN A 8.43.85.176
mx2     IN A 8.43.85.176

Which, using that presentation and indentation, kind of hints that there
is a problem. I always thought that the indentation was mostly
cosmetic and that the format (unlike Python) does not require it. Turns out
there is more to it.
The first commit placed the MX record in the wrong place, which changed
the meaning of the following line (the one outside the diff context).
This resulted in review.gluster.org having a second A record (for
8.43.85.176, supercolony), stolen from the apex domain (also called top
or naked domain).
That is the exact same issue as
https://lists.gluster.org/pipermail/gluster-infra/2018-August/004905.html
(the DNS one), except that back then, I never found the problem.
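The owner-name inheritance rule can be sketched with a toy parser (an illustration of the rule described above, not real BIND parsing; the record names and IPs follow the zone excerpts in this post mortem):

```python
# Toy sketch of BIND-style owner-name inheritance: a record whose line
# starts with whitespace inherits the owner (first field) of the
# previous record, NOT the zone origin.
def resolve_owners(zone_lines, origin="gluster.org."):
    records = []
    owner = origin
    for line in zone_lines:
        if not line.strip() or line.lstrip().startswith(";"):
            continue  # skip blanks and comments
        if line[0] not in " \t":
            owner = line.split()[0]       # explicit owner name
            rdata = line.split()[1:]
        else:
            rdata = line.split()          # blank owner: inherit previous
        records.append((owner, " ".join(rdata)))
    return records

# The broken ordering from the outage: the blank-owner A record now
# follows "review", so "review" steals the apex A record.
broken = [
    "@ IN NS ns3.redhat.com.",
    "  IN MX 10 mx2.gluster.org.",
    "review IN MX 10 mx2.gluster.org.",
    "  IN A 8.43.85.176",
    "mx2 IN A 8.43.85.176",
]
owners = [o for o, r in resolve_owners(broken) if "IN A 8.43.85.176" in r]
print(owners)  # -> ['review', 'mx2']  -- the apex no longer owns its A record
```

Moving the blank-owner `IN A` line back above the MX records (as the fix did) restores `@` as its owner, which is exactly why the second diff works.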
Resolution:
------------
- DNS got fixed
What went well:
---------------
- not much; I was just lucky to find the issue. It was the 2nd time I
looked, and last time I didn't find it. I guess what went well is that
it didn't get worse.
When we were lucky:
-------------------
- I didn't oversleep too late, and wasn't too jetlagged from
vacation[2]
- the issue was found quickly, which is close to a miracle given I had just
woken up, and I didn't find it back in August 2018.
What went bad:
--------------
- monitoring didn't alert on anything. Given DNS propagation, it should
have alerted me during the evening if something happened, or so I
thought.
- automated DNS verification didn't catch this, because the zone was valid.
- manual verification didn't yield an error. Not sure why it worked
from my side of the world every time.
To do:
------
- contact the Holy See to get this certified as a "miracle". I am not a
morning person.
- try to understand why monitoring failed to notice that something failed.
- now that we fixed the issue, go back to the change from August that
caused the issue the first time and apply it again (routing
build.gluster.org to /dev/null). That work was already ongoing yesterday;
it was not pushed because it was late.
Notes
-----
[1] yes, this post mortem follows the Chekhov's gun principle.
[2] yes, that's not much from a luck perspective. But I did manage to
sleep around 16 hours after taking the plane last week; it took me a while
to adjust.
--
Michael Scherer
Sysadmin, Community Infrastructure
From atumball at redhat.com Fri Jun 14 11:46:00 2019
From: atumball at redhat.com (Amar Tumballi Suryanarayan)
Date: Fri, 14 Jun 2019 17:16:00 +0530
Subject: [Gluster-infra] Seems like Smoke job is not voting
Message-ID:
I see that patches starting from 10:45 AM IST (7 hours ago) are not getting
smoke votes.
For one of my patches, the smoke job was not triggered at all, IMO:
https://review.gluster.org/#/c/22863/
It would be good to check it.
Regards,
Amar
From atumball at redhat.com Fri Jun 14 11:50:46 2019
From: atumball at redhat.com (Amar Tumballi Suryanarayan)
Date: Fri, 14 Jun 2019 17:20:46 +0530
Subject: [Gluster-infra] Seems like Smoke job is not voting
In-Reply-To:
References:
Message-ID:
OK, I have guessed a possible cause.
The same DNS issue with review.gluster.org could have prevented
smoke from fetching the patch, and hence the job would not have been
triggered. Those of you whose patches did not get a smoke vote, please
trigger 'recheck smoke' via a comment.
-Amar
On Fri, Jun 14, 2019 at 5:16 PM Amar Tumballi Suryanarayan <
atumball at redhat.com> wrote:
> I see patches starting from 10:45 AM IST (7hrs before) are not getting
> smoke votes.
>
> For one of my patch, the smoke job is not triggered at all IMO.
>
> https://review.gluster.org/#/c/22863/
>
> Would be good to check it.
>
> Regards,
> Amar
>
>
--
Amar Tumballi (amarts)
From bugzilla at redhat.com Fri Jun 14 17:25:05 2019
From: bugzilla at redhat.com (bugzilla at redhat.com)
Date: Fri, 14 Jun 2019 17:25:05 +0000
Subject: [Gluster-infra] [Bug 1719388] infra: download.gluster.org
/var/www/html/... is out of free space
In-Reply-To:
References:
Message-ID:
https://bugzilla.redhat.com/show_bug.cgi?id=1719388
--- Comment #5 from M. Scherer ---
so, / was full, i cleand things (especially since that's now behind the proxy,
no need to keep log there). Should be compressed in the future.
From bugzilla at redhat.com Tue Jun 18 02:49:46 2019
From: bugzilla at redhat.com (bugzilla at redhat.com)
Date: Tue, 18 Jun 2019 02:49:46 +0000
Subject: [Gluster-infra] [Bug 1716097] infra: create
suse-packing@lists.nfs-ganesha.org alias
In-Reply-To:
References:
Message-ID:
https://bugzilla.redhat.com/show_bug.cgi?id=1716097
--- Comment #3 from Marc Dequènes (Duck) ---
Is this ok?
From bugzilla at redhat.com Tue Jun 18 04:49:37 2019
From: bugzilla at redhat.com (bugzilla at redhat.com)
Date: Tue, 18 Jun 2019 04:49:37 +0000
Subject: [Gluster-infra] [Bug 1721353] New: Run 'line-coverage' regression
runs on a latest fedora machine (say fedora30).
Message-ID:
https://bugzilla.redhat.com/show_bug.cgi?id=1721353
Bug ID: 1721353
Summary: Run 'line-coverage' regression runs on a latest fedora
machine (say fedora30).
Product: GlusterFS
Version: mainline
Status: NEW
Component: project-infrastructure
Severity: high
Priority: high
Assignee: bugs at gluster.org
Reporter: atumball at redhat.com
CC: bugs at gluster.org, gluster-infra at gluster.org
Target Milestone: ---
Classification: Community
Description of problem:
I suspect the code coverage tool on CentOS 7 is not covering all the details.
Check
https://build.gluster.org/job/line-coverage/lastCompletedBuild/Line_20Coverage_20Report/libglusterfs/src/glusterfs/stack.h.gcov.html
for example: you can see that it says 17/41 functions are covered. But if
you look, there are only 17 inline functions there, and all of them are
actually covered.
If it were reported properly, we should have had 100% coverage there.
Given that, I hope a newer version would get this sorted.
Also note that we recently fixed all the python3 issues in regression runs
too, so moving to Fedora should help us identify python3 issues sooner (if
any).
Version-Release number of selected component (if applicable):
master
How reproducible:
100%
Expected results:
Nightly line-coverage runs to run on fedora systems.
From bugzilla at redhat.com Tue Jun 18 06:23:14 2019
From: bugzilla at redhat.com (bugzilla at redhat.com)
Date: Tue, 18 Jun 2019 06:23:14 +0000
Subject: [Gluster-infra] [Bug 1721353] Run 'line-coverage' regression runs
on a latest fedora machine (say fedora30).
In-Reply-To:
References:
Message-ID:
https://bugzilla.redhat.com/show_bug.cgi?id=1721353
--- Comment #1 from Amar Tumballi ---
Ok, when I used the lcov tool with the same commands as that of lcov.sh from
build-jobs repo, I got below numbers for stack.h (which I used as an example
above).
on current builder : stack.h - lines(262/497 - 52.7%), functions(17/41 -
41.5%)
on fedora29 (local): stack.h - lines(94/111 - 84.7%), functions(6/7 - 85.7%)
I hope just by running the regression on fedora, we would get more up-to-date
information, and more coverage details. Just note that I suspect this to be
more of an header file specific details, and even then, up-to-date information
is better than stale info.
From bugzilla at redhat.com Tue Jun 18 07:31:30 2019
From: bugzilla at redhat.com (bugzilla at redhat.com)
Date: Tue, 18 Jun 2019 07:31:30 +0000
Subject: [Gluster-infra] [Bug 1721353] Run 'line-coverage' regression runs
on a latest fedora machine (say fedora30).
In-Reply-To:
References:
Message-ID:
https://bugzilla.redhat.com/show_bug.cgi?id=1721353
Kotresh HR changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |khiremat at redhat.com
--- Comment #2 from Kotresh HR ---
I have also noticed the small difference when I was working to improve lcov
coverage of glusterd-georep.c. On Fedora 30 I used to see 70.1%, but on
CentOS 69.9%. I didn't spend time debugging it, though; I didn't expect it to
be platform dependent.
From mscherer at redhat.com Tue Jun 18 07:51:53 2019
From: mscherer at redhat.com (Michael Scherer)
Date: Tue, 18 Jun 2019 09:51:53 +0200
Subject: [Gluster-infra] Upgrade of some builder nodes to F30
Message-ID:
Hi,
as per request (and since F28 is EOL, or soon will be), I will give a try
at adding a Fedora 30 node to the build cluster.
So if you see anything suspicious on the builder49 node once it is
back, please tell us.
The plan is to add the node, then replay some jobs on it to see if they
are OK, then add a few more nodes, and switch the jobs for good; rinse,
repeat.
For now, this is blocked at step 1: "fix our playbook, broken yet
again by an ansible upgrade".
--
Michael Scherer
Sysadmin, Community Infrastructure
From bugzilla at redhat.com Thu Jun 20 21:29:23 2019
From: bugzilla at redhat.com (bugzilla at redhat.com)
Date: Thu, 20 Jun 2019 21:29:23 +0000
Subject: [Gluster-infra] [Bug 1665361] Alerts for offline nodes
In-Reply-To:
References:
Message-ID:
https://bugzilla.redhat.com/show_bug.cgi?id=1665361
PnT Account Manager changed:
What |Removed |Added
----------------------------------------------------------------------------
Assignee|narekuma at redhat.com |bugs at gluster.org
From sankarshan.mukhopadhyay at gmail.com Mon Jun 24 11:53:08 2019
From: sankarshan.mukhopadhyay at gmail.com (Sankarshan Mukhopadhyay)
Date: Mon, 24 Jun 2019 17:23:08 +0530
Subject: [Gluster-infra] New workflow proposal for glusterfs repo
In-Reply-To:
References:
Message-ID:
Checking back on this - do we need more voices or amendments to
Amar's original proposal before we scope the implementation?
I read Amar's proposal as desiring an outcome where the journey of a
valid/good patch through the test flows is fast and efficient.
On Wed, Jun 12, 2019 at 11:58 PM Raghavendra Talur wrote:
>
>
>
> On Wed, Jun 12, 2019, 1:56 PM Atin Mukherjee wrote:
>>
>>
>>
>> On Wed, 12 Jun 2019 at 18:04, Amar Tumballi Suryanarayan wrote:
>>>
>>>
>>> Few bullet points:
>>>
>>> * Let the smoke job run sequentially for the jobs below and, if successful, in parallel for the others.
>>> - Sequential:
>>> -- clang-format check
>>> -- compare-bugzilla-version-git-branch
>>> -- bugzilla-post
>>> -- comment-on-issue
>>> -- fedora-smoke (mainly don't want warning).
>>
>>
>> +1
>>
>>> - Parallel
>>> -- all devrpm jobs
>>> -- 32bit smoke
>>> -- freebsd-smoke
>>> -- smoke
>>> -- strfmt_errors
>>> -- python-lint, and shellcheck.
>>
>>
>> I'm sure there must be a reason, but I would like to know why they need to be parallel. Can't we have them run sequentially to get similar resource-utilisation benefits as above? Or are all these individual jobs so time consuming that running them sequentially would make the overall smoke job take much longer?
>>
>>>
>>> * Remove the Verified flag. There is no point in one more button users need to click; anyway, CentOS regression is considered 'verification'.
>
>
> The requirement of a Verified flag from the patch owner before regression runs was added because we had few Jenkins machines and many patches being uploaded.
However, do we consider that the situation has now improved enough to make
the change Amar asks for?
>
>>>
>>> * In a normal flow, let CentOS regression which is running after 'Verified' vote, be triggered on first 'successful' +1 reviewed vote.
>>
>>
>> I believe some reviewers/maintainers (including me) would like to see the regression vote before putting a +1/+2 on most patches, unless they are straightforward ones. So although this reduces the burden of one extra click for the patch owner, it introduces the same burden for reviewers who want to check the regression vote. IMHO, I don't see much benefit in implementing this.
>
>
> Agree with Atin here. Burden should be on machines before people. Reviewers prefer to look at patches that have passed regression.
>
> In github heketi, we have configured regression to run on all patches that are submitted by heketi developer group. If such configuration is possible in gerrit+Jenkins, we should definitely do it that way.
>
> For patches that are submitted by someone outside of the developer group, a maintainer should verify that the patch doesn't do anything harmful and mark the regression to run.
>
Deepshikha, is the above change feasible in the summation of Amar's proposal?
>>>
>>> * For those patches which got pushed to system to just 'validate' behavior, to run sample tests, WIP patches, continue to support 'recheck centos' comment message, so we can run without any vote. Let it not be the norm.
>>>
>>>
>>> With this, I see that we can reduce smoke failures and use 90% fewer resources for a patch that would fail smoke anyway (i.e., 95% of smoke failures would be caught with the first 10% of the resources and time).
>>>
>>> Also, we can reduce the number of regressions running, as review is mandatory before regression runs.
>>>
>>> These are just suggestions, happy to discuss more on these.
From dkhandel at redhat.com Tue Jun 25 05:24:55 2019
From: dkhandel at redhat.com (Deepshikha Khandelwal)
Date: Tue, 25 Jun 2019 10:54:55 +0530
Subject: [Gluster-infra] New workflow proposal for glusterfs repo
In-Reply-To:
References:
Message-ID:
On Mon, Jun 24, 2019 at 5:30 PM Sankarshan Mukhopadhyay <
sankarshan.mukhopadhyay at gmail.com> wrote:
> Checking back on this - do we need more voices or, amendments to
> Amar's original proposal before we scope the implementation?
>
> I read Amar's proposal as desiring an outcome where the journey of a
> valid/good patch through the test flows is fast and efficient.
>
> On Wed, Jun 12, 2019 at 11:58 PM Raghavendra Talur
> wrote:
> >
> >
> >
> > On Wed, Jun 12, 2019, 1:56 PM Atin Mukherjee
> wrote:
> >>
> >>
> >>
> >> On Wed, 12 Jun 2019 at 18:04, Amar Tumballi Suryanarayan <
> atumball at redhat.com> wrote:
> >>>
> >>>
> >>> Few bullet points:
> >>>
> >>> * Let smoke job sequentially for below, and if successful, in parallel
> for others.
> >>> - Sequential:
> >>> -- clang-format check
> >>> -- compare-bugzilla-version-git-branch
> >>> -- bugzilla-post
> >>> -- comment-on-issue
> >>> -- fedora-smoke (mainly don't want warning).
> >>
> >>
> >> +1
> >>
> >>> - Parallel
> >>> -- all devrpm jobs
> >>> -- 32bit smoke
> >>> -- freebsd-smoke
> >>> -- smoke
> >>> -- strfmt_errors
> >>> -- python-lint, and shellcheck.
> >>
> >>
> >> I'm sure there must be a reason, but I would like to know why they need
> to be parallel. Can't we have them sequentially, to get similar
> resource-utilisation benefits as above? Or are all these individual jobs so
> time-consuming that running them sequentially would make the overall smoke
> job take much longer?
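[Editor's sketch] The ordering under discussion can be illustrated as follows. This is a minimal model only: the job names are taken from Amar's list, but the runner and the failure handling are assumptions, not the real Jenkins setup. It shows the trade-off both sides are pointing at: sequencing the cheap checks saves machine time when a patch fails early, while the parallel stage keeps wall-clock time down for passing patches.

```python
from concurrent.futures import ThreadPoolExecutor

# Job names are from Amar's mail; everything else (the runner, the cost
# accounting) is an illustrative assumption, not the Jenkins configuration.
SEQUENTIAL = ["clang-format", "compare-bugzilla-version-git-branch",
              "bugzilla-post", "comment-on-issue", "fedora-smoke"]
PARALLEL = ["devrpm", "32bit-smoke", "freebsd-smoke", "smoke",
            "strfmt_errors", "python-lint", "shellcheck"]

def run_check(name, fails=()):
    """Stand-in for a CI job; succeeds unless listed in `fails`."""
    return name not in fails

def smoke(fails=()):
    """Run the cheap checks one by one; fan out the rest only if all pass.

    Returns (passed, jobs_started) so machine usage can be counted.
    """
    started = 0
    for name in SEQUENTIAL:            # fail fast, one builder at a time
        started += 1
        if not run_check(name, fails):
            return False, started      # the parallel stage never runs
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda n: run_check(n, fails), PARALLEL))
    return all(results), started + len(PARALLEL)

# A clang-format failure consumes 1 job slot instead of all 12.
print(smoke(fails=("clang-format",)))  # (False, 1)
print(smoke())                         # (True, 12)
```

Under this model, Atin's alternative of running everything sequentially only changes wall-clock time for passing patches, not the total machine time consumed.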
> >>
> >>>
> >>> * Remove the Verified flag. There is no point in one more button which
> users need to click; anyway, the CentOS regression is considered the
> 'Verification'.
> >
> >
> > The Verified flag from the patch owner was made a requirement for
> regression to run because the Jenkins machines we had were few and the
> patches being uploaded were many.
>
> However, do we consider that at the present time the situation has
> improved enough to make the change Amar asks for?
>
> >
> >>>
> >>> * In a normal flow, let the CentOS regression, which currently runs
> after the 'Verified' vote, be triggered on the first successful +1 review
> vote.
> >>
> >>
> >> I believe some reviewers/maintainers (including me) would like to see
> the regression vote before putting a +1/+2 on most patches, unless they are
> straightforward ones. So although with this you're removing one extra click
> for the patch owner, on the other hand you're introducing the same burden
> on the reviewers who would like to check the regression vote. IMHO, I don't
> see much benefit in implementing this.
> >
> >
> > Agree with Atin here. Burden should be on machines before people.
> Reviewers prefer to look at patches that have passed regression.
> >
> > In github heketi, we have configured regression to run on all patches
> that are submitted by heketi developer group. If such configuration is
> possible in gerrit+Jenkins, we should definitely do it that way.
> >
> > For patches that are submitted by someone outside of the developer
> group, a maintainer should verify that the patch doesn't do anything
> harmful and mark the regression to run.
> >
>
> Deepshikha, is the above change feasible in the summation of Amar's
> proposal?
>
Yes, I'm planning to implement the regression & flag related changes
initially if everyone agrees.
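[Editor's sketch] If Gerrit+Jenkins can express it, the heketi-style policy Raghavendra describes boils down to a small predicate. This is a hypothetical sketch; the group membership, names, and vote representation are assumptions, not the actual Gerrit/Jenkins configuration:

```python
# Illustrative only: the group members, helper names, and vote encoding
# are assumptions, not the real gluster Gerrit setup.
DEVELOPER_GROUP = {"dev-a", "dev-b", "maintainer-c"}  # hypothetical members

def should_run_regression(author, votes):
    """Decide whether the CentOS regression should be triggered.

    `votes` maps reviewer -> Code-Review vote (-2..+2). Patches from the
    developer group run automatically after smoke; patches from outside
    need a +1/+2 from someone in the group first.
    """
    if author in DEVELOPER_GROUP:
        return True
    return any(reviewer in DEVELOPER_GROUP and vote >= 1
               for reviewer, vote in votes.items())

assert should_run_regression("dev-a", {})                      # trusted author
assert not should_run_regression("outsider", {})               # waits for review
assert should_run_regression("outsider", {"maintainer-c": 2})  # maintainer ack
```

This keeps the "burden on machines before people" property for regular contributors while preserving a human gate for unknown patches.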
>
> >>>
> >>> * For those patches which got pushed to the system just to 'validate'
> behavior, run sample tests, or as WIP patches, continue to support the
> 'recheck centos' comment message, so we can run without any vote. Let it
> not be the norm.
> >>>
> >>>
> >>> With this, I see that we can reduce smoke failures and use 90% less
> resources for a patch that would fail smoke anyway (i.e., 95% of the smoke
> failures would be caught in the first 10% of the resources and time).
> >>>
> >>> Also, we can reduce the number of regression runs, as review is
> mandatory for regression to run.
> >>>
> >>> These are just suggestions, happy to discuss more on these.
> _______________________________________________
> Gluster-infra mailing list
> Gluster-infra at gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-infra
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From sankarshan.mukhopadhyay at gmail.com Tue Jun 25 05:30:12 2019
From: sankarshan.mukhopadhyay at gmail.com (Sankarshan Mukhopadhyay)
Date: Tue, 25 Jun 2019 11:00:12 +0530
Subject: [Gluster-infra] New workflow proposal for glusterfs repo
In-Reply-To:
References:
Message-ID:
Amar, can you bring about an agreement/decision on this so that we can
make progress?
On Tue, Jun 25, 2019 at 10:55 AM Deepshikha Khandelwal
wrote:
>
>
>
> On Mon, Jun 24, 2019 at 5:30 PM Sankarshan Mukhopadhyay wrote:
>>
>> Checking back on this - do we need more voices or, amendments to
>> Amar's original proposal before we scope the implementation?
>>
>> I read Amar's proposal as desiring an outcome where the journey of a
>> valid/good patch through the test flows is fast and efficient.
>>
>> On Wed, Jun 12, 2019 at 11:58 PM Raghavendra Talur wrote:
>> >
>> >
>> >
>> > On Wed, Jun 12, 2019, 1:56 PM Atin Mukherjee wrote:
>> >>
>> >>
>> >>
>> >> On Wed, 12 Jun 2019 at 18:04, Amar Tumballi Suryanarayan wrote:
>> >>>
>> >>>
>> >>> Few bullet points:
>> >>>
>> >>> * Let the smoke job run sequentially for the items below and, if successful, in parallel for the others.
>> >>> - Sequential:
>> >>> -- clang-format check
>> >>> -- compare-bugzilla-version-git-branch
>> >>> -- bugzilla-post
>> >>> -- comment-on-issue
>> >>> -- fedora-smoke (mainly don't want warning).
>> >>
>> >>
>> >> +1
>> >>
>> >>> - Parallel
>> >>> -- all devrpm jobs
>> >>> -- 32bit smoke
>> >>> -- freebsd-smoke
>> >>> -- smoke
>> >>> -- strfmt_errors
>> >>> -- python-lint, and shellcheck.
>> >>
>> >>
>> >> I'm sure there must be a reason, but I would like to know why they need to be parallel. Can't we have them sequentially, to get similar resource-utilisation benefits as above? Or are all these individual jobs so time-consuming that running them sequentially would make the overall smoke job take much longer?
>> >>
>> >>>
>> >>> * Remove the Verified flag. There is no point in one more button which users need to click; anyway, the CentOS regression is considered the 'Verification'.
>> >
>> >
>> > The Verified flag from the patch owner was made a requirement for regression to run because the Jenkins machines we had were few and the patches being uploaded were many.
>>
>> However, do we consider that at the present time the situation has
>> improved enough to make the change Amar asks for?
>>
>> >
>> >>>
>> >>> * In a normal flow, let the CentOS regression, which currently runs after the 'Verified' vote, be triggered on the first successful +1 review vote.
>> >>
>> >>
>> >> I believe some reviewers/maintainers (including me) would like to see the regression vote before putting a +1/+2 on most patches, unless they are straightforward ones. So although with this you're removing one extra click for the patch owner, on the other hand you're introducing the same burden on the reviewers who would like to check the regression vote. IMHO, I don't see much benefit in implementing this.
>> >
>> >
>> > Agree with Atin here. Burden should be on machines before people. Reviewers prefer to look at patches that have passed regression.
>> >
>> > In github heketi, we have configured regression to run on all patches that are submitted by heketi developer group. If such configuration is possible in gerrit+Jenkins, we should definitely do it that way.
>> >
>> > For patches that are submitted by someone outside of the developer group, a maintainer should verify that the patch doesn't do anything harmful and mark the regression to run.
>> >
>>
>> Deepshikha, is the above change feasible in the summation of Amar's proposal?
>
> Yes, I'm planning to implement the regression & flag related changes initially if everyone agrees.
>>
>>
>> >>>
>> >>> * For those patches which got pushed to the system just to 'validate' behavior, run sample tests, or as WIP patches, continue to support the 'recheck centos' comment message, so we can run without any vote. Let it not be the norm.
>> >>>
>> >>>
>> >>> With this, I see that we can reduce smoke failures and use 90% less resources for a patch that would fail smoke anyway (i.e., 95% of the smoke failures would be caught in the first 10% of the resources and time).
>> >>>
>> >>> Also, we can reduce the number of regression runs, as review is mandatory for regression to run.
>> >>>
>> >>> These are just suggestions, happy to discuss more on these.
>> _______________________________________________
>> Gluster-infra mailing list
>> Gluster-infra at gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-infra
--
sankarshan mukhopadhyay
From atumball at redhat.com Tue Jun 25 05:49:54 2019
From: atumball at redhat.com (Amar Tumballi Suryanarayan)
Date: Tue, 25 Jun 2019 11:19:54 +0530
Subject: [Gluster-infra] New workflow proposal for glusterfs repo
In-Reply-To:
References:
Message-ID:
Adding gluster-devel ML.
The only concern raised about my earlier proposal was that regression runs
should not wait for reviews, but instead be triggered automatically after a
successful smoke.
The ask was to put the burden on machines rather than on developers, which I
agree with to start. Let's watch the expenses due to this change for a month
once it gets implemented, and then take stock of the situation. For now,
let's remove one more piece of extra work for developers, i.e., marking the
Verified flag.
On Tue, Jun 25, 2019 at 11:01 AM Sankarshan Mukhopadhyay <
sankarshan.mukhopadhyay at gmail.com> wrote:
> Amar, can you bring about an agreement/decision on this so that we can
> make progress?
>
>
So, my take is:
Let's make serialized smoke + regression a reality. It may add to the
overall time, but when there are failures it has the potential to reduce
overall machine usage... and for a successful patch, the extra few minutes
don't hurt at present, as our average review time is around a week.
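[Editor's sketch] The machine-usage claim can be sanity-checked with simple expected-value arithmetic. The 95%/10% figures are from Amar's original mail; the smoke-failure rate is an assumed figure for illustration:

```python
def expected_smoke_cost(fail_rate, caught_early=0.95, early_cost=0.10):
    """Expected machine cost per patch, with a full smoke run = 1.0 unit.

    Of the failing patches, `caught_early` are stopped after spending only
    `early_cost` of the resources; every other patch pays the full cost.
    The 0.95/0.10 defaults are the figures from the mail.
    """
    early_failures = fail_rate * caught_early
    return early_failures * early_cost + (1 - early_failures) * 1.0

# With an assumed 30% smoke-failure rate, serialized ordering brings the
# average cost per patch to ~0.74 units, i.e. roughly a 26% saving.
print(round(expected_smoke_cost(0.30), 4))  # 0.7435
```

The saving scales with the failure rate, which is why the serialized ordering pays off most on branches with many rough patches.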
> On Tue, Jun 25, 2019 at 10:55 AM Deepshikha Khandelwal
> wrote:
> >
> >
> >
> > On Mon, Jun 24, 2019 at 5:30 PM Sankarshan Mukhopadhyay <
> sankarshan.mukhopadhyay at gmail.com> wrote:
> >>
> >> Checking back on this - do we need more voices or, amendments to
> >> Amar's original proposal before we scope the implementation?
> >>
> >> I read Amar's proposal as desiring an outcome where the journey of a
> >> valid/good patch through the test flows is fast and efficient.
>
Absolutely! This is critical for us to be an inclusive community.
> >>
> >> On Wed, Jun 12, 2019 at 11:58 PM Raghavendra Talur
> wrote:
> >> >
> >> >
> >> >
> >> > On Wed, Jun 12, 2019, 1:56 PM Atin Mukherjee
> wrote:
> >> >>
> >> >>
> >> >>
> >> >> On Wed, 12 Jun 2019 at 18:04, Amar Tumballi Suryanarayan <
> atumball at redhat.com> wrote:
> >> >>>
> >> >>>
> >> >>> Few bullet points:
> >> >>>
> >> >>> * Let the smoke job run sequentially for the items below and, if
> successful, in parallel for the others.
> >> >>> - Sequential:
> >> >>> -- clang-format check
> >> >>> -- compare-bugzilla-version-git-branch
> >> >>> -- bugzilla-post
> >> >>> -- comment-on-issue
> >> >>> -- fedora-smoke (mainly don't want warning).
> >> >>
> >> >>
> >> >> +1
> >> >>
> >> >>> - Parallel
> >> >>> -- all devrpm jobs
> >> >>> -- 32bit smoke
> >> >>> -- freebsd-smoke
> >> >>> -- smoke
> >> >>> -- strfmt_errors
> >> >>> -- python-lint, and shellcheck.
> >> >>
> >> >>
> >> >> I'm sure there must be a reason, but I would like to know why they
> need to be parallel. Can't we have them sequentially, to get similar
> resource-utilisation benefits as above? Or are all these individual jobs so
> time-consuming that running them sequentially would make the overall smoke
> job take much longer?
>
Most of these are doing the same thing (make dist, make install, make rpms),
but on different arches and with different flags. To start with, we can run
these sequentially too. That way, the infra team needn't worry about a mix
of parallel and sequential jobs.
> >> >>
> >> >>>
> >> >>> * Remove the Verified flag. There is no point in one more button
> which users need to click; anyway, the CentOS regression is considered the
> 'Verification'.
> >> >
> >> >
> >> > The Verified flag from the patch owner was made a requirement for
> regression to run because the Jenkins machines we had were few and the
> patches being uploaded were many.
> >>
> >> However, do we consider that at the present time the situation has
> >> improved enough to make the change Amar asks for?
> >>
> >> >
> >> >>>
> >> >>> * In a normal flow, let the CentOS regression, which currently runs
> after the 'Verified' vote, be triggered on the first successful +1 review
> vote.
> >> >>
> >> >>
> >> >> I believe some reviewers/maintainers (including me) would like to
> see the regression vote before putting a +1/+2 on most patches, unless they
> are straightforward ones. So although with this you're removing one extra
> click for the patch owner, on the other hand you're introducing the same
> burden on the reviewers who would like to check the regression vote. IMHO,
> I don't see much benefit in implementing this.
> >> >
> >> >
> >> > Agree with Atin here. Burden should be on machines before people.
> Reviewers prefer to look at patches that have passed regression.
> >> >
> >> > In github heketi, we have configured regression to run on all patches
> that are submitted by heketi developer group. If such configuration is
> possible in gerrit+Jenkins, we should definitely do it that way.
> >> >
> >> > For patches that are submitted by someone outside of the developer
> group, a maintainer should verify that the patch doesn't do anything
> harmful and mark the regression to run.
> >> >
> >>
> >> Deepshikha, is the above change feasible in the summation of Amar's
> proposal?
> >
> > Yes, I'm planning to implement the regression & flag related changes
> initially if everyone agrees.
> >>
> >>
>
I would say, let's get started on these changes.
Regards,
Amar
> >> >>>
> >> >>> * For those patches which got pushed to the system just to 'validate'
> behavior, run sample tests, or as WIP patches, continue to support the
> 'recheck centos' comment message, so we can run without any vote. Let it
> not be the norm.
> >> >>>
> >> >>>
> >> >>> With this, I see that we can reduce smoke failures and use 90% less
> resources for a patch that would fail smoke anyway (i.e., 95% of the smoke
> failures would be caught in the first 10% of the resources and time).
> >> >>>
> >> >>> Also, we can reduce the number of regression runs, as review is
> mandatory for regression to run.
> >> >>>
> >> >>> These are just suggestions, happy to discuss more on these.
> >> _______________________________________________
> >> Gluster-infra mailing list
> >> Gluster-infra at gluster.org
> >> https://lists.gluster.org/mailman/listinfo/gluster-infra
>
>
>
> --
> sankarshan mukhopadhyay
>
> _______________________________________________
> Gluster-infra mailing list
> Gluster-infra at gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-infra
--
Amar Tumballi (amarts)
From bugzilla at redhat.com Fri Jun 28 07:31:36 2019
From: bugzilla at redhat.com (bugzilla at redhat.com)
Date: Fri, 28 Jun 2019 07:31:36 +0000
Subject: [Gluster-infra] [Bug 1708257] Grant additional maintainers merge
rights on release branches
In-Reply-To:
References:
Message-ID:
https://bugzilla.redhat.com/show_bug.cgi?id=1708257
Rinku changed:
What |Removed |Added
----------------------------------------------------------------------------
Blocks| |1724957
Referenced Bugs:
https://bugzilla.redhat.com/show_bug.cgi?id=1724957
[Bug 1724957] Grant additional maintainers merge rights on release branches
--
You are receiving this mail because:
You are on the CC list for the bug.
From bugzilla at redhat.com Fri Jun 28 07:31:36 2019
From: bugzilla at redhat.com (bugzilla at redhat.com)
Date: Fri, 28 Jun 2019 07:31:36 +0000
Subject: [Gluster-infra] [Bug 1724957] New: Grant additional maintainers
merge rights on release branches
Message-ID:
https://bugzilla.redhat.com/show_bug.cgi?id=1724957
Bug ID: 1724957
Summary: Grant additional maintainers merge rights on release
branches
Product: GlusterFS
Version: mainline
Status: NEW
Component: project-infrastructure
Assignee: bugs at gluster.org
Reporter: rkothiya at redhat.com
CC: bugs at gluster.org, dkhandel at redhat.com,
gluster-infra at gluster.org, hgowtham at redhat.com,
rkothiya at redhat.com, srangana at redhat.com,
sunkumar at redhat.com
Depends On: 1708257
Target Milestone: ---
Classification: Community
Referenced Bugs:
https://bugzilla.redhat.com/show_bug.cgi?id=1708257
[Bug 1708257] Grant additional maintainers merge rights on release branches
--
You are receiving this mail because:
You are on the CC list for the bug.
From bugzilla at redhat.com Fri Jun 28 08:38:32 2019
From: bugzilla at redhat.com (bugzilla at redhat.com)
Date: Fri, 28 Jun 2019 08:38:32 +0000
Subject: [Gluster-infra] [Bug 1724957] Grant additional maintainers merge
rights on release branches
In-Reply-To:
References:
Message-ID:
https://bugzilla.redhat.com/show_bug.cgi?id=1724957
--- Comment #1 from Deepshikha khandelwal ---
Can you please check now?
--
You are receiving this mail because:
You are on the CC list for the bug.
From bugzilla at redhat.com Fri Jun 28 10:26:30 2019
From: bugzilla at redhat.com (bugzilla at redhat.com)
Date: Fri, 28 Jun 2019 10:26:30 +0000
Subject: [Gluster-infra] [Bug 1716097] infra: create
suse-packing@lists.nfs-ganesha.org alias
In-Reply-To:
References:
Message-ID:
https://bugzilla.redhat.com/show_bug.cgi?id=1716097
--- Comment #4 from Marc Dequènes (Duck) ---
Could someone give some status update please?
--
You are receiving this mail because:
You are on the CC list for the bug.
From ravishankar at redhat.com Fri Jun 14 04:26:01 2019
From: ravishankar at redhat.com (Ravishankar N)
Date: Fri, 14 Jun 2019 04:26:01 -0000
Subject: [Gluster-infra] review.gluster.org is not accessible.
In-Reply-To:
References:
Message-ID: <6f75e742-846a-9db6-d69f-efd04e165f0c@redhat.com>
On 14/06/19 9:52 AM, Ravishankar N wrote:
> Hi,
>
> I have raised https://bugzilla.redhat.com/show_bug.cgi?id=1720453. The
> issue seems to be intermittent. I was not able to access it in either
> Firefox or Chrome. Now Chrome works but not Firefox.
Okay, now Chrome also gives me the error...
>
> Regards,
> Ravi
-------------- next part --------------
A non-text attachment was scrubbed...
Name: efjihjmfmnckfhdl.png
Type: image/png
Size: 144101 bytes
Desc: not available
URL: