[Gluster-infra] [Gluster-devel] 8/10 AWS jenkins builders disconnected

Michael Scherer mscherer at redhat.com
Wed Mar 6 16:52:33 UTC 2019


On Wednesday 06 March 2019 at 21:31 +0530, Sankarshan Mukhopadhyay
wrote:
> On Wed, Mar 6, 2019 at 8:47 PM Michael Scherer <mscherer at redhat.com>
> wrote:
> > 
> > On Wednesday 06 March 2019 at 17:53 +0530, Sankarshan Mukhopadhyay
> > wrote:
> > > On Wed, Mar 6, 2019 at 5:38 PM Deepshikha Khandelwal
> > > <dkhandel at redhat.com> wrote:
> > > > 
> > > > Hello,
> > > > 
> > > > Today, while debugging the failed centos7-regression builds, I
> > > > saw that most of the builders did not pass the instance status
> > > > check on AWS and were unreachable.
> > > > 
> > > > Misc investigated this and found the patch[1], which seems to
> > > > break the builders one after the other. They all ran the
> > > > regression test for this specific change before going offline.
> > > > We suspect that this change results in an infinite loop of
> > > > processes, as we did not see any trace of an error in the system
> > > > logs.
> > > > 
> > > > We did reboot all those builders and they all seem to be
> > > > running
> > > > fine now.
> > > > 
> > > 
> > > The question though is - what to do about the patch, if the patch
> > > itself is the root cause? Is this assigned to anyone to look
> > > into?
> > 
> > We also pondered whether we should protect the builders from that
> > kind of issue. But since:
> > - we are not sure that the hypothesis is right
> > - any protection based on "limit the number of processes" would
> >   surely, sooner or later, block legitimate tests and require
> >   adjustment (and likely investigation)
> > 
> > we chose not to follow that road for now.
> > 
> 
> This is a good topic though. Is there any logical way to fence off
> the builders from noisy neighbors?

I am not sure I follow the question; what I had in mind was more just
a regular ulimit to avoid the equivalent of a fork bomb (again, if
the hypothesis is the right one).
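
To make the ulimit idea concrete, here is a minimal sketch of what such
a guard could look like in Python. It is an illustration only, not
something deployed on the builders: the helper names (limit_nproc,
run_limited) and the value 4096 are hypothetical. Keep in mind that
RLIMIT_NPROC counts all processes of the user, and that on Linux it is
not enforced for root, so a job running as root would not be stopped.

```python
import resource
import subprocess
import sys


def limit_nproc(max_procs):
    """Return a callable that caps RLIMIT_NPROC; meant for preexec_fn."""
    def apply():
        _, hard = resource.getrlimit(resource.RLIMIT_NPROC)
        # An unprivileged process cannot raise its hard limit,
        # so never ask for more than the current one.
        if hard == resource.RLIM_INFINITY:
            cap = max_procs
        else:
            cap = min(max_procs, hard)
        # Lower both soft and hard limits so the job cannot raise them again.
        resource.setrlimit(resource.RLIMIT_NPROC, (cap, cap))
    return apply


def run_limited(cmd, max_procs):
    """Run cmd with the process-count limit applied in the child only."""
    return subprocess.run(
        cmd,
        preexec_fn=limit_nproc(max_procs),  # runs after fork, before exec
        capture_output=True,
        text=True,
    )


if __name__ == "__main__":
    # The child prints the soft limit it actually sees.
    probe = [
        sys.executable,
        "-c",
        "import resource; print(resource.getrlimit(resource.RLIMIT_NPROC)[0])",
    ]
    result = run_limited(probe, 4096)  # 4096 is an arbitrary example value
    print("child sees RLIMIT_NPROC =", result.stdout.strip())
```

With a limit like this in place, a runaway test would start getting
EAGAIN from fork() instead of taking the whole machine down, but as
said, picking a value that never blocks a legitimate test is the hard
part.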

Since our builders run one job at a time, there are no noisy-neighbor
issues; or rather, since this is AWS, we can't control anything
regarding contention of shared resources anyway.

-- 
Michael Scherer
Sysadmin, Community Infrastructure and Platform, OSAS

