[Gluster-infra] Downtime for Jenkins
vbellur at redhat.com
Sun May 17 12:36:19 UTC 2015
On 05/17/2015 02:32 PM, Vijay Bellur wrote:
> [Adding gluster-devel]
> On 05/16/2015 11:31 PM, Niels de Vos wrote:
>> On Sat, May 16, 2015 at 06:32:00PM +0200, Niels de Vos wrote:
>>> It seems that many failures of the regression tests (at least for
>>> NetBSD) are caused by failing to reconnect to the slave. Jenkins tries
>>> to keep a control connection open to the slaves, and reconnects when the
>>> connection terminates.
>>> I do not know why the connection is disrupted, but I can see that
>>> Jenkins is not able to resolve the hostname of the slave. For example,
>>> from (well, you have to find the older logs, Jenkins seems to have
>>> automatically reconnected)
>>> http://build.gluster.org/computer/nbslave72.cloud.gluster.org-v2/log :
>>> java.io.IOException: There was a problem while connecting to
>>> Caused by: java.net.UnknownHostException:
>>> nbslave71.cloud.gluster.org: Name or service not known
>>> The error in the console log of the regression test is less helpful, it
>>> only states the disconnection failure:
>> In fact, this looks very much related to these reports:
>> - https://issues.jenkins-ci.org/browse/JENKINS-19619 duplicate of 18879
>> - https://issues.jenkins-ci.org/browse/JENKINS-18879
>> This problem should be fixed in Jenkins 1.524 and newer. Time to upgrade
>> Jenkins too?
> Yes, I have started an upgrade. Please expect a downtime for Jenkins
> during the upgrade.
> I will update once the activity is complete.
Upgrade to Jenkins v1.613 is now complete and Jenkins seems to be
largely doing fine. Several Jenkins plugins have also been updated to
their latest versions. During the course of the upgrade, I noticed that
we were using the deprecated 'gerrit approve' interface to report the
status of a smoke run. I have changed that to use 'gerrit review' and
this seems to have addressed the problem of smoke tests not reporting
status back to gerrit.
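
For reference, the kind of call a smoke job makes to report back can be
sketched roughly as below. This is only an illustration, not the actual
job configuration: the host, user, change number and message are
made-up placeholders, and it assumes the Gerrit project has a Verified
label so that 'gerrit review --verified' is accepted. The helper only
builds the command; it does not run it.

```python
import shlex


def build_gerrit_review_command(host, user, change, patchset, verified,
                                message, port=29418):
    """Build (but do not run) an ssh command that posts a review to Gerrit.

    All argument values are supplied by the caller; nothing here is taken
    from the real Gluster job configuration.
    """
    # Gerrit votes are written with an explicit sign, e.g. +1 or -1.
    vote = "0" if verified == 0 else "{:+d}".format(verified)
    # The message is quoted so it survives as a single argument on the
    # remote side.
    remote = "gerrit review --verified {} --message {} {},{}".format(
        vote, shlex.quote(message), change, patchset)
    return ["ssh", "-p", str(port), "{}@{}".format(user, host), remote]


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    cmd = build_gerrit_review_command(
        host="gerrit.example.org", user="jenkins",
        change=12345, patchset=2, verified=1,
        message="smoke test passed")
    print(" ".join(shlex.quote(part) for part in cmd))
```

The resulting list can be handed to subprocess.run() from a job wrapper
once the placeholder values are replaced with real ones.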
There were a few instances of Jenkins not being able to launch slaves
through ssh, although it succeeded later on automatic retries. We will
need to watch this behavior to see if the problem persists and gets in
the way of normal functioning.
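
If someone wants to watch for this outside of Jenkins, a small probe
along the following lines might help. It is a sketch under assumptions,
not what Jenkins does internally: it retries a plain TCP connection to
the ssh port with exponential backoff and records whether each failure
was a name resolution error (the Python counterpart of the
UnknownHostException quoted earlier) or some other connection error.
The function name, the retry count and the delays are arbitrary
choices.

```python
import socket
import time


def wait_for_ssh(host, port=22, attempts=5, base_delay=1.0,
                 connect=socket.create_connection, sleep=time.sleep):
    """Try to reach host:port, retrying with exponential backoff.

    Returns (reachable, errors), where errors is a list of
    (kind, message) tuples and kind is "dns" or "connect".
    """
    errors = []
    for attempt in range(attempts):
        try:
            conn = connect((host, port), timeout=10)
        except socket.gaierror as exc:
            # Name resolution failed ("Name or service not known").
            errors.append(("dns", str(exc)))
        except OSError as exc:
            # Refused, timed out, unreachable, and so on.
            errors.append(("connect", str(exc)))
        else:
            conn.close()
            return True, errors
        if attempt < attempts - 1:
            # Wait 1x, 2x, 4x, ... the base delay between attempts.
            sleep(base_delay * 2 ** attempt)
    return False, errors


if __name__ == "__main__":
    # Slave name taken from the log URL quoted earlier in this thread.
    ok, errs = wait_for_ssh("nbslave72.cloud.gluster.org", attempts=3)
    print("reachable:", ok)
    for kind, msg in errs:
        print(kind, msg)
```

Keeping the "dns" and "connect" failures apart should make it easier to
tell whether a failed launch is the old resolution problem or something
new.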
Manu - can you please verify and report back if the NetBSD slaves work
better with the upgraded Jenkins master?
All - please drop a note on gluster-infra if you happen to notice
problems with Jenkins.