[Gluster-infra] Suspected networking issues on build.gluster.org causing Jenkins failures

Niels de Vos ndevos at redhat.com
Sat May 16 16:32:00 UTC 2015


It seems that many failures of the regression tests (at least for
NetBSD) are caused by Jenkins failing to reconnect to the slave. Jenkins
tries to keep a control connection open to each slave, and reconnects
when that connection terminates.

I do not know why the connection is disrupted, but I can see that
Jenkins is not able to resolve the hostname of the slave. For example,
in http://build.gluster.org/computer/nbslave72.cloud.gluster.org-v2/log
(you have to look at the older entries, because Jenkins seems to have
reconnected automatically since then):

    java.io.IOException: There was a problem while connecting to nbslave71.cloud.gluster.org:22
    ...
    Caused by: java.net.UnknownHostException: nbslave71.cloud.gluster.org: Name or service not known

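To check whether these "Name or service not known" errors are
intermittent rather than permanent, one could resolve a slave hostname
repeatedly from build.gluster.org. Below is a minimal sketch, assuming
Python 3 is available there; the function name probe_resolution and the
attempt/delay values are hypothetical and only for illustration. It uses
socket.getaddrinfo(), the same resolver call a TCP connection to port 22
would depend on, and counts how many lookups fail.

```python
import socket
import time


def probe_resolution(hostname, port=22, attempts=5, delay=0.0):
    """Resolve `hostname` `attempts` times (hypothetical helper).

    Returns a tuple (successes, failures, error_messages). A mix of
    successes and failures suggests flaky DNS rather than a wrong name.
    """
    successes = 0
    errors = []
    for _ in range(attempts):
        try:
            # Same lookup an SSH/TCP connection to hostname:port needs.
            socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
            successes += 1
        except socket.gaierror as exc:
            # e.g. "[Errno -2] Name or service not known"
            errors.append(str(exc))
        if delay:
            time.sleep(delay)
    return successes, len(errors), errors
```

For example, probe_resolution("nbslave71.cloud.gluster.org", attempts=10,
delay=1.0) would show whether lookups of that slave fail only some of the
time.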

The error in the console log of the regression test is less helpful; it
only reports the disconnection:

    http://build.gluster.org/job/rackspace-netbsd7-regression-triggered/5408/console

Because the reboot-vm Jenkins job on build.gluster.org often fails to
resolve the (correct) IP addresses of the Rackspace API endpoints, I
suspect that both problems are related to the networking infrastructure.
Could someone look into this issue? It is quite a blocker for resolving
'spurious' test failures, because it often prevents the tests from
running at all.

Thanks,
Niels

