[Gluster-infra] [IMPORTANT!] Entire test cluster is down

Jeff Darcy jdarcy at redhat.com
Fri Jan 27 00:30:59 UTC 2017

I saw a bunch of jobs get aborted about half an hour ago, because the nodes they were running on went offline.  I figured it was a power hit or something similar and things would come back by themselves, so I went off to dinner.  Checking now, they're still offline and seem inclined to stay that way.  I can ping them and the SSH port is open, but when I try to launch the slave agent, Jenkins's SSH connection fails:

[01/26/17 16:14:18] [SSH] Opening SSH connection to slave0.cloud.gluster.org:22.
Connection timed out (Connection timed out)
ERROR: Unexpected error in launching a slave. This is probably a bug in Jenkins.
java.lang.IllegalStateException: Connection is not established!
	at com.trilead.ssh2.Connection.getRemainingAuthMethods(Connection.java:1030)
	at com.cloudbees.jenkins.plugins.sshcredentials.impl.TrileadSSHPasswordAuthenticator.canAuthenticate(TrileadSSHPasswordAuthenticator.java:82)
	at com.cloudbees.jenkins.plugins.sshcredentials.SSHAuthenticator.newInstance(SSHAuthenticator.java:207)
	at com.cloudbees.jenkins.plugins.sshcredentials.SSHAuthenticator.newInstance(SSHAuthenticator.java:169)
	at hudson.plugins.sshslaves.SSHLauncher.openConnection(SSHLauncher.java:1212)
	at hudson.plugins.sshslaves.SSHLauncher$2.call(SSHLauncher.java:711)
	at hudson.plugins.sshslaves.SSHLauncher$2.call(SSHLauncher.java:706)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
[01/26/17 16:15:21] Launch failed - cleaning up connection
[01/26/17 16:15:21] [SSH] Connection closed.

So, basically, it looks like no tests are going to run until someone intervenes manually, which is beyond my own ability.
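
For anyone picking this up: the symptoms above (ping works, port 22 looks open, yet the Jenkins log shows "Connection timed out") leave it unclear how far an SSH connection actually gets. A minimal sketch for narrowing that down from the Jenkins master is below. It is not part of Jenkins or its SSH plugin; the function name probe_ssh and the stage labels are made up for illustration, and it assumes only the Python standard library. It opens a TCP connection and waits for the server's SSH identification string, so it separates "TCP connect fails" from "TCP connects but sshd never answers".

```python
import socket


def probe_ssh(host, port=22, timeout=10.0):
    """Report how far a plain TCP connection to an SSH port gets.

    Hypothetical helper for illustration only. Returns (stage, detail):
      connect_failed     TCP connect was refused, unreachable, or timed out
      no_banner          TCP connected, but nothing arrived before the timeout
      read_failed        TCP connected, but reading the banner raised an error
      banner_ok          the server sent an SSH identification string
      unexpected_banner  something answered, but not with "SSH-..."
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError as exc:  # includes timeouts and connection refused
        return ("connect_failed", str(exc))

    with sock:
        try:
            # An SSH server sends its identification string first.
            banner = sock.recv(256)
        except socket.timeout:
            return ("no_banner", "TCP connected, but no SSH banner before timeout")
        except OSError as exc:
            return ("read_failed", str(exc))

    if banner.startswith(b"SSH-"):
        return ("banner_ok", banner.decode("ascii", "replace").strip())
    return ("unexpected_banner", repr(banner))
```

Calling probe_ssh("slave0.cloud.gluster.org") from the machine that runs Jenkins would show which stage fails there. A "connect_failed" result would point at the network or a firewall between the master and the node, while "no_banner" would suggest the node accepts connections but sshd (or the host itself) is hung. Either way this only narrows the search; it does not fix anything.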
