[Gluster-infra] Reboot for dirtycow, and the story of my unwarranted optimism
mscherer at redhat.com
Sat Nov 5 22:32:49 UTC 2016
so people might have seen that last week, a rather severe vuln was
I was at Openstack summit when it was found, and the updated kernel
package wasn't on the CDN until I was out for holiday. The main
reason is that RH tests kernel patches a bit more than others, especially
for something this critical. And CentOS waits on RH to push updates.
So while this was not as urgent as shellshock or heartbleed, it was
still rather critical to fix as I have a rather minimal trust in Jenkins
and Gerrit to be secure.
So once I was back on Friday, and after dealing with other fires and
infra, I rebooted the stuff that wouldn't impact production too much (like
rsyslog, freeipa servers, the salt server, the virt hosts with builders)
and decided to push for a reboot of jenkins and gerrit for the weekend.
In retrospect, I thought I did discuss it on irc, but I forgot, sorry about
that.
Of course, because I like to live dangerously, I did that on
Saturday morning, on a travel day. It should have been fast.
However, things never go as expected and we did face a few issues:
- myrmicinae.rht.gluster.org, the host running our VM, decided to take 1h
to boot. At the firmware/BIOS level. That's slightly unacceptable, but I
also have a limited capacity to fix it, since this would require 1) a
test reboot (so losing 1h) and 2) fiddling in the BIOS (and so another reboot).
So that's why jenkins/gerrit were down from around 10h CET until 11h.
- jenkins didn't (as usual) restart. I found the root cause: this was
due to NetworkManager and the network init script kinda doing the same
stuff, but in different ways. This is now fixed, and the jenkins VM should
reboot without needing a human to fix stuff afterwards.
- gerrit for some reason does not start at boot. I am not sure how
it was done before, but I suspect something related
to /etc/init.d that got wiped after an upgrade or something, because
the gerrit initscript is not a real initscript. So I did some hack
in /etc/rc.local, since the upgrade to EL7 is around the corner, and I
had better things to do over the weekend than fixing some bash stuff (like
fixing python stuff).
- the gerrit VM DNS was incorrect, and no one told me until 6h after the
reboot (why no one said anything on irc, on the list or on bugzilla is an
issue that I will surely have to investigate). Why the DNS got
changed (or if it didn't change, how it worked before) is the
part that I still cannot explain. But for some reason it got reverted to
the old setting, using the libvirt gateway as DNS, which wasn't working
with the current setup. So this was fixed after Nigel pinged me on my
phone, and I managed to connect from the train to fix it.
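
For the DNS issue above, a quick post-reboot check might look like the sketch
below. This is only an illustration, not what was run on the gerrit VM: the
helper names (parse_nameservers, can_resolve, report) and the example.org
hostname are made up for the example, and it assumes a standard
/etc/resolv.conf on the VM.

```python
import socket


def parse_nameservers(resolv_conf_text):
    """Return the nameserver addresses found in resolv.conf-style text."""
    servers = []
    for line in resolv_conf_text.splitlines():
        # Strip comments, which start with '#' or ';'.
        line = line.split("#", 1)[0].split(";", 1)[0].strip()
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def can_resolve(hostname):
    """Return True if the system resolver can look up hostname."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        return False


def report(path="/etc/resolv.conf", hostname="example.org"):
    """Print the configured resolvers and whether a lookup works.

    Both defaults are placeholders; adjust them for the host being checked.
    """
    with open(path) as handle:
        print("nameservers:", parse_nameservers(handle.read()))
    print("can resolve %s: %s" % (hostname, can_resolve(hostname)))
```

Calling report() right after a reboot would show whether the VM fell back to
an unexpected resolver (such as the libvirt gateway) and whether lookups
still work.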
So I suspect that's all for today, I will try to schedule my next
vacation outside of the unexpected release of a critical kernel patch.
 yes, it was nice, thanks for asking.
 famous last words
Sysadmin, Community Infrastructure and Platform, OSAS