[Gluster-infra] [Gluster-devel] Reboot for Meltdown and stuff

Sat Jan 6 19:48:43 UTC 2018

On Saturday 06 January 2018 at 11:44 +0100, Michael Scherer wrote:
> On Friday 05 January 2018 at 14:24 +0100, Michael Scherer wrote:
> > Hi,
> > 
> > unless you are living in a place without any internet (an igloo in
> > Antarctica, the middle of the Gobi desert, a bunker in Switzerland or
> > simply the Paris underground train), you may have seen the news that
> > this week is again a security nightmare (also called "just a normal
> > Wednesday" among practitioners), and that we have important kernel
> > patches to push, which do require a reboot.
> > 
> > See https://spectreattack.com/ 
> > 
> > While I suspect our infra will not be targeted, and there are more
> > avenues of attack on local computers and browsers, which are the ones
> > running random proprietary code in the form of JS on a regular basis,
> > we still have to upgrade everything to be sure.
> > 
> > Therefore, I am going to have to reboot all the infra (yes, the 83
> > servers) tomorrow, minus the few servers I have already rebooted
> > (because they are in HA, or not customer facing).
> > 
> > I will block jenkins, and wait for the jobs to finish before
> > rebooting the various servers. I will send an email tomorrow once the
> > reboots start (i.e., when/if I wake up), and another one once things
> > are good (or if stuff breaks in a horrible fashion, as happened
> > today).
> > 
> > If there are any precautions to take, people have around 24h to
> > voice their concerns.
> 
> Reboot is starting. I have already done various backend servers; the
> document I am using to track the work is at
> https://bimestriel.framapad.org/p/gluster_infra_reboot

So almost all Linux servers got rebooted, most without issues, but
during the day I started to have the first symptoms of a cold
(headaches, shivering, etc.), so I had to ping Nigel to finish the last
server (which wasn't without issues).

For people who do not want gruesome details on the reboots, you can
stop here.

We did have some trouble with:

- a few servers on Rackspace (mostly infra) with cloud-init resetting
the configuration to dhcp, and the dhcp not working. I am finally
changing that, and was in the course of fixing it for good before going
back to bed.

- gerrit didn't start automatically at boot. I know we had a fix for
that, but I am not sure why it didn't work, or whether we simply hadn't
deployed it yet.

- supercolony seems to be unable to boot the latest kernel. It went so
bad that the emergency console wasn't working. An erroneous message said
"disabled for your account", so I opened a Rackspace ticket and
waited. This occurred as I started to not feel well, so I didn't really
search any further, or I would have:
   - seen that the console was working for other servers (thus the
message was erroneous)
   - tried harder to boot another kernel
   - searched a bit more on the internal list, which said "there is some
issue somewhere around RHEL 6". I didn't investigate more, but that's
also what happened.

In the end, Nigel took over the problem solving and pinged Rackspace
harder; their support suggested booting another kernel, which he did
(but better than I did).

And thus supercolony is back, but not upgraded.

The last one still puzzles me, because the current configuration is
"default=2", so that should start the 3rd kernel in the list.

The Grub doc says "The first entry (here, counting starts with number zero,
not one!) will be the default choice"; it was "0" when I first tried to
boot another kernel (I switched it to 1).

So since we have:

[root@supercolony ~]# grep title /boot/grub/menu.lst
title Red Hat Enterprise Linux Server (2.6.32-696.18.7.el6.x86_64)
title Red Hat Enterprise Linux Server (2.6.32-696.16.1.el6.x86_64)
title Red Hat Enterprise Linux Server (2.6.32-642.15.1.el6.x86_64)

default=1 should have used 2.6.32-696.16.1, but it didn't boot.

Nigel changed it to "default=2", so that should have used
2.6.32-642.15.1, but plot twist...

# uname -a
Linux supercolony.gluster.org 2.6.32-696.16.1.el6.x86_64 #1 SMP Sun Oct 8 09:45:56 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux

So there is something fishy with grub, but as I am writing this from my
bed, maybe the problem is on my side. I am sure it will be clearer once
I hit "send".
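
To double-check the zero-based counting, here is a minimal sketch of which
entry a numeric "default" setting should select. It is only an
illustration: expected_default_entry and SAMPLE_MENU_LST are hypothetical
names, the sample text contains only the three title lines from the grep
output plus the "default=2" setting described above (the rest of the real
menu.lst is not shown here), and the fallback to entry 0 when no default is
set is an assumption on my side.

```python
import re

# Sample text: the three title lines come from the grep output in this
# message; "default=2" is the setting described there. Everything else
# in the real menu.lst is unknown and left out.
SAMPLE_MENU_LST = """\
default=2
title Red Hat Enterprise Linux Server (2.6.32-696.18.7.el6.x86_64)
title Red Hat Enterprise Linux Server (2.6.32-696.16.1.el6.x86_64)
title Red Hat Enterprise Linux Server (2.6.32-642.15.1.el6.x86_64)
"""


def expected_default_entry(menu_text):
    """Return (index, title) that a numeric 'default' setting points to.

    Hypothetical helper: it only understands 'default=N' / 'default N'
    and 'title ...' lines, and counts entries starting from zero.
    """
    default_index = 0  # assumption: first entry when no default is set
    titles = []
    for raw in menu_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = re.match(r"default[\s=]+(\d+)\s*$", line)
        if match:
            default_index = int(match.group(1))
            continue
        match = re.match(r"title\s+(.*)$", line)
        if match:
            titles.append(match.group(1))
    if default_index >= len(titles):
        raise IndexError(
            "default=%d but only %d entries" % (default_index, len(titles))
        )
    return default_index, titles[default_index]


if __name__ == "__main__":
    index, title = expected_default_entry(SAMPLE_MENU_LST)
    print(index, title)
```

With the sample above, the sketch picks the 2.6.32-642.15.1 entry for
"default=2" and the 2.6.32-696.16.1 entry for "default=1", while the uname
output shows 2.6.32-696.16.1 running. So the counting itself does not
explain the mismatch.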

So, to recap, we have one or two servers left to upgrade (cf. the pad),
and the *BSD hosts are not patched yet (I quickly checked their lists,
but I do not expect patches soon). Since the more urgent issues were on
the hypervisor side, we are ok on that front.

The grub on supercolony needs to be investigated, and supercolony should
be upgraded as well.

I also need to take some rest.

Many thanks to Nigel for taking over when my body failed me.

-- 
Michael Scherer
Sysadmin, Community Infrastructure and Platform, OSAS
