[Gluster-infra] [Gluster-devel] Reboot for Meltdown and stuff
amye at redhat.com
Sat Jan 6 22:50:13 UTC 2018
Thanks for your quick work on this, get some rest!
We can look at supercolony next week when you're back in action.
On Sat, Jan 6, 2018 at 11:48 AM, Michael Scherer <mscherer at redhat.com> wrote:
> On Saturday, January 6, 2018 at 11:44 +0100, Michael Scherer wrote:
> > On Friday, January 5, 2018 at 14:24 +0100, Michael Scherer wrote:
> > > Hi,
> > >
> > > unless you are living in a place without any internet (an igloo in
> > > Antarctica, the middle of the Gobi desert, a bunker in Switzerland,
> > > or simply the Paris underground train), you may have seen the news
> > > that this week is once again a security nightmare (also called
> > > "just a normal Wednesday" among practitioners), and that we have
> > > important kernel patches to push, which do require a reboot.
> > >
> > > See https://spectreattack.com/
> > >
> > > While I suspect our infra will not be targeted, and there are more
> > > attractive venues of attack in local computers and browsers, which
> > > are the ones regularly running random proprietary code in the form
> > > of JS, we still have to upgrade everything to be sure.
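One quick way to verify, after a reboot, that a kernel actually carries the mitigations is the sysfs interface that was added alongside the fixes; this is a generic sketch (it assumes a kernel new enough, 4.15+ or a distro backport, to expose the directory), not something from the original mail:

```shell
# Mitigation status for Meltdown/Spectre, one file per vulnerability.
# Kernels without the fixes simply lack the directory.
dir=/sys/devices/system/cpu/vulnerabilities
if [ -d "$dir" ]; then
    # Each file contains e.g. "Mitigation: PTI" or "Vulnerable"
    grep . "$dir"/* 2>/dev/null
else
    echo "no $dir: kernel predates mitigation reporting"
fi
```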
> > >
> > > Therefore, I am going to have to reboot all the infra (yes, all 83
> > > servers) tomorrow, minus the few servers I already rebooted
> > > (because they are in HA, or not customer facing).
> > >
> > > I will block Jenkins and wait for the running jobs to finish
> > > before rebooting the various servers. I will send an email
> > > tomorrow once the reboots start (i.e., when/if I wake up), and
> > > another one once things are good (or if stuff broke in a horrible
> > > fashion, as happened today).
> > >
> > > If there are any precautions to take, people have around 24h to
> > > voice their concerns.
> > The reboot is starting. I already did various backend servers; the
> > document I used for tracking the work is at
> > https://bimestriel.framapad.org/p/gluster_infra_reboot
> So almost all Linux servers got rebooted, most without issues, but
> during the day I started to show the first symptoms of a cold
> (headaches, shivering, etc.), so I had to ping Nigel to finish the
> last server (which wasn't without issues either).
> For people who do not want the gruesome details of the reboots, you
> can stop here.
> We did get some trouble with:
> - a few servers at Rackspace (mostly infra) where cloud-init reset
> the network configuration to DHCP, and DHCP was not working. I am
> finally changing that, and was in the middle of fixing it for good
> before going back to bed.
> - gerrit didn't start automatically at boot. I know we had a fix for
> that, but I am not sure why it didn't work, or whether we hadn't
> deployed it yet.
> - supercolony seems to be unable to boot the latest kernel. It went
> so badly that the emergency console wasn't working. An erroneous
> message said "disabled for your account", so I opened a Rackspace
> ticket and waited. This occurred as I started to feel unwell, so I
> didn't really search any further, or I would have:
> - seen that the console was working for other servers (hence the
> erroneous message)
> - tried harder to boot another kernel
> - searched a bit more on an internal list that said "there is some
> issue somewhere around RHEL 6". I didn't investigate more, but that's
> also what it turned out to be.
> In the end, Nigel took over the problem solving and pinged Rackspace
> harder; their support suggested booting another kernel, which he did
> (but better than I did).
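On the cloud-init side, the usual way to stop it from rewriting the network configuration on every boot is a small drop-in config. The filename below is only an assumed example (any *.cfg file under /etc/cloud/cloud.cfg.d/ is read); this is a sketch of the documented knob, not necessarily what was deployed on our servers:

```yaml
# /etc/cloud/cloud.cfg.d/99-disable-network-config.cfg
# (hypothetical filename) -- tell cloud-init to leave the
# existing network configuration alone instead of regenerating it
network:
  config: disabled
```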
> And thus supercolony is back, but not upgraded.
> The last one still puzzles me, because the current configuration is
> "default=2", so that should start the 3rd kernel in the list.
> The GRUB doc says "The first entry (here, counting starts with number
> zero, not one!) will be the default choice"; it was "0" when I first
> tried to boot another kernel (I switched it to 1).
> So since we have:
> [root at supercolony ~]# grep title /boot/grub/menu.lst
> title Red Hat Enterprise Linux Server (2.6.32-696.18.7.el6.x86_64)
> title Red Hat Enterprise Linux Server (2.6.32-696.16.1.el6.x86_64)
> title Red Hat Enterprise Linux Server (2.6.32-642.15.1.el6.x86_64)
> default=1 should have used 2.6.32-696.16.1, but it didn't boot.
> Nigel changed it to "default=2", which should have used
> 2.6.32-642.15.1, but plot twist...
> # uname -a
> Linux supercolony.gluster.org 2.6.32-696.16.1.el6.x86_64 #1 SMP Sun Oct
> 8 09:45:56 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux
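For what it's worth, grub-legacy's zero-based counting can be replayed against the entries above on a scratch copy of menu.lst; this is just a sketch of the documented behaviour, not a diagnosis of what supercolony actually did:

```shell
# Recreate the menu.lst entries from the mail in a scratch file
cat > /tmp/menu.lst <<'EOF'
default=1
title Red Hat Enterprise Linux Server (2.6.32-696.18.7.el6.x86_64)
title Red Hat Enterprise Linux Server (2.6.32-696.16.1.el6.x86_64)
title Red Hat Enterprise Linux Server (2.6.32-642.15.1.el6.x86_64)
EOF

# grub-legacy counts entries from zero, so default=N selects the
# (N+1)-th "title" line in file order
n=$(sed -n 's/^default=//p' /tmp/menu.lst)
sed -n 's/^title //p' /tmp/menu.lst | sed -n "$((n + 1))p"
# with default=1, this prints the second entry (the 696.16.1 kernel)
```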
> So there is something fishy with grub, but as I write this from my
> bed, maybe the problem is on my side. I am sure it will become
> clearer once I hit "send".
> So, to recap: we have one or two servers left to upgrade (cf. the
> pad), and the *BSD machines are not patched yet (I quickly checked
> their lists, but I do not expect patches soon); since the more urgent
> issues were on the hypervisor side, we are OK on that front.
> The grub setup on supercolony needs to be investigated, and
> supercolony should be upgraded as well.
> I also need to take some rest.
> Many thanks to Nigel for taking over when my body failed me.
> Michael Scherer
> Sysadmin, Community Infrastructure and Platform, OSAS
> Gluster-infra mailing list
> Gluster-infra at gluster.org
Amye Scavarda | amye at redhat.com | Gluster Community Lead