[Gluster-devel] [Gluster-infra] Freebsd builder upgrade to 10.4, maybe 11
Michael Scherer
misc at redhat.com
Wed Sep 12 10:35:29 UTC 2018
On Tuesday, 11 September 2018 at 19:54 +0530, Nigel Babu wrote:
> On Tue, Sep 11, 2018 at 7:06 PM Michael Scherer <misc at redhat.com>
> wrote:
>
> > And... rescue mode is not working. So the server is down until
> > Rackspace fixes it.
> >
> > Can someone disable the freebsd smoke test, as I think our 2nd
> > builder is not yet building fine ?
> >
>
>
> Disabled. Please do not merge any JJB review requests until this is
> fixed.
So, just to keep people updated on this adventure:
- after 3 or 4h, the rescue mode managed to appear. Turns out that it
takes time to copy the rescue image to the cloud, and I guess no one
did that recently for FreeBSD. Rackspace says "40 minutes", which is
still a lot.
This morning, I was greeted by a welcoming prompt saying "zfs, can't
mount the root" or something, because the rescue mode of Rackspace was
a bit broken. While trying to fix it, I rebooted out of rescue mode.
So I went back to my initial plan: "boot in rescue mode".
Thanks to the superhuman reflexes I acquired dodging Nerf guns in the
office, I managed to hit the "s" key at the right time. While it seems
easy, the remote console of Rackspace is a bit slow to show the boot
loader, the latency from France across the Atlantic is noticeable, and
the interface disconnects itself after every minute of idling, and every
time the video mode changes (like when it goes from POST to the
bootloader menu). So that was a race between the machine and me, and I
won it, with a "#_ " prompt waiting for me.
Of course, things would be too simple if that was all there was to it,
so / was read-only. And that's FreeBSD, so "mount -o rw,remount /"
didn't work.
It didn't work for 2 reasons:
- ZFS does not work like this ("zfs set readonly=off zroot", for people
asking how to do it later)
- the keyboard was kinda broken. Not broken like "that's a US layout and
misc is using a French layout" broken, because that, I know how to deal
with. More broken like "only the a-z keys are working, and the rest are
randomly placed somewhere else". It turns out that without '.' and '/',
things can get complicated in the shell. But I am resourceful, and
judicious use of <tab> + <backspace> let me get what I needed.
So I managed to change the root password and rebooted again.
The password didn't work the first time (did I mention latency?), I
retried, and it worked.
And so, the network is fine. But sshd didn't start. Why? Because it
checks the config, and the config check showed a warning about a
"duplicate line".
The upgrade changed the sshd config file, adding a ton of comments
and, at the end, a duplicated line:
Subsystem sftp /usr/libexec/sftp-server
And once it was removed, things were working.
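
A minimal sketch of a check that could catch this kind of duplicate
before a reboot. The function name and the sample text are hypothetical
and only illustrate the idea; this is not something the upgrade tool or
sshd provides, and it only looks at Subsystem lines:

```python
from collections import defaultdict


def find_duplicate_subsystems(config_text):
    """Return {name: [line numbers]} for Subsystem names declared more than once."""
    seen = defaultdict(list)
    for lineno, raw in enumerate(config_text.splitlines(), start=1):
        line = raw.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        # sshd_config keywords are case-insensitive.
        if parts[0].lower() == "subsystem" and len(parts) >= 2:
            seen[parts[1]].append(lineno)
    return {name: nums for name, nums in seen.items() if len(nums) > 1}


# Hypothetical sample mimicking the state of the file after the upgrade.
SAMPLE = """\
# default sftp subsystem
Subsystem sftp /usr/libexec/sftp-server
# ... comments added by the upgrade ...
Subsystem sftp /usr/libexec/sftp-server
"""

if __name__ == "__main__":
    for name, nums in find_duplicate_subsystems(SAMPLE).items():
        print(f"Subsystem {name} declared on lines {nums}")
```

On the real machine, "sshd -t" (test mode, which only validates the
configuration and keys) is the more direct check to run before
rebooting.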
However, I rebooted to test and ... still broken. So back to racing
against the bootloader, I guess (because the temp password I set is not
working again, I wonder if there was some fallback due to ZFS or
something).
So while that's ultimately my fault for not reading the 150-line diff
presented by the upgrade prompt, maybe they should not have blocked on
a warning, and/or should have verified the config before asking me to
reboot.
I guess the lesson is that I kinda need to write a playbook for that,
because 11.0 is around the corner.
On a related note, we will be putting the 2nd builder (the one in our
DC) in as a non-voting job, so we can see if this builder works.
The main issue is that the builder in the cloud (freebsd0) is a custom,
manually installed one, and so there are some modifications that were
not recorded (in fact, none of them were). So builds work there, but
not on a freshly installed one. Niels has been working on fixing that,
but last time I checked (1 month ago), we still had issues that were
found after I enabled the builder, and this broke the smoke tests
(something around installation). I wanted to file a bug for that, but
haven't had time yet.
So a non-voting job would be perfect for that.
--
Michael Scherer
Sysadmin, Community Infrastructure and Platform, OSAS