[Gluster-infra] Outage Announcement: Mar 14 2017

Wed Mar 8 10:02:18 UTC 2017

Hello,

A reminder that this outage is next Tuesday, i.e. 14th March. Please make sure
anything urgent is done before then.

On Sun, Feb 19, 2017 at 09:10:42PM +0530, Nigel Babu wrote:
> Hello folks,
>
> Michael and I brought up the community cage outage at the last meeting.
> The details are now confirmed. There will be a major outage window for the
> community cage on March 14-16 2017. The migration will begin around 1000 EST /
> 1600 CET / 2030 IST. During this time, servers will move from one DC (RAL2) to
> a newer space in another building (RAL3). This is to accommodate our growth
> (notably power and space). This is an unavoidable downtime as the networking
> gear is also moving. The 2-day delay concerns the biggest tenant, Ceph, which
> has a bigger footprint in the cage.
>
> The Gluster servers have high priority as our setup is fairly easy and affects
> the team. The cage team is confident that they can get us back online in US
> East Coast working hours. We expect services to be ready at the end of the
> first day and to be back up and running on the morning of the 15th in India.
>
> We have people who can assist in case of problems with the move. Both Michael
> and I will be ready to restart the system as soon as possible. As there will be
> no hardware change but just a physical move, we are confident that things will
> restart fast.
>
> ## Impact in Europe and Asia
> Towards the second half of your working day, Gerrit and Jenkins will be down.
> We will bring down Jenkins at 1300 UTC / 1400 CET / 1830 IST. We will abort any
> outstanding job at this point. Other services will follow shortly.
>
> ## Impact in Americas
> Jenkins and Gerrit will be down throughout your working day. Before the
> migration starts, we'll power down servers to prevent data corruption.
>
> ## Services Affected
> * Jenkins
> * Gerrit (stage and prod)
> * Fstat
> * PostgreSQL
> * CentOS CI (Not owned by us but also part of the move)
>
> ## FAQ
> Q: Have you thought of moving things to Rackspace to avoid an outage?
> A: Yes, we did. The deployment of Jenkins/Gluster is not automated yet. This
>    would have required a rather long downtime to move to Rackspace, as we saw
>    when we moved from iWeb to the cage. This would have required a second
>    outage to move back, making it impractical and riskier.
>
> Q: What is going to happen during the outage window?
> A: The exact schedule is still being worked on. The rough outline is to allow
>    4h to stop the network gear, move it, plug it back in, and configure it.
>    Then the Gluster rack will be moved and brought back up, which shouldn't
>    take more than a few hours.
>
> Q: What should we do without Gerrit and Jenkins?
> A: Excellent question. Here is a list of tasks that do not require them:
>    * Bug triage: there are always bugs to reproduce.
>    * Documentation would benefit from a cleanup and some restructuring.
>    * Coverity scan reports and other static analysis we run are also in dire
>      need of a set of eyes.
>    * User support: Stack Overflow users and gluster-users posters would love
>      to get help.
>    * We're sure that our release managers would appreciate a thorough round of
>      testing for 3.10.
>    * Sysadmins love chocolate cake, so baking a cake for them would make them
>      happy. Congratulations, you've actually read this email!
>
> Q: Will the website, mailman, or downloads.gluster.org be affected?
> A: No. They're currently hosted on Rackspace. We have no plan to move them in
>    the next year (see Infrastructure Plans for 2017 on gluster-infra).

--
nigelb