[Gluster-infra] [shadow-it] Postmortem for Gluster Jenkins disk-full outage on the 15th of August

Wed Aug 15 09:32:09 UTC 2018

On Wednesday 15 August 2018 at 11:10 +0200, Michael Scherer wrote:
> Hi folks,
> 
> So the Gluster Jenkins disk was full today (because outages do not
> respect public holidays: Independence Day in India and Assumption in
> France). Here is the postmortem for your reading pleasure.
> 
> Date: 15/08/2018
> 
> Service affected:
>   Jenkins for Gluster (jenkins-el7.rht.gluster.org)
> 
> Impact:
> 
>   No Jenkins job could be triggered.
> 
> Root cause:
> 
>   The disk filled up, mainly due to regular growth: we got new jobs
> and more patches.
> 
> Resolution:
> 
>   Increased the disk by 30G, which required a reboot. Investigating
>   whether cleanup could be improved.
>
> [....]
> 
> Action items:
> - (misc) see what can be done for myrmicinae (the hypervisor where
> Jenkins is running), since it has no more free space.

So I looked at myrmicinae, and:
- we have only 23G free for VMs

- there is a 300G partition for the old Jenkins/Gerrit VM that we
migrated last November. I kept it so we could recover if needed, but
I guess that's no longer necessary.

I will sync with Nigel to make extra sure that we can remove this
partition.

-- 
Michael Scherer
Sysadmin, Community Infrastructure and Platform, OSAS
