Enable Prometheus metrics
What are we going to do?
Enable Prometheus metrics (admin settings)
Why are we doing it?
To re-enable Unicorn monitoring.
When are we going to do it?
- Start time: ___
- Duration: ___
- Estimated end time: ___
How are we going to do it?
- Enable Prometheus metrics in admin settings.
- Restart all Unicorn services.
NOTE: Sending HUP is not sufficient, for safety reasons; a full restart is required.
How are we preparing for it?
What can we check before starting?
What can we check afterwards to ensure that it's working?
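One possible post-change check is to scrape a metrics endpoint and confirm that it returns at least one sample in the Prometheus text format. The sketch below is illustrative only: the function names `parse_metric_names` and `metrics_enabled` are hypothetical, and the URL to probe is an assumption, since this issue does not name the endpoint.

```python
import re
import urllib.request

# One sample line in the Prometheus text format:
#   metric_name{label="value"} 1.0 [optional_timestamp]
SAMPLE_LINE = re.compile(
    r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{.*\})?\s+(\S+)(\s+-?\d+)?$'
)


def parse_metric_names(exposition_text):
    """Return the set of metric names that have at least one valid sample."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_LINE.match(line)
        if match is None:
            continue  # not a sample line (for example, an HTML error page)
        try:
            float(match.group(3))  # the sample value must be numeric
        except ValueError:
            continue
        names.add(match.group(1))
    return names


def metrics_enabled(url, timeout=5):
    """Return True if `url` answers 200 with at least one metric sample.

    `url` is supplied by the caller; the correct endpoint is an assumption
    that must be confirmed for the environment being changed.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            if resp.status != 200:
                return False
            body = resp.read().decode("utf-8", errors="replace")
    except OSError:  # covers URLError, HTTPError, timeouts, refused connections
        return False
    return len(parse_metric_names(body)) > 0
```

Calling `metrics_enabled(...)` against each restarted node before and after the change gives a simple pass/fail signal. It only confirms that samples are being exposed, not that any particular metric is present.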
Impact
- Type of impact: <internal|client facing|no impact>
- What will happen: ___
- Do we expect downtime? (set the override in PagerDuty): ___
How are we communicating this to our customers?
- Announce the deployment well in advance: ___
- Tweet after the change.
What is the rollback plan?
Monitoring
- Graphs to check for failures:
-
- Graphs to check for improvements:
-
- Alerts that may trigger:
-
[IF NEEDED]
Google Doc to follow during the change (remember to link in the on-call log)
Scheduling
Schedule a downtime in the production calendar twice as long as your worst duration estimate. Be pessimistic (better safe than sorry).
When things go wrong (downtime or service degradation)
- Label the change issue as outage
- Perform a blameless post mortem
References
Edited by Ben Kochie