Enable Prometheus metrics

What are we going to do?

Enable Prometheus metrics (admin settings)

Why are we doing it?

To re-enable monitoring of unicorn.

When are we going to do it?

Start time: ___
Duration: ___
Estimated end time: ___

How are we going to do it?

Enable Prometheus metrics in admin settings.
Restart all unicorn services.

NOTE: A HUP (reload) of unicorn is not sufficient, for safety reasons; a full restart is required.
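
If the setting is toggled through the API rather than the admin UI, the request might look like the sketch below. This is an assumption, not part of the plan above: it presumes the application settings endpoint (PUT /api/v4/application/settings) with the prometheus_metrics_enabled attribute is available on this instance, and the host name and token are placeholders. The function only builds the request; nothing is sent until it is passed to urlopen.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_enable_metrics_request(base_url: str, token: str) -> Request:
    """Build (but do not send) a PUT request that enables Prometheus metrics.

    Assumes the GitLab application settings API and an admin access token.
    """
    body = urlencode({"prometheus_metrics_enabled": "true"}).encode("ascii")
    request = Request(
        url=base_url.rstrip("/") + "/api/v4/application/settings",
        data=body,
        method="PUT",
    )
    request.add_header("PRIVATE-TOKEN", token)
    return request


if __name__ == "__main__":
    # Placeholder host and token; replace before use.
    req = build_enable_metrics_request("https://gitlab.example.com", "<admin-token>")
    print(req.get_method(), req.full_url)
    # To apply: urllib.request.urlopen(req), then restart unicorn on every node.
```

Changing the setting alone does not complete the change; the unicorn restart is still needed afterwards.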

How are we preparing for it?

What can we check before starting?

What can we check afterwards to ensure that it's working?

Prometheus status
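
One way to make the "Prometheus status" check concrete is to scrape a node's metrics endpoint and confirm that it returns metric samples. The sketch below parses the Prometheus text exposition format and lists the metric names it finds. The sample text and metric names are illustrative only, not real output from our fleet, and the assumption that the endpoint is served at /-/metrics should be verified against the running version.

```python
def metric_names(exposition_text: str) -> set:
    """Return the metric names found in Prometheus text exposition output."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # The name ends at the first "{" (labels) or at the first whitespace.
        names.add(line.split("{", 1)[0].split()[0])
    return names


# Hypothetical sample for illustration; a real scrape would be fetched from
# the node, e.g. with urllib.request.urlopen("http://<node>/-/metrics").
SAMPLE = """\
# HELP http_requests_total Example counter
# TYPE http_requests_total counter
http_requests_total{method="get",status="200"} 1027
process_start_time_seconds 1504600000
"""

if __name__ == "__main__":
    found = metric_names(SAMPLE)
    print(sorted(found))
    if not found:
        raise SystemExit("no metrics exposed; the setting may not be active yet")
```

An empty result after the restart would suggest the setting did not take effect on that node.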

Impact

Type of impact: <internal|client facing|no impact>
What will happen: ___
Do we expect downtime? (set the override in PagerDuty): ___

How are we communicating this to our customers?

Announce the deployment well in advance: ___
Tweet after the change.

What is the rollback plan?

Monitoring

Graphs to check for failures:
Graphs to check for improvements:
Alerts that may trigger:

[IF NEEDED]

Google Doc to follow during the change (remember to link in the on-call log)

Scheduling

Schedule a downtime window in the production calendar that is twice as long as your worst-case duration estimate. Be pessimistic (better safe than sorry).

When things go wrong (downtime or service degradation)

Label the change issue as outage
Perform a blameless postmortem

References

Edited Sep 05, 2017 by Ben Kochie