Roll out Prometheus v2 compression
What are we going to do?
Roll out Prometheus v2 compression.
https://gitlab.com/gitlab-cookbooks/gitlab-prometheus/merge_requests/247
- prometheus
- prometheus-2
- prometheus-3
Why are we doing it?
Reduce storage and memory requirements by 60-70%. With the current default compression, Prometheus needs about 500 GB of storage per 3 months of data.
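As a rough sanity check, applying the 60-70% target to the ~500 GB per 3 months baseline gives the expected footprint after the change. The sketch below only restates that arithmetic. It assumes the reduction applies directly to on-disk storage, and the function name is hypothetical.

```python
# Rough post-change storage estimate for Prometheus v2 compression.
# Assumptions: the ~500 GB per 3 months baseline and the 60-70% reduction
# target come from this issue; treating the reduction as applying directly
# to disk usage is an assumption, not a measurement.

BASELINE_GB_PER_QUARTER = 500  # current default compression, per 3 months


def estimated_storage_gb(baseline_gb: float, reduction: float) -> float:
    """Return the storage remaining after a fractional reduction (0.0 to 1.0)."""
    if not 0.0 <= reduction <= 1.0:
        raise ValueError("reduction must be between 0 and 1")
    return baseline_gb * (1.0 - reduction)


if __name__ == "__main__":
    for reduction in (0.60, 0.70):
        gb = estimated_storage_gb(BASELINE_GB_PER_QUARTER, reduction)
        print(f"{reduction:.0%} reduction -> ~{gb:.0f} GB per 3 months")
    # Prints:
    # 60% reduction -> ~200 GB per 3 months
    # 70% reduction -> ~150 GB per 3 months
```

If the target is met, expect roughly 150-200 GB per 3 months instead of 500 GB.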
When are we going to do it?
- Start time: ___
- Duration: ___
- Estimated end time: ___
How are we going to do it?
How are we preparing for it?
What can we check before starting?
What can we check afterwards to ensure that it's working?
Impact
- Type of impact: <internal|client facing|no impact>
- What will happen: ___
- Do we expect downtime? (If yes, set the override in PagerDuty): ___
How are we communicating this to our customers?
- Tweet before and after the change.
- Do we need to set a broadcast banner?: ___
What is the rollback plan?
Monitoring
- Graphs to check for failures:
-
- Graphs to check for improvements:
-
- Alerts that may trigger:
-
[IF NEEDED]
Google Doc to follow during the change (remember to link it in the on-call log)
Scheduling
Schedule a downtime window in the production calendar that is twice as long as your worst-case duration estimate. Be pessimistic (better safe than sorry).
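The scheduling rule reduces to simple arithmetic: the calendar block ends at the start time plus twice the worst-case duration. The sketch below is a minimal illustration. The function name and the example start time and duration are hypothetical, because the real values are still blank in this issue.

```python
from datetime import datetime, timedelta
from typing import Tuple


def downtime_window(
    start: datetime, worst_case: timedelta, factor: int = 2
) -> Tuple[datetime, datetime]:
    """Return (start, end) for a calendar block `factor` times the worst-case estimate."""
    if worst_case <= timedelta(0):
        raise ValueError("worst_case must be a positive duration")
    return start, start + worst_case * factor


if __name__ == "__main__":
    # Hypothetical values: a 14:00 start and a 90-minute worst-case estimate.
    begin, end = downtime_window(datetime(2018, 1, 1, 14, 0), timedelta(minutes=90))
    print(f"{begin:%Y-%m-%d %H:%M} to {end:%Y-%m-%d %H:%M}")
    # Prints: 2018-01-01 14:00 to 2018-01-01 17:00
```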
When things go wrong (downtime or service degradation)
- Apply the outage label to the change issue
- Perform a blameless postmortem