Setting up dashboards and alerts to enable top-to-bottom product ownership
From https://gitlab.com/gitlab-com/organization/issues/62#note_25914642
Concept
Every feature developed in GitLab needs to run robustly on GitLab.com (and on large customer instances). This means that:
- product engineering areas (Platform, CI, Discussion, Edge) should have top-to-bottom ownership of their output: each team should have a dashboard of its metrics (powered by Prometheus, developed by the team itself, with guidance from Production Engineering as needed)
- each product area should define measurable objectives and alert levels for feature availability, including early detection of problems.
- the definition of done for a feature should include a stated objective for usage / uptime / availability, with the corresponding alerts set up and documented in the runbooks.
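As a rough illustration of what a "stated objective on availability" can mean in measurable terms, the sketch below computes the fraction of successful requests between two scrapes of a pair of counters and compares it with a target. The metric names (`feature_requests_total`, `feature_failed_total`), the feature name, and the 99.5% target are hypothetical examples, not existing GitLab metrics or agreed objectives. In Prometheus itself the same ratio would normally be computed with a PromQL expression over `rate()`, which also handles counter resets that this sketch ignores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CounterSample:
    """Two monotonically increasing counters scraped at the same moment."""
    total_requests: int
    failed_requests: int


@dataclass(frozen=True)
class AvailabilityObjective:
    """A hypothetical per-feature objective, e.g. 99.5% of requests succeed."""
    feature: str
    target: float  # fraction between 0 and 1


def availability(start: CounterSample, end: CounterSample) -> float:
    """Fraction of successful requests between two scrapes.

    Roughly what a PromQL ratio such as
      1 - sum(rate(feature_failed_total[5m])) / sum(rate(feature_requests_total[5m]))
    computes over a window (metric names are hypothetical).
    Assumes no counter reset happened between the two scrapes.
    """
    total = end.total_requests - start.total_requests
    failed = end.failed_requests - start.failed_requests
    if total <= 0:
        # No traffic in the window: report full availability instead of dividing by zero.
        return 1.0
    return 1.0 - failed / total


def meets_objective(objective: AvailabilityObjective,
                    start: CounterSample, end: CounterSample) -> bool:
    """True if the measured availability reaches the objective's target."""
    return availability(start, end) >= objective.target


if __name__ == "__main__":
    objective = AvailabilityObjective(feature="merge_requests", target=0.995)
    before = CounterSample(total_requests=10_000, failed_requests=20)
    after = CounterSample(total_requests=12_000, failed_requests=35)
    ratio = availability(before, after)
    print(f"{objective.feature}: {ratio:.4f} (target {objective.target})",
          meets_objective(objective, before, after))
```

In this example 15 of the 2,000 requests in the window failed, giving an availability of 0.9925, which falls short of the hypothetical 0.995 target and would be a reason to alert.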
Implementation
Start with one or two teams representing a feature or product area:
- Product managers and technical team leads propose what data they need to see in a dashboard to tell whether their area of the application is working as intended.
- Build the dashboards.
- Define alert levels, including what action should be taken when each alert fires.
- Set up the alerts.
- Open a new issue to repeat this for the next group of teams / product areas.
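The "define alert levels, including what action should be taken" step can be pictured as a small table that pairs each threshold with an action. The sketch below is only an illustration: the level names, the error-ratio thresholds, and the actions are hypothetical placeholders that each team would replace with its own. In a real setup these levels would typically be written as Prometheus alerting rules (each with an `expr`, an optional `for` duration, and labels/annotations that can point at the runbook).

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class AlertLevel:
    """One hypothetical alert level: fires when the error ratio reaches `threshold`."""
    name: str
    threshold: float  # error ratio (0..1) at which this level fires
    action: str       # what the responder should do; ideally a link to the runbook


def firing_level(error_ratio: float, levels: Sequence[AlertLevel]) -> Optional[AlertLevel]:
    """Return the most severe level whose threshold is reached, or None if all is well."""
    breached = [level for level in levels if error_ratio >= level.threshold]
    if not breached:
        return None
    # The level with the highest threshold is treated as the most severe.
    return max(breached, key=lambda level: level.threshold)


# Hypothetical levels for one feature; thresholds and actions are placeholders.
LEVELS = [
    AlertLevel("warning", 0.005,
               "Check the team dashboard; open an issue if the trend continues."),
    AlertLevel("critical", 0.02,
               "Page the team's on-call engineer and follow the feature runbook."),
]

if __name__ == "__main__":
    for ratio in (0.001, 0.0075, 0.05):
        level = firing_level(ratio, LEVELS)
        print(ratio, level.name if level else "ok", level.action if level else "")
```

Keeping the action next to the threshold makes it hard to add an alert without also deciding what someone should do when it fires, which is the point of this step.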