Ops view of monitoring and deploy board
Description
We are building views that let developers watch their deployments and monitor the state of their apps, but these views are developer-centric. If you are responsible for the overall health of your company's operations, you need a more holistic view.
Proposal
- Group-level overview of all production apps
- Group apps by nested group if applicable
- Show high-level status of app - e.g. green/yellow/red
- Drill down into individual app to see:
  - service health (SLOs such as response time and error rate)
  - pod health (including system metrics like memory, CPU, IO)
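To make the rollup in the list above concrete, here is a minimal sketch of how a green/yellow/red status could propagate from apps up through nested groups. Everything in it is an assumption for illustration: the `App`, `Group`, and `Status` names, the SLO thresholds, and the "worst child wins" rule are not part of this proposal.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import List


class Status(IntEnum):
    """High-level status; a higher value means worse health."""
    GREEN = 0
    YELLOW = 1
    RED = 2


@dataclass
class App:
    name: str
    error_rate: float      # fraction of failed requests, e.g. 0.02 == 2%
    p95_latency_ms: float  # 95th percentile response time in milliseconds

    def status(self) -> Status:
        # Hypothetical SLO thresholds, chosen only for this example.
        if self.error_rate >= 0.05 or self.p95_latency_ms >= 1000:
            return Status.RED
        if self.error_rate >= 0.01 or self.p95_latency_ms >= 500:
            return Status.YELLOW
        return Status.GREEN


@dataclass
class Group:
    name: str
    apps: List[App] = field(default_factory=list)
    subgroups: List["Group"] = field(default_factory=list)

    def status(self) -> Status:
        # Assumed rollup rule: a group is as unhealthy as its worst child.
        children = [a.status() for a in self.apps]
        children += [g.status() for g in self.subgroups]
        return max(children, default=Status.GREEN)


# Example hierarchy: one top-level group with two nested groups.
company = Group("company", subgroups=[
    Group("payments", apps=[App("checkout", 0.002, 180),
                            App("billing", 0.02, 300)]),
    Group("search", apps=[App("indexer", 0.08, 250)]),
])

for sub in company.subgroups:
    print(sub.name, sub.status().name)
print(company.name, company.status().name)
```

With "worst child wins", a single red app surfaces at every level above it, which suits an ops overview. A real implementation might instead weight apps by importance or ignore non-production environments.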
Group-level overview, with sub-groups kind of like:
After clicking on the environment from the ops view, the user would see the service health graphs:
At the top of the page there is a toggle to view the pod health. Hovering over a pod would highlight the corresponding line on the graphs, and hovering over a line would highlight its pod. I also imagine you could click on a pod to keep it in the active state, so the interaction would work on mobile as well.
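The hover and click behaviour could be modelled as a small piece of state shared by the pod list and the graphs. This is only a sketch: the `PodHighlight` name, the rule that a hover temporarily overrides a pinned pod, and the rule that clicking a pinned pod unpins it are assumptions, not decisions made in this issue.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PodHighlight:
    hovered: Optional[str] = None  # pod (or its graph line) under the pointer
    pinned: Optional[str] = None   # pod kept active by a click or tap

    def hover(self, pod: Optional[str]) -> None:
        # Pass None when the pointer leaves a pod or its graph line.
        self.hovered = pod

    def click(self, pod: str) -> None:
        # Assumption: clicking the pinned pod a second time unpins it.
        self.pinned = None if self.pinned == pod else pod

    @property
    def active(self) -> Optional[str]:
        # Assumption: a hover previews another pod without losing the pin.
        return self.hovered or self.pinned


state = PodHighlight()
state.click("pod-a")     # tap on mobile: pod-a stays highlighted
state.hover("pod-b")     # desktop hover previews pod-b
preview = state.active   # "pod-b"
state.hover(None)        # pointer leaves, falls back to the pinned pod
fallback = state.active  # "pod-a"
```

Because the pin does not depend on hover events, the same state works on touch devices, where only the click path is ever used.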