[Meta] Application Performance Monitoring
Overview
Performance monitoring is a core capability needed to deliver and maintain high-quality software and experiences for your users. At GitLab, we have an opportunity to help our customers track, manage, and optimize their software and hardware performance more easily and effectively.
There are a few core needs that most organizations have:
- Detecting and preventing changes that would negatively impact performance and user experience
- Tracking long-term performance trends and progress towards goals
- Detecting issues in production, and potentially halting a deploy or rolling back
- Providing resources and insight to aid in debugging
- Visibility into infrastructure usage, efficiency, and costs
GitLab, with its integrated solution, is well positioned to help customers address these needs quickly and easily.
Detecting and preventing code performance degradations
Even for organizations with mature CI/CD capabilities in place, it is important to detect performance regressions and prevent them from reaching production. With Continuous or Constant Deployment, the time to recover and the impact may not be high, but there is a cost to both users and the organization when a deploy fails or a rollback is required.
To help customers avoid these situations, we can enhance GitLab with a few new features:
- Track and compare Canary vs Stable (https://gitlab.com/gitlab-org/gitlab-ee/issues/2594)
- Determine the performance impact of an MR (https://gitlab.com/gitlab-org/gitlab-ee/issues/3173): Automatically calculate the performance impact of a merge, without deploying to production first
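As a rough illustration of what a Canary vs Stable comparison could compute, the sketch below compares median response times from two sets of samples and flags a regression when the canary is slower by more than a chosen fraction. The function names, the sample data, and the 10% threshold are assumptions made for this example; they are not taken from the linked issues or from any existing GitLab feature.

```python
from statistics import median


def relative_change(stable_ms, canary_ms):
    """Return the fractional change in median latency from stable to canary.

    A positive value means the canary is slower than stable.
    """
    stable_median = median(stable_ms)
    if stable_median == 0:
        raise ValueError("stable median latency must be non-zero")
    return (median(canary_ms) - stable_median) / stable_median


def is_regression(stable_ms, canary_ms, threshold=0.10):
    """Flag a regression when the canary is slower by more than `threshold`.

    The 10% default is a hypothetical value chosen for illustration.
    """
    return relative_change(stable_ms, canary_ms) > threshold


# Hypothetical response-time samples, in milliseconds.
stable = [100, 110, 105, 95, 100]
canary = [120, 125, 130, 115, 120]

if is_regression(stable, canary):
    print("Canary is slower than stable beyond the allowed threshold")
```

The median is used here rather than the mean so that a single slow request does not dominate the comparison; a real implementation might prefer percentiles such as p95.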
Track long term performance gains/losses, progress towards goals
Building a fast and responsive application is not a one-time effort; it is an ongoing journey. Over time, new features are added, bugs are fixed, and typical user behavior may change. In short, performance is a moving target that requires constant attention.
Often, a company is not happy with the performance of its application, and reaching its responsiveness goals will not happen in a single sprint. By providing a window into long-term progress, we let developers and their managers track continued progress towards objectives.
There are a few features which fall into this category:
- Display the long-term performance trends of a single branch (https://gitlab.com/gitlab-org/gitlab-ee/issues/3542)
- Internal Ops Dashboard (https://gitlab.com/gitlab-org/gitlab-ee/issues/3541)
- Improve Environment Page to show comprehensive status
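One simple way to summarize a long-term trend is the slope of a least-squares line fitted through periodic measurements. The sketch below assumes evenly spaced samples (for example, one median response time per week); the function name and the data are hypothetical and only illustrate the idea.

```python
def trend_slope(values):
    """Return the least-squares slope of evenly spaced measurements.

    The result is the average change per interval: negative means the
    metric is decreasing over time, positive means it is increasing.
    """
    n = len(values)
    if n < 2:
        raise ValueError("at least two measurements are required")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    numerator = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    denominator = sum((i - mean_x) ** 2 for i in range(n))
    return numerator / denominator


# Hypothetical weekly median response times, in milliseconds.
weekly_ms = [200, 190, 180, 170]
print(f"Change per week: {trend_slope(weekly_ms):.1f} ms")
```

For a latency metric, a negative slope indicates progress towards a responsiveness goal, while a positive slope indicates gradual degradation.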
Detection of issues in Production
A critical and foundational capability of a monitoring system is detecting issues in production. We have a base set of features for this today, with the capability to monitor system and response metrics with Prometheus. We have the opportunity to take this functionality further with the following features:
- Track production page performance (https://gitlab.com/gitlab-org/gitlab-ee/issues/3046)
- Production health checks (https://gitlab.com/gitlab-org/gitlab-ee/issues/3554)
- Service mesh (https://gitlab.com/gitlab-org/gitlab-ee/issues/3633)
- Alerting (https://gitlab.com/gitlab-org/gitlab-ee/issues/3610, https://gitlab.com/gitlab-org/gitlab-ee/issues/3555)
Operational insights and troubleshooting
- Internal Ops Dashboard (https://gitlab.com/gitlab-org/gitlab-ee/issues/3541)
- System Status Dashboard (External) (https://gitlab.com/gitlab-org/gitlab-ee/issues/3557)
- Kubernetes cluster monitoring (https://gitlab.com/gitlab-org/gitlab-ce/issues/27890)
- Service mesh monitoring / microservice dashboard (https://gitlab.com/gitlab-org/gitlab-ee/issues/3633)
- Database query performance (https://gitlab.com/gitlab-org/gitlab-ee/issues/3307)
- Log integration: collection, display, and analysis
- Tracing