Deploy Prometheus to monitor customer apps on Kubernetes
Description
With our support for Prometheus continuing to grow, we should offer the ability to automatically deploy a Prometheus server and configure it to monitor the various environments a project is running in.
Right now, we either ask the customer to bring their own Prometheus server or enable Kubernetes (k8s) monitoring on the bundled Prometheus server. Neither is a good long-term option:
- We should not require customers to set up and configure Prometheus themselves. With Kubernetes, spinning up a lightweight app like Prometheus is easy, and we should just take care of it.
- The bundled Prometheus instance should primarily be used to monitor the GitLab service itself. It may not have network reachability to all environments of a project, and best practice for Prometheus is to use multiple servers for different monitoring tasks.
- Dynamic environments such as Review Apps pose a particular challenge. These environments start and stop frequently, so their scrape targets change frequently too, and a customer admin cannot know and define them ahead of time. Similarly, combining the GitLab scrape targets with the targets of multiple GitLab projects, each with multiple environments, in a single configuration is a significant challenge.
Proposal
For these reasons, we should support automatically launching a project-specific Prometheus server if the project has k8s integration enabled. We can then use a simple ConfigMap to manage the configuration and update it on demand. Any time it changes, we can send Prometheus an HTTP request telling it to re-read the configuration. This model has a number of benefits:
- The Prometheus server will run where the environments are, giving it access to scrape targets that are likely private.
- Configuration complexity will be reduced, because we only need to worry about the scrape targets for this one project.
- Fewer scalability challenges, since each server only scrapes the environments of a single project.
- Alignment with the Prometheus best practice of using multiple servers for different monitoring tasks.
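The ConfigMap-plus-reload flow described above could look roughly like the following sketch. It only builds the ConfigMap manifest and the reload request; it does not talk to a cluster. The function names, the ConfigMap name, the namespace, and the URL are hypothetical. It assumes Prometheus 2.x started with `--web.enable-lifecycle`, which is what enables the `POST /-/reload` endpoint.

```python
import json
import urllib.request


def build_config_map(name: str, namespace: str, prometheus_yml: str) -> dict:
    """Return a Kubernetes ConfigMap manifest holding the Prometheus config."""
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": name, "namespace": namespace},
        # The key becomes the file name when the ConfigMap is mounted as a volume.
        "data": {"prometheus.yml": prometheus_yml},
    }


def build_reload_request(prometheus_url: str) -> urllib.request.Request:
    """Build (but do not send) the request asking Prometheus to re-read its config."""
    # Assumption: Prometheus 2.x runs with --web.enable-lifecycle,
    # otherwise the /-/reload endpoint is not served.
    return urllib.request.Request(
        prometheus_url.rstrip("/") + "/-/reload", data=b"", method="POST"
    )


if __name__ == "__main__":
    # All names and URLs below are placeholders for illustration.
    manifest = build_config_map(
        "gitlab-prometheus", "my-project", "global:\n  scrape_interval: 15s\n"
    )
    print(json.dumps(manifest, indent=2))
    request = build_reload_request("http://prometheus.my-project.svc:9090")
    print(request.get_method(), request.full_url)
```

One thing to keep in mind with this design: an updated ConfigMap that is mounted as a volume reaches the pod's filesystem only after a delay, so a reload sent immediately after the update may still read the old file.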
For now, we can consider requiring direct network access from the GitLab server to the Prometheus server. In the future, we should consider using port forwarding to Kubernetes pods, so that no external access is required when GitLab and the Prometheus server are not running in the same network segment.
Proposed UI changes:
- On the Kubernetes service success message, if the Prometheus service is inactive, add a link to encourage users to configure it:
  - Mockup: see mockup
  - Text:
    You’ve activated the Kubernetes service. To monitor the performance of your environments, [install Prometheus on Kubernetes] or [configure it manually].
    - “install Prometheus on Kubernetes” starts the installation process and redirects the user to the Prometheus service page
    - “configure it manually” simply redirects the user to the Prometheus service page
- On the environment monitoring “Get started” empty state:
  - Mockup: see mockup
  - Add an “Install Prometheus on Kubernetes” button that starts the “installation” process and redirects the user to the Prometheus service page
  - Add a “Manually configure Prometheus” button that simply redirects the user to the Prometheus service page
- On the Prometheus service page:
  - Mockup: see the Designs section below
  - Move the current manual configuration settings into a “Manual configuration” section
  - Add an “Auto configuration” section:
    - In this section, have an “Install Prometheus on Kubernetes” button that starts the “installation” process
    - Once the “installation” process begins:
      - Replace that button with a disabled “Installing Prometheus…” button
      - Add an info alert communicating that the Prometheus server is being installed
      - Disable the “Manual configuration” section controls
    - Once installed, replace the installation button with an “Uninstall Prometheus” button (it stops the Prometheus container and cleans up the deployment, replica set, and other k8s settings we created)
    - Once the “uninstallation” process begins, replace that button with a disabled “Uninstalling Prometheus…” button and add an info alert communicating that the Prometheus server is being uninstalled
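The button behaviour on the Prometheus service page amounts to a small state machine, which could be sketched as below. The state names come from the Designs section and the button labels from the list above. The event names are hypothetical, and the transitions into and out of “Manually configured” are assumptions, as is keeping the manual controls disabled in every auto-configuration state (the proposal only states this for the installing state).

```python
from enum import Enum


class State(Enum):
    INACTIVE = "Inactive"
    INSTALLING = "Installing…"
    AUTO_CONFIGURED = "Auto configured"
    UNINSTALLING = "Uninstalling…"
    MANUALLY_CONFIGURED = "Manually configured"


# (current state, event) -> next state. Event names are hypothetical.
TRANSITIONS = {
    (State.INACTIVE, "install_clicked"): State.INSTALLING,
    (State.INSTALLING, "install_finished"): State.AUTO_CONFIGURED,
    (State.AUTO_CONFIGURED, "uninstall_clicked"): State.UNINSTALLING,
    (State.UNINSTALLING, "uninstall_finished"): State.INACTIVE,
    # Assumptions: the proposal does not spell out the manual transitions.
    (State.INACTIVE, "manual_config_saved"): State.MANUALLY_CONFIGURED,
    (State.MANUALLY_CONFIGURED, "manual_config_removed"): State.INACTIVE,
}

# Auto configuration button per state: (label, clickable).
# "Manually configured" is left out because the proposal does not specify it.
AUTO_BUTTON = {
    State.INACTIVE: ("Install Prometheus on Kubernetes", True),
    State.INSTALLING: ("Installing Prometheus…", False),
    State.AUTO_CONFIGURED: ("Uninstall Prometheus", True),
    State.UNINSTALLING: ("Uninstalling Prometheus…", False),
}


def next_state(state: State, event: str) -> State:
    """Return the state reached from `state` on `event`, or raise ValueError."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"{event!r} is not allowed in state {state.value!r}") from None


def manual_controls_enabled(state: State) -> bool:
    """Manual configuration is editable only outside the auto-configuration states."""
    return state in (State.INACTIVE, State.MANUALLY_CONFIGURED)
```

Modelling the page this way makes the disabled buttons fall out of the table: a state with no outgoing click event has nothing for the user to trigger.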
For the configuration itself, because the server runs in k8s, we can use the standard Kubernetes service discovery built into Prometheus:
- Add a scrape target for annotated services or pods
- Collect node stats from cAdvisor
- In the future, we can consider allowing custom scrape targets as well
Notes:
- When enabling k8s metrics, we should restrict collection to the namespace of the specific project. (The namespace is configured in the k8s settings.)
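As a rough illustration of the annotated-pod target and the namespace restriction, the sketch below builds one Prometheus scrape config as a Python dictionary. The function name and job name are hypothetical. The `prometheus.io/scrape` annotation is a widely used convention rather than something this proposal specifies, so treat it as an assumption. cAdvisor collection is not shown.

```python
import json


def annotated_pods_scrape_config(namespace: str) -> dict:
    """Scrape config that discovers pods in one namespace and keeps annotated ones."""
    return {
        "job_name": "kubernetes-pods",  # hypothetical job name
        "kubernetes_sd_configs": [
            # Limit discovery to the project's namespace.
            {"role": "pod", "namespaces": {"names": [namespace]}}
        ],
        "relabel_configs": [
            {
                # Assumption: pods opt in with the annotation prometheus.io/scrape: "true".
                "source_labels": [
                    "__meta_kubernetes_pod_annotation_prometheus_io_scrape"
                ],
                "action": "keep",
                "regex": "true",
            }
        ],
    }


if __name__ == "__main__":
    # "my-project" is a placeholder namespace.
    config = {"scrape_configs": [annotated_pods_scrape_config("my-project")]}
    print(json.dumps(config, indent=2))
```

The resulting structure would still need to be serialized to YAML and stored under the `prometheus.yml` key of the ConfigMap.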
Designs
- Kubernetes service success message
- Environment monitoring “Get started” empty state
- Prometheus service page, with one mockup per state: Inactive, Installing…, Auto configured, Uninstalling…, Manually configured
Links / references
Documentation blurb
(Write the start of the documentation of this feature here, include:
- Why should someone use it; what's the underlying problem.
- What is the solution.
- How does someone use this
During implementation, this can then be copied and used as a starter for the documentation.)