Commit c66f9343 authored by Marat Kalibekov

Update monitoring overview part

parent 4beba6c8
1 merge request: !166 Added gitlab-monitoring documentation
@@ -44,6 +44,7 @@ The aim of this project is to have a quick guide of what to do when an emergency
 
## Alerting and monitoring
 
* [GitLab monitoring overview](howto/monitoring-overview.md)
* [How to add alerts: Alerts manual](howto/alerts_manual.md)
* [How to silence alerts](howto/silence-alerts.md)
* [Alert for SSL certificate expiration](howto/alert-for-ssl-certificate-expiration.md)
<mxfile userAgent="Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/55.0.2883.87 Chrome/55.0.2883.87 Safari/537.36" version="6.0.2.7" editor="www.draw.io" type="device"><diagram name="Page-1">5Vtbc+MmGP01ntk+bEY3kPyYZLNpZ7YzmclDu49YwjJdLDwIJ3Z+fcFCN8CJ40h24+5DVvqEAJ1z+C6QTMLb5eaeo9XiT5ZhOgm8bDMJv02CwI8CKP9Tlm1lgYlXGXJOMt2oNTySF6yNdbM1yXDZaygYo4Ks+saUFQVORc+GOGfP/WZzRvujrlCOLcNjiqht/YtkYlFZE+C19t8xyRf1yL6nn8xQ+ivnbF3o8SZBON/9qx4vUd2Xbl8uUMaeO6bwbhLecsZEdbXc3GKqsK1hq977vudpM2+OC3HIC+qZeuMJ0TWup7ybmNjWYDwviMCPK5Sq+2dJ+CS8WYgllXe+vJwTSm8ZZXzXWn1ukKbSXgrOfuHOkwzOIIDyiR4Sc4E3e+ftN2hIlWG2xIJvZRP9AqgFpQUWTEF1/9zS1YC86FDVSAxpieRN3y1M8kIjtQfmT4paEB+Amj8aav4nRe2coIUOzCCVI9xk5Ele5urygauOF3hdqjeLUqBCAhh4ft1WjtJp/mXVtL/KiaBodpWy5W8WGdI5rdRluqWkyLCE9uZ9/GQAJ1nk4icJZiEciJ8Q9PgJfZsfP3HwkwyhaWChhjMZSvQt42LBclYgetdab3bxAasevD588oP59m9lv4rjoDb8VE+vPFDfP2BO5EQVH7sO/sFCbHUMRWvBpKkd9wdjq6b37FrFRsUoRWVJ0sr4ndB6AlI5XNiNduZOM7whopom0Hc/9VwqNBQEr5MpEWNrntattCOQw+RYN4NuzjmmSJCnfvcuBvWrD4zIgVutxH2xBFNDBNW09FutDiQmaNtptlINSkspzUQPEk/0kbUdvLm2vwaXsLpl1nW+5Q0+wlD4NkPhJTAEPe98DEGLIQkaZxs1vy+sUB9yWLB7H7LYl9jGLmSnMA7ROJlHA2IH2Ag6gIUDABtbwP5RyJgjQ4q0yrJvjgqkMF5hPmd8qTRf41tg8VF8M4STuTOzg2mCZ/NR8I1OCG9iwfuwnlEZbXvgLllBBOOfDljTJTiQdXqEIZCdWsheU/ktUqGyaufdrLjrlJFqY3gIl/e24G+TOf/d/nmepNhdv8wSEAFvKDL6XAQO9+wqlYcgoy6B9rPhiqim4zYzmQshJkgOYGY6FjN25LQgdFYMe2qEbhmztzzoFDgqdHcLnH3VzeS4iqLeNXy7ouiADRxY17YPFh6+Z+4iHFZ42B0Bo6PI6Kj6ZKujI+oT384BTioRYArkOC1AWwrxOaUQhEZad6wUTP/hjygFO18ZTAqvED70RkM9664WknNqAU6N1Wy69kO1EPtGR+F4WrAzrHsiflTRWY1YZPInk5Gb7wriOUcyjK5TseaqLC4xfyIpLm39UEpWpdogawpfytbZ+6vek6S4xtqDoZ3iQlfsHmRXOLDTqiOWY8EKfMxaDLtr8coHgb43Q3d329Br+KgP1ILjg/vUXsX+HrZOFN2NShKAY6O7cUIDvNGWcf2VI2oIJr6lIleO13HzHuyLy48OFVd9kjq60oJzKi00/A6Ij1RaZGQhIBhPaa5DrCGU1sgmjpK+bBK4RzavebhT+qvgrCoCRtoBAThORTAyOvKMjgZUkX1cco+FIEUujaofkto5hYzvoi+aflKgRdXNILQJUZIXSo242KnnRmULJEX0Wj9YkizbHee5spMBEozIONcErt1fV4IxRH5hH3xcNNZmSXZSrO2NmIvGGhobGI0vOgXW4+1ovJF/tHGnG3WqIHR4VPnP1K1REvZIPDYLAebCGy8JsXcwHrdFuu
CsIC8SCrXlDNFSibyYla3WL2PRgbcPZJzF6hBrzt4u0MnRBeFrbb/Ep3Nq9RK6aIBD49dATgqwXSlfHsB+dEYF2wXiNyRQp4y6ltfG79nIH9cvuy3FyyHBPKv1fZsE54ngICzYBdb/k4XYO4CF0MGCued+AAvytv1t/yqtaf+kIrz7Fw==</diagram></mxfile>
\ No newline at end of file
@@ -2,10 +2,14 @@
 
### General overview
 
-[Logical scheme](../img/gitlab-monitoring.png)
+![Logical scheme](../img/gitlab-monitoring.png)
+[draw.io source](../graphs/gitlab-monitoring.xml) for later modifications.
 
GitLab monitoring consists of the following parts:
 
1. 3 Prometheus instances - 2 for HA, 1 for public monitoring. Each has the `prometheus-server` role in Chef, which specifies which metrics to collect.
-1. 2 alertmanager instances - each of alertmanagers connected to corresponding prometheus instance and alert about availability of prometheus servers (each) and other other specified [alerting rules](https://dev.gitlab.org/cookbooks/runbooks/tree/master/alerts) (only on prometheus.gitlab.com). Effective roles in chef for alertmanagers are - `prometheus-alertmanager`, `prometheus-gitlab-com-monitoring`, prometheus-2-gitlab-com-monitoring`.
+1. 2 Alertmanager instances - each is connected to its corresponding Prometheus instance and alerts on the availability of each Prometheus server and on the other specified [alerting rules](https://dev.gitlab.org/cookbooks/runbooks/tree/master/alerts) (only on prometheus.gitlab.com). The effective Chef roles for the Alertmanagers are `prometheus-alertmanager`, `prometheus-gitlab-com-monitoring`, and `prometheus-2-gitlab-com-monitoring`.
1. 1 HAProxy instance - it keeps serving metrics to Grafana when one of the Prometheus instances is down. Role in Chef: `prometheus-haproxy`. The main thing to take care of is therefore keeping the Prometheus instances collecting (scraping) metrics at all times.
1. 2 Grafana instances - 1 for internal usage, 1 for public monitoring. The public Grafana instance serves all dashboards tagged `public` from the internal one. (*TO BE COMPLETED HERE*)
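
The failover role described for the haproxy instance above can be sketched in a few lines of Python. This is only an illustration of the idea (serve Grafana from whichever Prometheus instance is still up), not the actual configuration shipped by the `prometheus-haproxy` role. The names `http_healthy` and `pick_backend` and the example URLs are hypothetical, and the `/-/healthy` endpoint is assumed to be available on the Prometheus servers (it exists in Prometheus 2.x).

```python
import urllib.error
import urllib.request
from typing import Callable, Optional, Sequence


def http_healthy(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if GET <base_url>/-/healthy answers with HTTP 200.

    Assumption: the server exposes the Prometheus 2.x health endpoint.
    """
    try:
        with urllib.request.urlopen(f"{base_url}/-/healthy", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or a non-2xx answer.
        return False


def pick_backend(
    backends: Sequence[str],
    is_healthy: Callable[[str], bool] = http_healthy,
) -> Optional[str]:
    """Return the first healthy backend in priority order, or None if all are down."""
    for url in backends:
        if is_healthy(url):
            return url
    return None
```

With two hypothetical backends such as `http://prometheus-a.example.com` and `http://prometheus-b.example.com`, `pick_backend` returns the first one while it is healthy and falls back to the second when it is not. If it returns `None`, both instances are down and Grafana has no data source, which is why keeping the Prometheus instances scraping is the main concern.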