Update monitoring overview part

Commit c66f9343 authored 8 years ago by Marat Kalibekov
--- a/README.md
+++ b/README.md
@@ -44,6 +44,7 @@ The aim of this project is to have a quick guide of what to do when an emergency
  
 ## Alerting and monitoring
  
+* [GitLab monitoring overview](howto/monitoring-overview.md)
 * [How to add alerts: Alerts manual](howto/alerts_manual.md)
 * [How to silence alerts](howto/silence-alerts.md)
 * [Alert for SSL certificate expiration](howto/alert-for-ssl-certificate-expiration.md)

--- a/graphs/gitlab-monitoring.xml
+++ b/graphs/gitlab-monitoring.xml
+<mxfile userAgent="Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/55.0.2883.87 Chrome/55.0.2883.87 Safari/537.36" version="6.0.2.7" editor="www.draw.io" type="device"><diagram name="Page-1">5Vtbc+MmGP01ntk+bEY3kPyYZLNpZ7YzmclDu49YwjJdLDwIJ3Z+fcFCN8CJ40h24+5DVvqEAJ1z+C6QTMLb5eaeo9XiT5ZhOgm8bDMJv02CwI8CKP9Tlm1lgYlXGXJOMt2oNTySF6yNdbM1yXDZaygYo4Ks+saUFQVORc+GOGfP/WZzRvujrlCOLcNjiqht/YtkYlFZE+C19t8xyRf1yL6nn8xQ+ivnbF3o8SZBON/9qx4vUd2Xbl8uUMaeO6bwbhLecsZEdbXc3GKqsK1hq977vudpM2+OC3HIC+qZeuMJ0TWup7ybmNjWYDwviMCPK5Sq+2dJ+CS8WYgllXe+vJwTSm8ZZXzXWn1ukKbSXgrOfuHOkwzOIIDyiR4Sc4E3e+ftN2hIlWG2xIJvZRP9AqgFpQUWTEF1/9zS1YC86FDVSAxpieRN3y1M8kIjtQfmT4paEB+Amj8aav4nRe2coIUOzCCVI9xk5Ele5urygauOF3hdqjeLUqBCAhh4ft1WjtJp/mXVtL/KiaBodpWy5W8WGdI5rdRluqWkyLCE9uZ9/GQAJ1nk4icJZiEciJ8Q9PgJfZsfP3HwkwyhaWChhjMZSvQt42LBclYgetdab3bxAasevD588oP59m9lv4rjoDb8VE+vPFDfP2BO5EQVH7sO/sFCbHUMRWvBpKkd9wdjq6b37FrFRsUoRWVJ0sr4ndB6AlI5XNiNduZOM7whopom0Hc/9VwqNBQEr5MpEWNrntattCOQw+RYN4NuzjmmSJCnfvcuBvWrD4zIgVutxH2xBFNDBNW09FutDiQmaNtptlINSkspzUQPEk/0kbUdvLm2vwaXsLpl1nW+5Q0+wlD4NkPhJTAEPe98DEGLIQkaZxs1vy+sUB9yWLB7H7LYl9jGLmSnMA7ROJlHA2IH2Ag6gIUDABtbwP5RyJgjQ4q0yrJvjgqkMF5hPmd8qTRf41tg8VF8M4STuTOzg2mCZ/NR8I1OCG9iwfuwnlEZbXvgLllBBOOfDljTJTiQdXqEIZCdWsheU/ktUqGyaufdrLjrlJFqY3gIl/e24G+TOf/d/nmepNhdv8wSEAFvKDL6XAQO9+wqlYcgoy6B9rPhiqim4zYzmQshJkgOYGY6FjN25LQgdFYMe2qEbhmztzzoFDgqdHcLnH3VzeS4iqLeNXy7ouiADRxY17YPFh6+Z+4iHFZ42B0Bo6PI6Kj6ZKujI+oT384BTioRYArkOC1AWwrxOaUQhEZad6wUTP/hjygFO18ZTAqvED70RkM9664WknNqAU6N1Wy69kO1EPtGR+F4WrAzrHsiflTRWY1YZPInk5Gb7wriOUcyjK5TseaqLC4xfyIpLm39UEpWpdogawpfytbZ+6vek6S4xtqDoZ3iQlfsHmRXOLDTqiOWY8EKfMxaDLtr8coHgb43Q3d329Br+KgP1ILjg/vUXsX+HrZOFN2NShKAY6O7cUIDvNGWcf2VI2oIJr6lIleO13HzHuyLy48OFVd9kjq60oJzKi00/A6Ij1RaZGQhIBhPaa5DrCGU1sgmjpK+bBK4RzavebhT+qvgrCoCRtoBAThORTAyOvKMjgZUkX1cco+FIEUujaofkto5hYzvoi+aflKgRdXNILQJUZIXSo242KnnRmULJEX0Wj9YkizbHee5spMBEozIONcErt1fV4IxRH5hH3xcNNZmSXZSrO2NmIvGGhobGI0vOgXW4+1ovJF/tHGnG3WqIHR4VPnP1K1REvZIPDYLAebCGy8JsXcwHrdFu
uCsIC8SCrXlDNFSibyYla3WL2PRgbcPZJzF6hBrzt4u0MnRBeFrbb/Ep3Nq9RK6aIBD49dATgqwXSlfHsB+dEYF2wXiNyRQp4y6ltfG79nIH9cvuy3FyyHBPKv1fZsE54ngICzYBdb/k4XYO4CF0MGCued+AAvytv1t/yqtaf+kIrz7Fw==</diagram></mxfile>
\ No newline at end of file
--- a/howto/monitoring-overview.md
+++ b/howto/monitoring-overview.md
@@ -2,10 +2,14 @@
  
 ### General overview
  
-[Logical scheme](../img/gitlab-monitoring.png)
+![Logical scheme](../img/gitlab-monitoring.png)
+
+[draw.io source](../graphs/gitlab-monitoring.xml) for later modifications.
+
  
 GitLab monitoring consists of the following parts:
  
 1. 3 prometheus instances - 2 for HA, 1 for public monitoring. Each has role `prometheus-server` in chef, which specifies which metrics to collect.
-1. 2 alertmanager instances - each of alertmanagers connected to corresponding prometheus instance and alert about availability of prometheus servers (each) and other other specified [alerting rules](https://dev.gitlab.org/cookbooks/runbooks/tree/master/alerts) (only on prometheus.gitlab.com). Effective roles in chef for alertmanagers are - `prometheus-alertmanager`, `prometheus-gitlab-com-monitoring`, prometheus-2-gitlab-com-monitoring`.
+1. 2 alertmanager instances - each alertmanager is connected to its corresponding prometheus instance and alerts on the availability of the prometheus servers (each) and on the other specified [alerting rules](https://dev.gitlab.org/cookbooks/runbooks/tree/master/alerts) (only on prometheus.gitlab.com). Effective roles in chef for alertmanagers are `prometheus-alertmanager`, `prometheus-gitlab-com-monitoring`, `prometheus-2-gitlab-com-monitoring`.
+1. 1 haproxy instance - it keeps serving metrics to grafana when one of the prometheus instances is down. Role in chef - `prometheus-haproxy`. The main thing to take care of is therefore keeping the prometheus instances scraping metrics at all times.
 1. 2 grafana instances - 1 for internal usage, 1 for public monitoring. The public grafana instance provides all dashboards tagged `public` from the internal one. (*TO BE COMPLETED HERE*)
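
Note on the haproxy item: the failover it describes (grafana keeps getting metrics while one prometheus instance is down) can be illustrated with a small sketch. This is not the actual haproxy configuration from the `prometheus-haproxy` role, which this commit does not show; the backend addresses and the `pick_backend` helper are hypothetical and only model the "first healthy instance wins" idea.

```python
from typing import Callable, Optional, Sequence

# Hypothetical addresses standing in for the two HA prometheus instances.
PROMETHEUS_BACKENDS = ["prometheus-1.example:9090", "prometheus-2.example:9090"]


def pick_backend(
    backends: Sequence[str], is_up: Callable[[str], bool]
) -> Optional[str]:
    """Return the first backend that passes its health check, or None."""
    for backend in backends:
        if is_up(backend):
            return backend
    return None


if __name__ == "__main__":
    # Pretend the first instance is down; the second one should be chosen.
    down = {"prometheus-1.example:9090"}
    print(pick_backend(PROMETHEUS_BACKENDS, lambda b: b not in down))
```

If every instance fails its check the helper returns `None`, which matches the point made in the list: the proxy can only hide the loss of one instance, so keeping the prometheus servers scraping is what matters.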