Commit 27cfe0fb authored by Ernst van Nierop

Items moved to handbook, removing them here

parent 94d9d123
@@ -2,34 +2,14 @@
[These are not the runbooks you are looking for](
## I'm here to see how the infrastructure is made
This project is really only used as an issue tracker for the
[infrastructure team]( at GitLab.
You should read the [production architecture]( then.
## But I'm here to see how the infrastructure is made
## Where and how to look for data
You should read the [production architecture]( then.
### General System Health
### Where and how to look for data
#### Blackbox Monitoring
* [GitLab Web Status]( front end perspective of GitLab. Useful to understand how GitLab.com looks from the user's perspective. Use this graph to quickly troubleshoot which part of GitLab is slow.
* [GitLab Git Status]( front end perspective of GitLab ssh access.
#### Public Whitebox Monitoring
We offer a monitoring infrastructure site that is publicly accessible.
This monitoring site is updated hourly with any change we make in the private one, so it is a 1:1 copy of the private dashboards.
Some metrics are not visible on this public site because we do not keep a copy of metrics obtained through InfluxDB.
* [Fleet overview]( useful to see the fleet status from the inside of GitLab.com. Use this graph to quickly see if the workers or the database are under heavy load, and to check load balancer bandwidth.
* [Postgres Stats]( useful to understand in depth how the database is behaving. Use this graph to check for spikes of exclusive locks, or of active or idle-in-transaction processes.
* [Postgres Queries]( use this dashboard to understand if we have blocked or slow queries, dead tuples, etc.
* [Storage Stats]( use this dashboard to understand storage use and performance.
#### Private Whitebox Monitor
* [Host Stats]( useful to dive deep into a specific host to understand what is going on with it. Select a host from the dropdown at the top.
* [Business Stats]( shows the number of pushes, new repos and CI builds.
* [Daily overview]( shows endpoints with their number of calls and performance metrics. Useful to understand what is slow in general.
Take a look at information about our infrastructure monitoring in the
[infrastructure handbook](
# Production Architecture
## Production Architecture
Our core infrastructure is currently hosted on several cloud providers,
all with different functions. This document does not cover servers that
are not integral to the public-facing operations of GitLab.com.
This is what it looks like:
![Architecture](img/GitLab Infrastructure Architecture.png)
[Source](, GitLab internal use only
## Azure
The main portion of GitLab.com is hosted on Microsoft Azure. We have
the following servers there.
* 5 HAProxy load balancers for GitLab.com
* 2 HAProxy load balancers for GitLab Pages
* 2 HAProxy nodes for AltSSH
* 22 front-end nodes of which:
  * 4 are Web nodes
  * 8 are API nodes
  * 10 are Git nodes
* 10 Sidekiq nodes
* 4 PostgreSQL servers
* 5 Redis servers
* 3 Prometheus servers
* 5 NFS servers
Note that these numbers can fluctuate to adapt to the platform needs.
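As a quick sanity check on the list above, the counts can be tallied in a few lines. This is only an illustrative sketch: the numbers are copied from the list, while the dictionary names and keys are hypothetical labels that do not come from any real inventory or tooling.

```python
# Node counts copied from the Azure server list above.
# All names and keys here are hypothetical labels for illustration only.
HAPROXY_NODES = [5, 2, 2]  # the three HAProxy groups, in list order

FRONT_END_NODES = {"web": 4, "api": 8, "git": 10}

OTHER_NODES = {
    "sidekiq": 10,
    "postgresql": 4,
    "redis": 5,
    "prometheus": 3,
    "nfs": 5,
}


def group_total(nodes: dict) -> int:
    """Sum the node counts of one group."""
    return sum(nodes.values())


front_end_total = group_total(FRONT_END_NODES)
fleet_total = sum(HAPROXY_NODES) + front_end_total + group_total(OTHER_NODES)

print(f"front-end nodes: {front_end_total}")
print(f"listed Azure nodes: {fleet_total}")
```

The front-end breakdown adds up to the 22 nodes stated in the list. The overall total is only a snapshot, since the numbers fluctuate.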
We also use availability sets to ensure that a minimum number of servers in each
group are available at any given time. This ensures that Azure will not reboot
all instances in the same availability set at the same time for anything that
is planned.
All our servers run the latest Ubuntu LTS unless there is a specific need to do otherwise. Every server is configured with a fully fledged set of firewall rules for increased security.
### Load Balancers
We use Azure load balancers in front of our HAProxy nodes. This allows us to leverage the Azure infrastructure for HA as well as [take advantage of the power of HAProxy](
Additionally, we utilize an Azure load balancer to manage PostgreSQL failovers.
* The load balancer pool serves Git over SSH, Git over HTTPS, HTTP and HTTPS traffic.
* The GitLab Pages load balancer serves HTTP and HTTPS.
* The AltSSH load balancer serves [git on port 443]( and translates it to port 22 on the back-end.
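The port handling described in the list above can be sketched as a small lookup table. This is a hedged illustration, not our actual HAProxy or Azure configuration: only the AltSSH translation from port 443 to port 22 is stated in the text. The other port numbers are the well-known defaults for SSH, HTTP and HTTPS and are assumed here, and the balancer keys (`main`, `pages`, `altssh`) are hypothetical names.

```python
from typing import Dict, Optional

# Well-known default ports. These are assumptions, except for the
# AltSSH 443 -> 22 translation, which the text above states explicitly.
SSH, HTTP, HTTPS = 22, 80, 443

# Front-end port -> back-end port for each load balancer.
# The keys "main", "pages" and "altssh" are hypothetical labels.
LOAD_BALANCERS: Dict[str, Dict[int, int]] = {
    "main": {SSH: SSH, HTTP: HTTP, HTTPS: HTTPS},
    "pages": {HTTP: HTTP, HTTPS: HTTPS},
    "altssh": {HTTPS: SSH},  # Git over SSH accepted on 443, forwarded to 22
}


def backend_port(balancer: str, frontend_port: int) -> Optional[int]:
    """Return the back-end port for a front-end port, or None if not served."""
    return LOAD_BALANCERS.get(balancer, {}).get(frontend_port)


print(backend_port("altssh", 443))
print(backend_port("pages", 22))
```

In this sketch a lookup returns `None` for traffic a balancer does not serve, such as SSH on the GitLab Pages balancer.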
### Service nodes
Different services have different resource utilization patterns, so we use a variety of instance types across our service nodes, keeping the instance type consistent within each group. We have recently isolated traffic by type on dedicated pools of nodes. We hope you noticed the performance improvement.
## Digital Ocean
Digital Ocean houses several servers that do not need to directly interact
with our main infrastructure. There are many of these, doing a variety of
things; not all of them are listed here.
The primary things on Digital Ocean at this time are:
* Chef Configuration Management Servers
* Blackbox monitoring servers
* Shared runner managers
* Runner cache servers
* ELK servers
## AWS
We host our DNS with Route 53 and we have several EC2 instances for various
purposes. The servers you will interact with most are listed below:
* Version
* Mattermost
* License
## Google Cloud
We are currently investigating Google Cloud.
# Technology at GitLab
We use a lot of cool ([but boring](
technologies here at GitLab. Below is a non-exhaustive list of tech we use here.
* [Ruby]( (probably goes without saying)
* [Chef](
* [Prometheus](
* [PostgreSQL](
* [Redis](
* [ELK Stack](
* [Terraform](
* [Consul](
This page has been moved to the [infrastructure handbook](