Degraded Performance on GitLab.com 2016-08-12

At around 10:40 CDT (15:40 UTC), alerts began to fire indicating that the Sidekiq processes on 15 of the worker nodes had gone away. After a lengthy investigation, it became evident that these 15 workers could not reach Postgres via the Azure load balancer IP. To work around the issue, we updated the Chef attributes so that GitLab connects directly to DB4. This resolved the degraded performance.

We currently have a ticket open with Azure to troubleshoot why the load balancer is dropping traffic from some worker nodes and not others.

The workers that CAN access the database via the load balancer IP are:

worker3
worker10
worker11
worker12
worker13

The workers that CANNOT access the database via the load balancer IP are:

worker1
worker2
worker4
worker5
worker6
worker7
worker8
worker9
worker14
worker15
worker16
worker17
worker18
worker19
worker20
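Splitting workers into these two groups comes down to a per-worker reachability test against the load balancer IP. A minimal sketch of such a test in Python follows. It is not the tooling used during this incident. It assumes Postgres listens on its default port 5432, and the command-line usage is hypothetical. It only checks whether a TCP connection can be opened, not whether authentication or queries succeed.

```python
import socket
import sys

# Default PostgreSQL port; the port actually used on GitLab.com is an assumption here.
POSTGRES_PORT = 5432


def can_reach(host: str, port: int = POSTGRES_PORT, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts (socket.timeout is an OSError
        # subclass), and unreachable networks.
        return False


if __name__ == "__main__":
    # Hypothetical usage, run on each worker:
    #   python check_db.py <load-balancer-ip> <db4-address>
    for target in sys.argv[1:]:
        status = "reachable" if can_reach(target) else "UNREACHABLE"
        print(f"{target}:{POSTGRES_PORT} {status}")
```

Suppose a worker reports the load balancer address as unreachable but DB4 as reachable. That would match the pattern above: the path through the load balancer is dropping traffic, while the database itself is healthy.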
