Outage October 25th 2016, 12:30 am UTC
At around 12:20 am UTC I deployed 8.13.1 to production. After the deploy was done, there was an outage of around 6 minutes.
Notes:
- The blessed deploy task ran successfully.
- I ran the cluster deploy task twice: the first run ended with a network error caused by failures on my internet connection. The second attempt ran successfully.
- During the outage @stanhu noted that there were multiple secret files with bad permissions:

      root@worker5:/var/log/gitlab/unicorn# sudo ls -al /opt/gitlab/embedded/service/gitlab-rails/.gitlab_workhorse_secret
      -rw------- 1 root root 44 Oct 25 20:08 /opt/gitlab/embedded/service/gitlab-rails/.gitlab_workhorse_secret
      root@worker5:/var/log/gitlab/unicorn# sudo ls -al /var/opt/gitlab/gitlab-rails/etc/gitlab_workhorse_secret
      -rw-r--r-- 1 root root 45 Sep 15 14:30 /var/opt/gitlab/gitlab-rails/etc/gitlab_workhorse_secret

- @stanhu fixed the permissions on these files, and service was restored at around 12:33 am UTC.
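
A check like the one @stanhu did by hand could be scripted and run after each deploy. The following is only a sketch: `check_secret_perms` is a hypothetical helper, not part of any deploy task, and the expected owner and mode are passed in as arguments because this report does not record what the correct values are. It assumes GNU `stat` (coreutils), as found on Linux hosts.

```shell
#!/bin/sh
# Hypothetical helper (not part of the deploy tooling): report files whose
# owner or mode differs from the expected values.
# Usage: check_secret_perms EXPECTED_OWNER EXPECTED_MODE FILE...
# Assumes GNU stat (coreutils); BSD/macOS stat uses different flags.
check_secret_perms() {
  expected_owner=$1
  expected_mode=$2
  shift 2
  status=0
  for f in "$@"; do
    if [ ! -e "$f" ]; then
      echo "MISSING $f"
      status=1
      continue
    fi
    # %U = owner user name, %a = permission bits in octal (e.g. 600)
    actual=$(stat -c '%U %a' "$f")
    if [ "$actual" != "$expected_owner $expected_mode" ]; then
      echo "MISMATCH $f: got '$actual', expected '$expected_owner $expected_mode'"
      status=1
    fi
  done
  # Non-zero exit status if any file was missing or mismatched
  return $status
}
```

It would be called as `check_secret_perms OWNER MODE /opt/gitlab/embedded/service/gitlab-rails/.gitlab_workhorse_secret /var/opt/gitlab/gitlab-rails/etc/gitlab_workhorse_secret`, with `OWNER` and `MODE` set to whatever the deployment expects. A non-zero exit status means at least one file is missing or has an unexpected owner or mode.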
/cc @pcarranza @northrup