gitlab_workhorse_secret location could be more robust
In gitlab-com/infrastructure#653 (closed), unicorn failed to start because it was unable to read /opt/gitlab/embedded/service/gitlab-rails/.gitlab_workhorse_secret. It turns out the mode was 0600 instead of 0644. When the Rails application starts, it creates the file if it does not exist. Somehow this happened on all but one worker (the blessed worker). When upgrading the rest of the cluster workers to 8.13.1, we noticed that this was the state of the filesystem:
root@worker5:~# ls -al /opt/gitlab/embedded/service/gitlab-rails/.gitlab_workhorse_secret
-rw------- 1 root root 44 Oct 25 20:08 /opt/gitlab/embedded/service/gitlab-rails/.gitlab_workhorse_secret
root@worker5:~# ls -al /var/opt/gitlab/gitlab-rails/etc/gitlab_workhorse_secret
-rw-r--r-- 1 root root 45 Sep 15 14:30 /var/opt/gitlab/gitlab-rails/etc/gitlab_workhorse_secret
That is not correct: the path under /opt/gitlab should be a symlink, not a regular file. This is the expected form:
root@worker1:~# ls -al /opt/gitlab/embedded/service/gitlab-rails/.gitlab_workhorse_secret
lrwxrwxrwx 1 root root 56 Oct 25 23:46 /opt/gitlab/embedded/service/gitlab-rails/.gitlab_workhorse_secret -> /var/opt/gitlab/gitlab-rails/etc/gitlab_workhorse_secret
root@worker1:~# ls -al /var/opt/gitlab/gitlab-rails/etc/gitlab_workhorse_secret
-rw-r--r-- 1 root root 45 Sep 15 14:11 /var/opt/gitlab/gitlab-rails/etc/gitlab_workhorse_secret
How can this happen? This seems to suggest that Omnibus never ran, or somehow skipped the generation of the file (https://gitlab.com/gitlab-org/omnibus-gitlab/blob/master/files/gitlab-cookbooks/gitlab/recipes/gitlab-rails.rb#L254-262). I wonder if the aborted connection during the deploy caused this to happen.
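To find out which workers are affected, something along these lines could be run on each one. This is only a rough sketch: the function name check_workhorse_secret is made up for illustration, and it only checks that the first path is a symlink pointing at the second, which is the difference between worker5 and worker1 above.

```shell
# Hypothetical helper, not part of Omnibus or GitLab.
# $1: path the Rails application reads
# $2: file that path is expected to be a symlink to
check_workhorse_secret() {
    link="$1"
    target="$2"

    # A regular file here is the broken state seen on worker5.
    if [ ! -L "$link" ]; then
        echo "NOT A SYMLINK: $link"
        return 1
    fi

    # The symlink exists; make sure it points where we expect.
    actual="$(readlink "$link")"
    if [ "$actual" != "$target" ]; then
        echo "WRONG TARGET: $link -> $actual"
        return 1
    fi

    echo "OK: $link -> $target"
}
```

It would be called as check_workhorse_secret /opt/gitlab/embedded/service/gitlab-rails/.gitlab_workhorse_secret /var/opt/gitlab/gitlab-rails/etc/gitlab_workhorse_secret and returns non-zero when the path is a regular file or points somewhere else.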
It seems to me that we would be better off setting an environment variable to contain the location of the Workhorse secret so that we do not depend upon a symlink being created.
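As a rough illustration of the lookup order I have in mind (shell only, not the actual Rails code): use the environment variable if it is set, otherwise fall back to the current path. The variable name GITLAB_WORKHORSE_SECRET_FILE and the function name are assumptions made up for this sketch; nothing reads such a variable today as far as I know.

```shell
# Hypothetical: GITLAB_WORKHORSE_SECRET_FILE does not exist today.
# Prints the path the secret should be read from.
workhorse_secret_path() {
    # Prefer the explicit location from the environment; fall back to
    # the in-tree path that currently depends on the symlink.
    printf '%s\n' "${GITLAB_WORKHORSE_SECRET_FILE:-/opt/gitlab/embedded/service/gitlab-rails/.gitlab_workhorse_secret}"
}
```

With the variable pointing straight at /var/opt/gitlab/gitlab-rails/etc/gitlab_workhorse_secret, a missing or replaced symlink would no longer matter.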