After a long battle I managed to log in and push a Docker image to the staging registry. Proof here: https://gitlab.com/snippets/1666801
I have also disabled the registry on all nodes except registry01.be.stg.gitlab.com.
For some reason, when the registry backend of the frontend lb points at the internal lb instead of directly at the registry node, the upload fails with "unknown blob" after it finishes. With the node itself in the backend, everything works great.
Next steps:
Fix the upload issue so that the frontend lb can use the internal lb as its registry backend.
Ensure the keypair is consistent across rails and registry nodes.
Finalize the configuration in Chef and ensure it configures things properly.
Get some help to extensively test registry operations.
After all this we can close this issue and move on to the next step: doing this in production.
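For the keypair consistency step, a minimal sketch of one way to compare the copies could look like the following. It assumes the certificate file contents have already been collected from each node by some other means (how is not specified here), and the function names are hypothetical. It only checks that every node holds the same file; it does not verify that the private key used by rails matches the certificate.

```python
import hashlib
from collections import defaultdict


def group_by_fingerprint(pem_by_node):
    """Group node names by the SHA-256 of their certificate file contents.

    pem_by_node maps a node name to the raw bytes of its PEM file.
    This hashes the file bytes (after normalizing line endings and
    surrounding whitespace), not the DER-encoded certificate.
    """
    groups = defaultdict(list)
    for node, pem in sorted(pem_by_node.items()):
        normalized = pem.replace(b"\r\n", b"\n").strip()
        groups[hashlib.sha256(normalized).hexdigest()].append(node)
    return dict(groups)


def is_consistent(pem_by_node):
    """Return True when every node holds the same certificate file."""
    return len(group_by_fingerprint(pem_by_node)) <= 1
```

If `is_consistent` returns False, the groups returned by `group_by_fingerprint` show which nodes disagree.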
For some reason when I use the internal lb instead of the registry node in the registry backend of the frontend lb the upload fails saying "unknown blob" after it finishes the upload
You may want to check for an inconsistent view of the filesystem mounts, or differing UID settings, across the nodes.
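To illustrate why an inconsistent view of storage could produce this symptom, here is a toy model. It is not the registry's actual code and not a diagnosis of this issue: `Node` and `push_through_lb` are hypothetical names, a dict stands in for each node's filesystem, and the load balancer is assumed to send consecutive requests to different nodes.

```python
import hashlib
import itertools


class Node:
    """Toy registry node that keeps blobs in a dict standing in for its storage."""

    def __init__(self, storage):
        self.storage = storage

    def put_blob(self, data):
        # Store the blob under its content digest and return the digest.
        digest = "sha256:" + hashlib.sha256(data).hexdigest()
        self.storage[digest] = data
        return digest

    def has_blob(self, digest):
        return digest in self.storage


def push_through_lb(nodes, data):
    """Upload on one node, then look the blob up on the next one (round-robin)."""
    round_robin = itertools.cycle(nodes)
    digest = next(round_robin).put_blob(data)
    if not next(round_robin).has_blob(digest):
        raise LookupError("unknown blob: " + digest)
    return digest
```

With two nodes sharing one storage dict the push succeeds; with a separate dict per node the upload completes on the first node and the lookup on the second raises "unknown blob", which resembles the failure described above.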
Finally, I'm having a couple more issues setting the certificate with Chef. For some reason the certificate it deploys isn't the one I have in vault. If I set it manually, everything works just fine.
Registry is fully operational and scalable in staging.
I created two registry nodes to test failover. They both came online with the right certificate and configuration and started serving requests as soon as chef-client finished. I then started an upload and killed the service on the node I was connected to: the docker client waited a few seconds, reconnected to the other node, and resumed the upload.
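When repeating a failover test like this, it can help to confirm which nodes are actually serving before and after killing one. A rough sketch, assuming the nodes expose the Registry HTTP API v2 base endpoint (`/v2/`) and that a 401 simply means the registry is up but wants a token; `registry_is_serving` is a hypothetical helper, not an existing tool:

```python
import urllib.error
import urllib.request


def registry_is_serving(base_url, timeout=5.0, opener=urllib.request.urlopen):
    """Return True if GET <base_url>/v2/ answers 200, or 401 (auth required)."""
    url = base_url.rstrip("/") + "/v2/"
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except urllib.error.HTTPError as err:
        # The registry answered but asked for authentication.
        return err.code == 401
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar.
        return False
```

Calling it once per node URL (for example the address of registry01.be.stg.gitlab.com and of the second node) gives a quick up/down view. The `opener` parameter exists only so the check can be exercised without network access.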
We're running more tests on this but I think we can close this now. Next step: production!