GitLab Geo (Read-Only secondary servers) (EE Option)
dev issue: https://dev.gitlab.org/gitlab/gitlabhq/issues/2359
Many customers have told us that they want a geographically distributed GitLab. We have always said that this is impossible because you can't write to multiple databases.
But for customers with geographically distributed teams, cloning everything over the WAN is a problem, especially for local CI servers. The WAN is slow and data transfer is costly. Some customers have 10,000 runners in operation that are cloning.
Gerrit has slaves, see 'Mirrors/Slaves' on https://code.google.com/p/gerrit/wiki/Scaling and http://comments.gmane.org/gmane.comp.version-control.repo/5295
We can do something similar where:
- GitLab master pushes all git updates to a number of GitLab slaves over ssh
- The PostgreSQL database is replicated to GitLab slaves
- You can start GitLab with a setting where it doesn't need to write to the database, only read; it will not accept any pushes, only pulls and clones.
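The read-only behaviour in the last point can be sketched roughly as follows. This is only an illustration in Python, not GitLab's implementation: the function name `is_allowed_on_secondary` and the `read_only` flag are hypothetical. It relies only on the fact that Git serves clones and fetches through `git-upload-pack` and accepts pushes through `git-receive-pack`.

```python
# Git's server-side commands: upload-pack serves clones and fetches,
# receive-pack accepts pushes.
READ_SERVICES = {"git-upload-pack"}
WRITE_SERVICES = {"git-receive-pack"}


def is_allowed_on_secondary(service: str, read_only: bool = True) -> bool:
    """Return True if the requested git service may run on this node.

    Hypothetical helper: a read-only node serves pulls and clones but
    refuses pushes; a writable node allows both.
    """
    if service in READ_SERVICES:
        return True
    if service in WRITE_SERVICES:
        return not read_only
    # Anything this sketch does not recognise is rejected.
    return False
```

With the default `read_only=True`, a clone request (`git-upload-pack`) is allowed and a push (`git-receive-pack`) is refused.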
last comment by @sytses:
It would be nice if, when you push to GitLab RE, it forwarded your request to the master server.
Also consider reviewing Perforce Commit Edge https://www.perforce.com/perforce/doc.current/manuals/p4dist/chapter.distributed.html
cc @dzaporozhets @DouweM @jacobvosmaer @ayufan @jnijhof @marin
Geo decisions
- We support only PostgreSQL
- Avatars, LFS objects, build artifacts, and attachments will be handled either by CephFS or by an open source S3 alternative (this will be done after the GA release)
- As a simple hack, we display attachments from the primary until the above is solved
- We moved to use SystemHooks for repository sync coordination (from buffered updates notification)
- Use SystemHooks for any coordination that database replication does not cover
- Anything that doesn't have a SystemHook should be implemented as one, if that makes sense
- Advantages: minimal code difference between CE and EE, more people are using SystemHooks than a custom mechanism
- Disadvantage: the communication layer costs more (one sidekiq job on every push, multiplied by the number of secondary servers)
- We use SafeWebhooks implementation to validate Hooks from primary
- Authentication on a secondary is done via the OAuth protocol, authenticating against the primary server (for web)
- For git you can use either username and password (https://) or an SSH key (ssh://)
- When logging out of a secondary you will be logged out of the primary as well (Single Sign Out)
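As a rough sketch of the SystemHooks decisions above, a secondary could validate an incoming hook and queue a repository sync along these lines. Everything here is an assumption for illustration: the function names, the in-memory queue, and the payload fields (`event_name`, `project_id`) are hypothetical, and the shared-token check is only a stand-in for whatever SafeWebhooks actually verifies. The `X-Gitlab-Token` header is the one GitLab webhooks use to carry a secret token.

```python
import hmac
from collections import deque

# Hypothetical in-memory queue standing in for a background job system.
sync_queue = deque()


def handle_system_hook(headers: dict, payload: dict, expected_token: str) -> bool:
    """Validate a hook from the primary and queue a repository sync.

    Returns True if a sync job was queued, False otherwise.
    """
    token = headers.get("X-Gitlab-Token", "")
    # Constant-time comparison avoids leaking the token through timing.
    if not hmac.compare_digest(token.encode(), expected_token.encode()):
        return False
    # Field names below are assumptions about the payload shape.
    if payload.get("event_name") != "push":
        return False
    project_id = payload.get("project_id")
    if project_id is None:
        return False
    sync_queue.append(project_id)
    return True


def notification_jobs(pushes: int, secondaries: int) -> int:
    """One notification job per push per secondary server."""
    return pushes * secondaries
```

The second helper makes the stated disadvantage concrete: under this model, 100 pushes with 3 secondary servers mean 300 notification jobs on the primary.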
Features
- Geo: Wiki Sync (gitlab-org/gitlab-ee#367)
- Geo: OAuth Authentication (gitlab-org/gitlab-ee#366)
- Geo: Documentation (gitlab-org/gitlab-ee#356)
- Geo: SSH keys Sync (gitlab-org/gitlab-ee#371)
- Geo: Display Attachments from Primary node (gitlab-org/gitlab-ee#414)
- Geo: Single Sign Out (gitlab-org/gitlab-ee#522)
- Geo: Monitoring (gitlab-org/gitlab-ee#727)
- Geo: Disaster Recovery (gitlab-org/gitlab-ee#846)
Bugs / improvements
- Geo: Cannot delete secondary node if it's the only node present (gitlab-org/gitlab-ee#374)
- Geo: Improvements and fixes after QA (gitlab-org/gitlab-ee!354)
- Geo: Merge requests on Secondary should not check mergeable status (gitlab-org/gitlab-ee!366)
- Geo: Benchmark (#560 (closed))
- Wiki page events webhook should include Wiki attributes (gitlab-org/gitlab-ce#17507)
- Omnibus tries to create Postgres extension on read only DB: (gitlab-org/gitlab-ee#628) (omnibus-gitlab!829 (merged))
- Geo: The redirect URI included is not valid - OAuth (gitlab-org/gitlab-ee#650) (gitlab-org/gitlab-ee!444)
- Omnibus: manage custom SSL certificate (omnibus-gitlab#712 (closed))
- Improve UI for users in a Geo node (gitlab-org/gitlab-ee#640)
- Improve gitlab:env:info (gitlab-org/gitlab-ee!459)
- Geo: Move Wiki Sync to use SystemHooks (#1482 (closed))
- Geo: Documentation improvements for 8.9 (gitlab-org/gitlab-ee!431) (Can wait)
- Improve required SSH Keys documentation for Geo (!431 (merged))
- UI indication of the health status of Geo synchronization (can be part of: #727)
- Fix error in admin dashboard when Geo is enabled and current node is nil (#785 (closed))
- Geo: when license doesn't include Geo you can't disable it anymore (#788 (closed))
- Geo: improve project view UI to guide users how to clone/push from Geo secondary node (#789 (closed))
- Geo: Replicate repository creation (#1071 (closed))
- Geo: more documentation improvements for 8.13 (!766 (merged))
- Geo: Display Custom Avatars (user, project and group) in secondary nodes (#1128 (closed))
- Geo: repository is updated but displays old cached data in Web UI (#1129 (closed))
- Geo: Backfill repositories from primary node without using rsync (#1190 (closed))
- Omnibus - Geo: Generate SSH keys for gitlab user (omnibus-gitlab#1680 (closed))
- Database Cache doesn't work as expected for Geo (gitlab-org/gitlab-ee#1217)
- Geo: Simplify known_hosts step (gitlab-org/gitlab-ee#1255)
- Geo will not let you clone from Secondary on 8.13 (gitlab-org/gitlab-ee#1243)
- Geo: Improve Repository Sync (gitlab-org/gitlab-ee#1493)
- Geo synchronization for mirrored repositories (gitlab-org/gitlab-ee#1598)
- Geo: Improve info rake task and create geo specific check task (gitlab-org/gitlab-ee#1611)
- Geo: Backfill stopped working after 8.15.3 (gitlab-org/gitlab-ee#1645)
- Geo: Support v4 API for GitLab Geo endpoints (gitlab-org/gitlab-ee!1256)
Other ideas / discussions
- Geo: Hybrid synchronization (gitlab-org/gitlab-ee#623)