
Start Gitaly Cluster components in parallel with the GitLab container to improve boot time

What does this MR do and why?

Part of https://gitlab.com/gitlab-org/gitlab-qa/-/issues/654

Currently, the Praefect/Gitaly scenario starts every container in series, so booting all of them takes several minutes.
The Praefect and Gitaly containers each take approximately 50 to 60 seconds to start, so running them in parallel could cut startup time by approximately 3 to 4 minutes.
Starting the GitLab container is the slowest part of test startup, and it still takes longer than any other step. The scenario starts these containers:

  • gitaly1
  • gitaly2
  • gitaly3
  • postgres
  • praefect
  • gitlab

Attempt 1: Start all of the containers in parallel, which should reduce the overall time needed to set up the scenario.
Once all containers have been launched, we go back to each one and make sure it has finished reconfiguring before proceeding with the test.
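Attempt 1 can be modelled roughly as "launch everything, then revisit each container and wait". This is a minimal sketch, not the gitlab-qa implementation: `SimulatedContainer`, `launch`, `wait_until_reconfigured` and `start_all_in_parallel` are hypothetical names, and boot times are scaled down to fractions of a second.

```ruby
# Hypothetical stand-in for a container; this is not the gitlab-qa API.
class SimulatedContainer
  attr_reader :name

  def initialize(name, boot_seconds)
    @name = name
    @boot_seconds = boot_seconds
    @ready_at = nil
  end

  # Non-blocking launch, analogous to starting a detached container.
  def launch
    @ready_at = Process.clock_gettime(Process::CLOCK_MONOTONIC) + @boot_seconds
  end

  # True once the simulated reconfigure step has finished.
  def reconfigured?
    !@ready_at.nil? && Process.clock_gettime(Process::CLOCK_MONOTONIC) >= @ready_at
  end

  # Poll until the container reports it has reconfigured.
  def wait_until_reconfigured(poll: 0.01)
    raise "#{name} was never launched" if @ready_at.nil?

    sleep(poll) until reconfigured?
  end
end

# Attempt 1: launch every container, then revisit each one and wait for it.
def start_all_in_parallel(containers)
  containers.each(&:launch)
  containers.each(&:wait_until_reconfigured)
end

now = -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) }
names = %w[gitaly1 gitaly2 gitaly3 postgres praefect gitlab]
containers = names.map { |n| SimulatedContainer.new(n, n == "gitlab" ? 0.3 : 0.2) }

t0 = now.call
start_all_in_parallel(containers)
elapsed = now.call - t0
puts format("all containers ready after %.2fs", elapsed)
```

Total time tracks the slowest container (0.3s here) rather than the sum of all six (1.3s), which is the saving Attempt 1 aims for; the cost is that all six containers compete for resources at once.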

Attempt 2: Ultimately, the fastest boot time we can achieve is bounded by how long the slowest container takes to start. The GitLab container takes around 5 minutes to boot, which is longer than all the other containers take combined. So we can start GitLab on the main thread while a background thread starts the Gitaly Cluster containers. Because the Gitaly Cluster containers still start in series on that thread, at most 2 containers are starting at the same time, which should avoid the CPU, memory, and networking issues we may have seen with the alternative approach of starting all 6 containers in parallel.

With this approach, the time it takes to start gitaly1, gitaly2, gitaly3, and praefect is effectively removed from the job, because they start in the background while the main GitLab container boots. This brings us close to the shortest boot time that is within our control.
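The Attempt 2 ordering can be sketched with a single Ruby `Thread` for the cluster and the main thread for GitLab. This is an illustration under stated assumptions, not the merged code: `SimulatedContainer` and `start_gitlab_with_background_cluster` are hypothetical, durations are scaled down, and postgres is assumed to belong to the background group.

```ruby
# Hypothetical stand-in for a container; this is not the gitlab-qa API.
class SimulatedContainer
  attr_reader :name, :launched_at, :ready_at

  def initialize(name, boot_seconds)
    @name = name
    @boot_seconds = boot_seconds
    @launched_at = nil
    @ready_at = nil
  end

  def launch
    @launched_at = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    @ready_at = @launched_at + @boot_seconds
  end

  def wait_until_reconfigured(poll: 0.01)
    raise "#{name} was never launched" if @ready_at.nil?

    sleep(poll) until Process.clock_gettime(Process::CLOCK_MONOTONIC) >= @ready_at
  end
end

# Attempt 2: GitLab boots on the main thread while the Gitaly Cluster
# containers boot one at a time on a single background thread, so at most
# two containers are starting at any moment.
def start_gitlab_with_background_cluster(gitlab, cluster)
  background = Thread.new do
    cluster.each do |container|
      container.launch
      container.wait_until_reconfigured # the next container waits for this one
    end
  end

  gitlab.launch
  gitlab.wait_until_reconfigured

  background.join # do not start tests until the cluster is ready too
end

now = -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) }
cluster = %w[gitaly1 gitaly2 gitaly3 postgres praefect].map { |n| SimulatedContainer.new(n, 0.1) }
gitlab = SimulatedContainer.new("gitlab", 0.6)

t0 = now.call
start_gitlab_with_background_cluster(gitlab, cluster)
elapsed = now.call - t0
puts format("booted in %.2fs", elapsed)
```

Because GitLab (0.6s) outlasts the whole serial cluster (0.5s), total time is close to GitLab's boot time alone instead of the 1.1s a fully serial start would take. `Thread#join` also re-raises any exception from the background thread, so a failed cluster container still surfaces on the main thread.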

Sample improvement

Individual Jobs

Measured from when the job begins running bundle exec exe/gitlab-qa ${QA_SCENARIO:=Test::Instance::Image} until it reaches 'Running Gitaly Cluster specs!'

Job                          Start         End           Duration
example job prior to change  05:31:15 UTC  05:36:33 UTC  8m34s
example job with change      05:31:15 UTC  05:36:33 UTC  5m18s
Improvement                                              3m16s (38% faster)

Combined Duration of 10 ee:praefect-parallel jobs

With update: https://gitlab.com/gitlab-org/gitlab-qa/-/pipelines/561932335 3:53:09
Prev version: https://gitlab.com/gitlab-org/gitlab-qa/-/pipelines/560413785 4:18:55
Improvement: 0:25:46 (9.95%)
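The improvement figures above can be re-derived with a few lines of arithmetic. This helper is a convenience for checking the numbers, not part of the MR; `to_seconds` and `to_hms` are illustrative names.

```ruby
# Parse "H:MM:SS" into whole seconds.
def to_seconds(hms)
  h, m, s = hms.split(":").map(&:to_i)
  h * 3600 + m * 60 + s
end

# Format whole seconds as "H:MM:SS".
def to_hms(seconds)
  format("%d:%02d:%02d", seconds / 3600, seconds % 3600 / 60, seconds % 60)
end

# Individual job: 8m34s before, 5m18s after.
job_before = 8 * 60 + 34
job_after  = 5 * 60 + 18
job_saved  = job_before - job_after
puts format("%dm%02ds (%.0f%% faster)", job_saved / 60, job_saved % 60, job_saved * 100.0 / job_before)
# prints "3m16s (38% faster)"

# Combined duration of the 10 ee:praefect-parallel jobs.
total_before = to_seconds("4:18:55")
total_after  = to_seconds("3:53:09")
total_saved  = total_before - total_after
puts format("%s (%.2f%%)", to_hms(total_saved), total_saved * 100.0 / total_before)
# prints "0:25:46 (9.95%)"
```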

How to set up and validate locally

  1. Run the Praefect/Gitaly scenario in gitlab-qa as normal and ensure that the orchestrated environment starts and that tests work as before, for example: ./exe/gitlab-qa Test::Integration::GitalyCluster EE --no-tests
  2. Add some longer sleeps to the code to verify that, if the Gitaly Cluster containers take longer to start than the main GitLab app, we don't encounter issues.
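The property step 2 checks can be shown in isolation: however slow the background work is, tests must not start until it finishes. This is a hypothetical model of the Attempt 2 boot order, not gitlab-qa code; `boot_order` and its `extra_delay:` keyword are illustrative and stand in for the sleeps added by hand.

```ruby
# Hypothetical model of the Attempt 2 boot sequence; `extra_delay` stands in
# for the longer sleeps added by hand when validating locally.
def boot_order(gitlab_seconds:, cluster_seconds:, extra_delay: 0)
  events = Queue.new # thread-safe, records the order in which things happen

  cluster = Thread.new do
    sleep(cluster_seconds + extra_delay) # slowed-down Gitaly Cluster startup
    events << :cluster_ready
  end

  sleep(gitlab_seconds) # GitLab boots on the main thread
  events << :gitlab_ready

  cluster.join # wait for the background thread, however slow it is
  events << :tests_start

  Array.new(events.size) { events.pop }
end

p boot_order(gitlab_seconds: 0.05, cluster_seconds: 0.05, extra_delay: 0.3)
# => [:gitlab_ready, :cluster_ready, :tests_start]
p boot_order(gitlab_seconds: 0.3, cluster_seconds: 0.05)
# => [:cluster_ready, :gitlab_ready, :tests_start]
```

In both cases `:tests_start` comes last, which is the guarantee `Thread#join` provides regardless of which side finishes first.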

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by George Koltsov
