Add `parallel` keyword to split CI tests
Description
We have split gitlab-ce's tests into multiple parallel jobs that run essentially the same script and differ only by a loop index. Let's formalize this approach and create a `parallel` keyword that takes a number, N, and duplicates a job N times, setting `CI_NODE_INDEX` and `CI_NODE_TOTAL` for each copy.
Proposal
Given:
```yaml
rspec:
  stage: test
  parallel: 20
  script:
    - export KNAPSACK_REPORT_PATH=knapsack/rspec_node_${CI_NODE_INDEX}_${CI_NODE_TOTAL}_report.json
    - cp knapsack/rspec_report.json ${KNAPSACK_REPORT_PATH}
    - knapsack rspec
```
This would generate 20 jobs named `rspec 1/20` through `rspec 20/20` (I prefer indexing from 1 for human-named items). Each job would have a unique `CI_NODE_INDEX`, and `CI_NODE_TOTAL` would be set to `20`. This would be handled at the parser level, so GitLab Runner wouldn't require any changes.
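A minimal sketch of the parser-level expansion described above, assuming the parsed `.gitlab-ci.yml` is available as plain Python dictionaries. The `expand_parallel` function is hypothetical and not GitLab's implementation; passing the index and total as string job variables and requiring N to be a positive integer are assumptions made for illustration.

```python
from copy import deepcopy


def expand_parallel(name, job):
    """Hypothetical parser step: turn one job with `parallel: N` into N jobs.

    Jobs without a `parallel` key are returned unchanged under their own name.
    """
    total = job.get("parallel")
    if total is None:
        return {name: job}
    # Assumption: N must be a positive integer (bool is rejected explicitly).
    if isinstance(total, bool) or not isinstance(total, int) or total < 1:
        raise ValueError(f"{name}: parallel must be a positive integer")

    base = deepcopy(job)  # never mutate the caller's definition
    del base["parallel"]  # the generated jobs are ordinary jobs

    expanded = {}
    for index in range(1, total + 1):  # 1-based, matching "rspec 1/20"
        clone = deepcopy(base)
        variables = clone.setdefault("variables", {})
        # CI variables are strings, so the numbers are stored as strings.
        variables["CI_NODE_INDEX"] = str(index)
        variables["CI_NODE_TOTAL"] = str(total)
        expanded[f"{name} {index}/{total}"] = clone
    return expanded


rspec = {
    "stage": "test",
    "parallel": 20,
    "script": ["knapsack rspec"],
}
jobs = expand_parallel("rspec", rspec)
print(list(jobs)[0], list(jobs)[-1])  # rspec 1/20 rspec 20/20
```

Because the expansion emits ordinary jobs that differ only in two variables, the runner never needs to know that `parallel` existed.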
Note that `.gitlab-ci.yml` would support multiple definitions of parallel jobs (e.g. `rspec` and `spinach`) in the same file, and the `CI_NODE_INDEX` values would only be unique within each definition. For example, there would be two jobs running with `CI_NODE_INDEX=1`.
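Because a job only needs its own definition's `CI_NODE_INDEX` and `CI_NODE_TOTAL`, its script can pick its share of that definition's tests from those two values alone. The sketch below uses a plain round-robin split over a sorted file list; this is a simpler technique than knapsack, which balances using recorded timings. The function names and file paths are hypothetical.

```python
import os


def select_share(files, node_index, node_total):
    """Return this job's files using a hypothetical round-robin split.

    node_index is 1-based, as CI_NODE_INDEX is in this proposal.
    """
    if not 1 <= node_index <= node_total:
        raise ValueError("node_index must be between 1 and node_total")
    ordered = sorted(files)  # every job must see the same ordering
    return ordered[node_index - 1::node_total]


def share_from_env(files, env=os.environ):
    """Read the two (string) variables set on the expanded job."""
    return select_share(files, int(env["CI_NODE_INDEX"]), int(env["CI_NODE_TOTAL"]))


# Hypothetical test files for two separate parallel definitions.
rspec_files = ["spec/a_spec.rb", "spec/b_spec.rb", "spec/c_spec.rb",
               "spec/d_spec.rb", "spec/e_spec.rb"]
spinach_files = ["features/login.feature", "features/search.feature",
                 "features/admin.feature"]

# "rspec 1/2" and "spinach 1/3" both run with CI_NODE_INDEX=1, but each
# slices only its own definition's list, so the shared index is harmless.
print(select_share(rspec_files, 1, 2))    # ['spec/a_spec.rb', 'spec/c_spec.rb', 'spec/e_spec.rb']
print(select_share(spinach_files, 1, 3))  # ['features/admin.feature']
```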
Links
- This is a specific proposal of the general, larger issue of automatic parallelization (#3819 (closed)).
- Works well with #21286 (closed).