Runner Autoscale through docker-machine
What is Docker Machine?
Machine lets you create Docker hosts on your computer, on cloud providers, and inside your own data center. It automatically creates hosts, installs Docker on them, then configures the docker client to talk to them. A “machine” is the combination of a Docker host and a configured client.
Source: https://docs.docker.com/machine/
Why Docker Machine?
It is easy to use, well documented, well supported, and constantly extended. It supports almost any cloud provider or virtualization infrastructure. We need only a minimal amount of changes to support Docker Machine: machine enumeration and inspection. We don't need to implement any "cloud specific" features.
GitLab Runner and Docker Machine sitting in a tree
This MR adds Docker Machine support to GitLab Runner, allowing it to automatically spin up Docker-enabled hosts on any provider supported by Docker Machine: DigitalOcean, VirtualBox, Azure...
This makes it possible to have an automatically scalable cluster of runners managed by one GitLab Runner instance. You can, for example, allow a maximum of 100 jobs to run in parallel. The Docker Machine integration creates cloud instances on demand to accommodate the load. After a defined period (for example, 1h) the machines are destroyed, and new ones are created when needed.
We use one GitLab Runner to manage all jobs.
How does it work?
- The GitLab Runner requests a job from the coordinator
- The GitLab Runner checks the available Docker machines and takes the first free one
- If no free machine is found, a new one is created
- The job is run on the provisioned machine
- After the job has run, the free disk space is checked; if it has dropped below a certain level, the machine is removed
- Otherwise the machine is added to the list of machines that can be reused for other jobs
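The steps above can be sketched roughly as follows. This is an illustrative model only, not the code in this MR: the `Machine`, `MachinePool`, and `run_job` names, the `free_disk_ratio` field, and the 10% disk threshold are all assumptions made for the example (the description only says "a certain level").

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass(eq=False)  # compare machines by identity, not by field values
class Machine:
    name: str
    free_disk_ratio: float = 1.0  # hypothetical: fraction of disk still free
    busy: bool = False


@dataclass
class MachinePool:
    # Hypothetical threshold; the description only says "a certain level".
    min_free_disk_ratio: float = 0.1
    machines: List[Machine] = field(default_factory=list)
    created: int = 0

    def acquire(self) -> Machine:
        """Take the first free machine, or create a new one if none is free."""
        for machine in self.machines:
            if not machine.busy:
                machine.busy = True
                return machine
        self.created += 1
        machine = Machine(name=f"auto-scale-{self.created}", busy=True)
        self.machines.append(machine)
        return machine

    def release(self, machine: Machine) -> None:
        """After a job: drop the machine if disk is low, else keep it for reuse."""
        if machine.free_disk_ratio < self.min_free_disk_ratio:
            self.machines.remove(machine)
        else:
            machine.busy = False


def run_job(pool: MachinePool, job: Callable[[Machine], None]) -> Machine:
    """Run one job on a machine from the pool and return the machine used."""
    machine = pool.acquire()
    try:
        job(machine)
    finally:
        pool.release(machine)
    return machine
```

In this model a second job reuses the machine left behind by the first, and a machine whose free disk falls under the threshold is dropped, so the next job triggers the creation of a fresh one.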
How to enable Docker Machine configuration
The simplest config.toml:
concurrent = 50
[[runners]]
url = "https://gitlab.com/ci"
token = "XYZ"
name = "YZX"
executor = "docker+machine"
limit = 10
[runners.docker]
image = "ruby:2.1"
[runners.machine]
IdleCount = 5
IdleTime = 600
MaxBuilds = 100
MachineDriver = "digitalocean"
MachineName = "auto-scale-%s"
MachineOptions = [
"digitalocean-image=coreos-beta",
"digitalocean-ssh-user=core",
"digitalocean-access-token=DO_ACCESS_TOKEN",
"digitalocean-region=nyc2",
"digitalocean-size=4gb",
"digitalocean-private-networking",
"engine-registry-mirror=http://10.128.11.79:34723"
]
The above configuration runs up to 10 jobs at once (limit = 10) and removes machines after 600 seconds of inactivity (IdleTime = 600). It leaves at most 5 nodes waiting to process jobs (IdleCount = 5). The machines are created on DigitalOcean using a CoreOS image. The boot time of a CoreOS VM is 40-60s.
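One possible reading of the idle settings described above is sketched below. It is an assumption-laden illustration, not the runner's actual policy: the `machines_to_remove` function and its input format (machine name mapped to seconds spent idle) are hypothetical, and the real implementation may decide differently.

```python
from typing import Dict, List


def machines_to_remove(
    idle_seconds: Dict[str, float],
    idle_count: int = 5,      # mirrors IdleCount from the config above
    idle_time: float = 600,   # mirrors IdleTime from the config above
) -> List[str]:
    """Return the names of idle machines that should be removed."""
    # Machines that have been idle for idle_time seconds or more are removed.
    expired = [name for name, idle in idle_seconds.items() if idle >= idle_time]
    # Of the rest, keep at most idle_count machines (the most recently used).
    remaining = sorted(
        (name for name in idle_seconds if name not in expired),
        key=lambda name: idle_seconds[name],
    )
    surplus = remaining[idle_count:]
    return sorted(expired + surplus)
```

For instance, with seven idle machines of which one has been idle for 700 seconds, this sketch removes that machine plus the longest-idle of the remaining six, leaving 5 waiting.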
To list the other available MachineOptions, run:
docker-machine create -d digitalocean --help
Limitations
The runner cache is tied to one node. This can be solved by exposing the cache as an internal runner HTTP service.
This is no longer true if the following is enabled: https://gitlab.com/gitlab-org/gitlab-ci-multi-runner/merge_requests/88
Considerations
The provisioned machines should not hold any data. The machines should be disposable when they are no longer needed. Having to provision a new node should not make a difference for running a next-stage build.