Runner Autoscale through docker-machine

Kamil Trzciński requested to merge docker-machine into master

What is Docker Machine?

Machine lets you create Docker hosts on your computer, on cloud providers, and inside your own data center. It automatically creates hosts, installs Docker on them, then configures the docker client to talk to them. A “machine” is the combination of a Docker host and a configured client.

Source: https://docs.docker.com/machine/

Why Docker Machine?

It is easy to use, well documented, well supported, and constantly extended. It supports almost any cloud provider or virtualization infrastructure. Supporting Docker Machine requires only minimal changes on our side: machine enumeration and inspection. We don't need to implement any cloud-specific features.

GitLab Runner and Docker Machine sitting in a tree

This MR adds support for Docker Machine to GitLab Runner, allowing it to automatically spin up Docker-enabled hosts on any provider supported by Docker Machine: DigitalOcean, VirtualBox, Azure...

This makes it possible to have an automatically scalable cluster of runners managed by one GitLab Runner instance. You can, for example, allow a maximum of 100 jobs to run in parallel. The Docker Machine integration will create cloud instances on demand to accommodate the load. After a defined period (for example, 1h) the machines are destroyed, and new ones are created when needed.

We use one GitLab Runner to manage all jobs.

How does it work?

  1. GitLab Runner requests a job from the coordinator.
  2. GitLab Runner checks the available Docker machines and takes the first free one.
  3. If no free machine is found, a new one is created.
  4. The job runs on the provisioned machine.
  5. After the job finishes, the disk space is checked; if free space has dropped below a certain level, the machine is removed.
  6. Otherwise the machine is added to the list of machines that can be reused for other jobs.
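
The steps above can be sketched roughly as follows. This is a simplified, in-memory illustration, not the code in this MR: `Machine`, `MachinePool`, `run_job`, the 10% disk threshold, and the numeric machine IDs are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Machine:
    name: str
    free_disk_percent: float = 100.0


@dataclass
class MachinePool:
    name_template: str = "auto-scale-%s"
    min_free_disk_percent: float = 10.0  # hypothetical threshold
    free: list = field(default_factory=list)
    _ids: count = field(default_factory=count, repr=False)

    def acquire(self) -> Machine:
        # Steps 2-3: take the first free machine, or create a new one.
        if self.free:
            return self.free.pop(0)
        return Machine(self.name_template % next(self._ids))

    def release(self, machine: Machine) -> bool:
        # Steps 5-6: drop the machine if disk space is low,
        # otherwise put it back on the reusable list.
        if machine.free_disk_percent < self.min_free_disk_percent:
            return False  # machine is removed
        self.free.append(machine)
        return True


def run_job(pool: MachinePool, job) -> tuple:
    machine = pool.acquire()
    job(machine)  # Step 4: the job runs on the provisioned machine.
    reused = pool.release(machine)
    return machine.name, reused
```

A machine that still has enough disk space after a job is handed to the next job; one that does not is discarded, and the next job triggers the creation of a fresh machine.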

How to enable Docker Machine configuration

The simplest config.toml:

concurrent = 50

[[runners]]
  url = "https://gitlab.com/ci"
  token = "XYZ"
  name = "YZX"
  executor = "docker+machine"
  limit = 10
  [runners.docker]
    image = "ruby:2.1"
  [runners.machine]
    IdleCount = 5
    IdleTime = 600
    MaxBuilds = 100
    MachineDriver = "digitalocean"
    MachineName = "auto-scale-%s"
    MachineOptions = [
        "digitalocean-image=coreos-beta",
        "digitalocean-ssh-user=core",
        "digitalocean-access-token=DO_ACCESS_TOKEN",
        "digitalocean-region=nyc2",
        "digitalocean-size=4gb",
        "digitalocean-private-networking",
        "engine-registry-mirror=http://10.128.11.79:34723"
    ]

The above configuration runs up to 10 jobs at a time and removes machines after 600 seconds of inactivity. It leaves at most 5 nodes waiting to process jobs. The machines are created on DigitalOcean using a CoreOS image. The boot time of a CoreOS VM is 40-60s.
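
As a minimal sketch, one possible reading of the idle settings is shown below: machines idle for longer than IdleTime are removed, and of the rest at most IdleCount are kept waiting. This is an assumption about the policy made for illustration, not a description of the runner's actual implementation; `IdleMachine` and `machines_to_remove` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class IdleMachine:
    name: str
    idle_seconds: int


def machines_to_remove(idle, idle_count=5, idle_time=600):
    """Return the names of idle machines to remove (assumed policy)."""
    # Rule 1: anything idle for longer than idle_time is removed.
    expired = [m for m in idle if m.idle_seconds > idle_time]
    fresh = [m for m in idle if m.idle_seconds <= idle_time]
    # Rule 2: keep at most idle_count of the remaining idle machines,
    # dropping the ones that have been waiting the longest.
    fresh.sort(key=lambda m: m.idle_seconds)
    surplus = fresh[idle_count:]
    return sorted(m.name for m in expired + surplus)
```

The defaults mirror IdleCount = 5 and IdleTime = 600 from the config above.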

To list the other MachineOptions available for a driver, run:

docker-machine create -d digitalocean --help
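
To illustrate how MachineDriver, MachineName and MachineOptions relate to the docker-machine command line, here is a small sketch that builds a `docker-machine create` invocation. It assumes each MachineOptions entry is passed through as a `--` flag and that `%s` in MachineName is replaced by a unique identifier; `create_command` is a hypothetical helper and the identifier "example" is made up.

```python
import shlex


def create_command(driver, name, options):
    """Build the argument list for `docker-machine create` (illustrative only)."""
    args = ["docker-machine", "create", "--driver", driver]
    # Assumption: every MachineOptions entry maps to one `--` flag.
    args += ["--" + opt for opt in options]
    args.append(name)
    return args


if __name__ == "__main__":
    cmd = create_command(
        "digitalocean",
        "auto-scale-%s" % "example",  # hypothetical unique identifier
        [
            "digitalocean-image=coreos-beta",
            "digitalocean-access-token=DO_ACCESS_TOKEN",
            "digitalocean-private-networking",
        ],
    )
    # Print the command in a copy-pasteable, shell-quoted form.
    print(shlex.join(cmd))
```

Options without a value, such as digitalocean-private-networking, become plain boolean flags.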

Limitations

The runner cache is tied to one node. This could be solved by exposing the cache as an internal runner HTTP service.

This is no longer true if the following is enabled: https://gitlab.com/gitlab-org/gitlab-ci-multi-runner/merge_requests/88

Considerations

The provisioned machines should not hold any data. The machines should be disposable when no longer needed. Having to provision a new node should make no difference to running the next stage of a build.
