Skip to content
Snippets Groups Projects

Bug Fix: use a regex to pull out the service and version in the splitServiceAndVersion method

Merged username-removed-831160 requested to merge craig6/gitlab-ci-multi-runner:master into master
1 unresolved thread

This merge request allows the use of multiple docker images (as a service) from a private docker registry where the docker registry is on a port other than 443.

Currently if you use a docker image from a private registry on a port other than 443 you can't have more than one service from that registry due to the way the service definition splits on the colon (:).

For example registry.example.com:5000/ruby:2.9 becomes registry.example.com as a host entry to connect to. If you then have registry.example.com:5000/mongo:2.6 this also becomes registry.example.com as an entry to connect to.

This ends up with:

registry.example.com -> ruby
registry.example.com -> mongo

Which of course doesn't work.

I have tried to take into account the different ways in which a private docker registry can be referenced in the regex and in the tests, but this could do with a review to see if my logic there covers everything.

I believe the docs cover the use of a private docker registry already and test have been updated.

(Note that the pipelines never pass the unit tests, but this is for the shell executor, on time outs, or when the test is killed by the runner.)

Merge request reports

Loading
Loading

Activity

Filter activity
  • Approvals
  • Assignees & reviewers
  • Comments (from bots)
  • Comments (from users)
  • Commits & branches
  • Edits
  • Labels
  • Lock status
  • Mentions
  • Merge request status
  • Tracking
  • Maintainer

    Thanks for tackling this! I attempted to solve the same problem in !375 (closed), but I like the idea of just stripping out the hostname from the service name.

  • Trying to keep the changes and impact to a minimum. If you leave the hostname in then the documentation will need updating for this behaviour.

    As I see it this is a bug fix. Where as leaving the hostname in is a change in behaviour, which we can do in a later release along with updating the documentation.

  • Maintainer

    Oh, now I remember: The reason I left the hostname in the name is that it is the existing behavior to leave it. For example:

    registry.example.com/ruby:2.9 gets mapped to registry.example.com__ruby. Removing the hostname may break a number of CI builds.

    /cc: @ayufan, @tmaczukin, @nick.thomas

  • Added 1 commit:

    • e65768ec - use the docker regex to work out the service name, this keeps the hostname

    Compare with previous version

  • Added 1 commit:

    • a814ba76 - forgot to add the package statement

    Compare with previous version

  • Added 1 commit:

    • 25f2f33b - formatting

    Compare with previous version

  • Added 1 commit:

    • e7118779 - correct the regex

    Compare with previous version

  • Added 1 commit:

    • 4edd92e5 - correct the tests

    Compare with previous version

  • Added 1 commit:

    • b5f81d1c - improve tests

    Compare with previous version

  • Added 1 commit:

    • 3085309e - improve tests

    Compare with previous version

  • @stanhu What do you think of the latest merge request?

    I've taken the regex from the docker project as suggested, just an extra wrapper required to remove the port.

    Edited by username-removed-831160
  • Maintainer

    @nick.thomas Are we vendoring the Docker references files in !301 (merged)? Can you review this MR?

    In ideal world, we could just call the same methods that Docker is calling to parse these values, but I understand that would require copying in more code.

  • Yea I'm of the same thought process.

    I stuck with copying the docker regex code for now as I'm not sure on the implications of calling the docker code directly. I did look at implementing the same regex into the method, but the layers of the regex is relatively complex that in the end I copied it as at least the layers are readable.

    Happy to look at calling the docker code directly if someone can point me in the right direction.

  • @stanhu !301 (merged) vendors the docker code, but its not merged yet because docker hasn't made a stable release we can vendor. Relying on random commits in production with something as complex as docker is nasty.

    I'm happy with copy-paste for this, for now. We know where it comes from, and that it was copied unmodified; and it should be replaced with the vendored code once docker makes their next release.

  • Nick Thomas Resolved all discussions

    Resolved all discussions

  • @stanhu @craig6 initial review done.

    It looks like helpers/docker/auth_config.go:SplitDockerImageName in master already deals correctly with image names containing :, but it would be nice to get a second pair of eyes on that

  • Added 1 commit:

    • a93e916b - improve the code after suggestion from @nick.thomas

    Compare with previous version

  • Added 1 commit:

    • 99b121f2 - we already vendor this package so might as well reference and use it

    Compare with previous version

  • Thanks @craig6! This looks good to me now, perhaps you can take a look @tmaczukin ?

  • We'll need to rebase and squash commits before merging, 18 commit for 2 diffs so far :100:

  • Added 2 commits:

    • 4a8e3737 - use a regex to pull out the service and version in the splitServiceAndVersion method
    • 3ebf52e1 - we already vendor this package so might as well reference and use it

    Compare with previous version

  • Added 1 commit:

    • 184b21a1 - use a regex to pull out the service and version in the splitServiceAndVersion method

    Compare with previous version

  • @nick.thomas I've squashed the commits.

  • Is anyone able to test this works with an actual runner? I've tested the build on a runner and docker registry on a different port and it fails to download the docker image. The console shows it trying to pull the docker registry, then it times out talking to 443:

    Pulling docker image dockerregistry/piksel/dse:latest ...
    ERROR: Preparation failed: API error (500): unable to ping registry endpoint https://dockerregistry/v0/
    v2 ping attempt failed with error: Get https://dockerregistry/v2/: dial tcp 10.203.8.42:443: i/o timeout
     v1 ping attempt failed with error: Get https://dockerregistry/v1/_ping: dial tcp 10.203.8.42:443: i/o timeout

    I've confirmed that the service definition is dockerregistry:5000/piksel/dse:latest

    Any ideas?

  • @craig6:

    1. https://gitlab.com/gitlab-org/gitlab-ci-multi-runner/blob/184b21a1a78438165fb01bda7b6c473b67ba0fc3/executors/docker/executor_docker.go#L597 - here service definition is changed to service, version, linkNames
    2. https://gitlab.com/gitlab-org/gitlab-ci-multi-runner/blob/184b21a1a78438165fb01bda7b6c473b67ba0fc3/executors/docker/executor_docker.go#L607 - here service (so the one without port) is passed to s.createService()
    3. https://gitlab.com/gitlab-org/gitlab-ci-multi-runner/blob/184b21a1a78438165fb01bda7b6c473b67ba0fc3/executors/docker/executor_docker.go#L498 - here the provided service is used as part of s.getDockerImage()

    So if you have a definition domain.tld:5000/image/name then Runner is trying to download a domain.tld/image/name image.

  • @tmaczukin thanks, would you consider this a bug?

  • What do you mean in consider this a bug?? Current version of Runner is not supporting services from non standard registries. So no wonder then , that some more parts of the code need to be changed :smiley:

    I think it would be enough to also add description to createService call (next to currently used service and version), and then use this as image name for s.getDockerImage() call. It would be then something like:

    // executor/docker/executor_docker.go:491
    func (s *executor) createService(service, version, image string) (*docker.Container, error) {
    	if len(service) == 0 {
    		return nil, errors.New("invalid service name")
    	}
    
    	serviceImage, err := s.getDockerImage(image)
    	if err != nil {
    		return nil, err
    	}
            (...)
    }
    
    (...)
    
    // executor/docker/executor_docker.go:602
    
    		// Create service if not yet created
    		if container == nil {
    			container, err = s.createService(service, version, description)
    			if err != nil {
    				return
    			}
    			(...)
    }

    We currently enforce that the service description needs to be a valid docker image name so it should be OK to use it as image for s.getDockerImage(). And from the same image name we would discover service name, version and aliases used by Runner in other places.

  • I mean consider this a bug because in version 1.7.1 of the deb it manages to download the docker image from a private registry (and on a different port), in the 1.8-beta version I tested (because of this bug) it doesn't download. So there is a change in behaviour - here accidental or otherwise.

    If it's a desired change then it's not a bug, if something has changed that wasn't supposed to then you could argue it's introduced a bug. :-) Hence my question would you consider this a bug?.

  • @craig6 Look again on the code lines I've posted above. It's you, who has changed the behavior: because of change of how the new version of splitServiceAndVersion works and what it returns :wink:

    Before your change: a call splitServiceAndVersion("domain.tld:8080/namespace/service:version") generates two variables: service == "domain.tld" and version == "8080/namespace/service:version". Then in the s.getDockerImage() there are both concatenated with : between and used as an image name, so the s.getDockerImage("domain.tld:8080/namespace/service:version") call is done.

    Image domain.tld:8080/namespace/service:version is being downloaded as domain.tld:8080/namespace/service:version. However the service name and service alias are not working properly with registries with non-standard ports.

    After your change: a call splitServiceAndVersion("domain.tld:8080/namespace/service:version") generates two variables: service == "domain.tld/namespace/service" and version == "version". Then in the s.getDockerImage() there are both concatenated with : between and used as an image name, so the s.getDockerImage("domain.tld/namespace/service:version") call is done.

    Service name and service alias start to working with registries with non-standard ports. However image domain.tld:8080/namespace/service:version is being downloaded as domain.tld/namespace/service:version so it not works at all.

    That's why I proposed an additional change like in https://gitlab.com/gitlab-org/gitlab-ci-multi-runner/merge_requests/376#note_18287633 :wink:

    Edited by Tomasz Maczukin
  • @craig6 This - 2bd5e031d7e997ee0cc7c425f5582c3f33ccf0f7 - resolves your problem and also adds a test to catch such regression in the future :)

    Feel free to cherry-pick it/rebase on top of it in your fork and include this into this MR :)

    Edited by Tomasz Maczukin
  • And as for the changes in splitServiceAndVersion - they look good. Let's fix the pulling problem and add tests and I think we can merge this :)

    Edited by Tomasz Maczukin
  • @tmaczukin light bulb moment, I had forgotten that splitServiceAndVersion did more than impact the hostname to connect to the service. Now that I think about it, that's why my original merge request was to remove the hostname and the port in what was returned. It appeared to be the least impacting fix.

    I'll cherry from your commit, add some tests and post back when all done.

  • Added 1 commit:

    • be01ea8e - pulling in work by @tmaczukin, this fixes an issue where the docker image is pul…

    Compare with previous version

  • Added 1 commit:

    • a291fb0a - formatting fixes, use the correct mock import

    Compare with previous version

  • @craig6 I left one small comment. Also please rebase on top of current master - it will make golint test pass.

  • Added 1 commit:

    • 84d73bef - lint - remove period on error message

    Compare with previous version

  • Added 1 commit:

    • cca15c19 - use a regex to pull out the service and version in the splitServiceAndVersion method

    Compare with previous version

  • Added 8 commits:

    • a5b5c4fb - use a regex to pull out the service and version in the splitServiceAndVersion method
    • 6507f29c - Fix broken links in docs/index.md
    • e11c711f - Bring old readme back
    • 3f0665b8 - Ensure that all builds are executed on tagged runners
    • ef051c53 - Revert "Merge branch 'bring-old-readme-back' into 'master'"
    • ce64ff65 - Fix docs links
    • ad0eada7 - Fix broken documentation links
    • 09a435e5 - Fix violations reported by new golint version

    Compare with previous version

  • Added 14 commits:

    • 09a435e5...5d239078 - 13 commits from branch gitlab-org:master
    • ca4c447a - use a regex to pull out the service and version in the splitServiceAndVersion method

    Compare with previous version

  • Sorry for the confusion there in the commits, merged when I meant to rebase.

    Should be sorted now.

  • Please retry the unittest build :)

  • Failed again :-(

  • Yes, unfortunately. This is also a known failure, not related with your changes. We're faithing with those in separate issues. Please retry again.

  • Mentioned in issue #1868 (closed)

  • Tomasz Maczukin Milestone changed to %v1.8

    Milestone changed to %v1.8

  • @craig6 The Packages build is failed but it's not important from testing point of view. I think we can merge this MR.

    Thanks very much for your work! :)

  • Tomasz Maczukin Status changed to merged

    Status changed to merged

  • Maintainer

    @craig6 Thanks for this! Glad to see this has finally been fixed.

  • It's all green now. I'm just going to test the package locally to see if I spot anything.

    Thanks @tmaczukin and @nick.thomas for your help.

  • Mentioned in issue #1434 (closed)

  • Tomasz Maczukin Mentioned in merge request !212 (closed)

    Mentioned in merge request !212 (closed)

  • Mentioned in issue #1828 (closed)

  • Tested package locally and so far all is good.

  • mentioned in issue #1666 (closed)

  • mentioned in issue #1657 (closed)

  • mentioned in issue #1284 (closed)

  • Please register or sign in to reply
    Loading