When companies have many projects, they often have standardized testing and deployment processes, which are reflected in .gitlab-ci.yml. We should make it easy to share these processes and DRY up CI/CD configuration by letting one project import/include from another project's repo, either its .gitlab-ci.yml or another specified file. By allowing multiple imports, configuration could be componentized, with a mix-and-match approach to constructing .gitlab-ci.yml content.
Could be an EE feature, targeted at larger organizations.
Proposal
Add a keyword, include, whose value is a string or an array of strings.
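A minimal sketch of how a parser might accept both proposed forms of the include value, assuming the YAML has already been parsed into Python objects. normalize_include is a hypothetical helper for illustration, not GitLab code.

```python
from typing import Any, List


def normalize_include(value: Any) -> List[str]:
    """Return the proposed `include` value as a list of strings.

    Accepts a single string or a list of strings (the two forms in the
    proposal). A missing key (None) is treated as "no includes", which is
    an assumption. Anything else is rejected so errors surface early.
    """
    if value is None:
        return []
    if isinstance(value, str):
        return [value]
    if isinstance(value, list) and all(isinstance(v, str) for v in value):
        return list(value)
    raise TypeError("include must be a string or an array of strings")


# Both spellings normalize to the same shape.
print(normalize_include("templates/.gitlab-ci.yml"))              # ['templates/.gitlab-ci.yml']
print(normalize_include(["templates/build.yml", "templates/deploy.yml"]))
```

Normalizing early means the rest of the include handling only ever deals with a list.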
Use a fully-defined pipeline from another project:
Using predefined templates seems like the recommended best practice, but the first way is a great way to get started, or to use if a company has a rigid, reusable process. It's comparable to including all of Twitter Bootstrap versus including only the components you use.
This last option might seem easier because you can ignore YAML templates, which are pretty awkward. But you quickly end up including the same file multiple times, buried deep in the config file, and that feels like an anti-pattern. Dropping this scope may also ensure that each included YAML file can be fully validated on its own.
How would you know which branch or tag of the file to include? It's tricky, since in some cases you might want changes on master to automatically update global CI settings, and in other cases you might want projects to have to opt in to a global CI change.
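A small sketch of the trade-off: an include with no ref tracks the default branch and picks up global changes automatically, while an include pinned to a tag only changes when the project opts in by bumping the tag. The repository is modeled as a plain dict from ref name to file content; resolve_include and the "master" default are assumptions for illustration.

```python
from typing import Dict, Optional

Repo = Dict[str, str]  # ref name -> content of the included file


def resolve_include(repo: Repo, ref: Optional[str] = None,
                    default_ref: str = "master") -> str:
    """Return the included file's content at `ref`.

    No ref: track the default branch (changes there reach every project).
    Explicit tag: the project has opted in to that exact version.
    """
    chosen = ref if ref is not None else default_ref
    if chosen not in repo:
        raise KeyError(f"ref {chosen!r} not found in config repo")
    return repo[chosen]


# Hypothetical shared config repo with one tag and a moving master branch.
repo: Repo = {"v1.0": "image: ruby:2.3", "master": "image: ruby:2.3"}

repo["master"] = "image: ruby:2.4"      # a global change lands on master
print(resolve_include(repo))            # tracking project: image: ruby:2.4
print(resolve_include(repo, "v1.0"))    # pinned project:   image: ruby:2.3
```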
I'd be fine supporting only inclusion from master/HEAD, at least to start. But I could also see including any arbitrary URL; the tricky part there is permissions.
Another alternative would be to allow including a YAML file from the filesystem instead of directly from a URL. That way we could check out whatever branch we want (or even a tag). I have to do quite a lot of this (checking out other projects) in our builds anyway. In fact, part of our build clones a project that just contains bash scripts which, among other things, determine the best branch to use if a dependent project doesn't have a branch with the same name.
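The branch-selection logic described above lives in bash scripts in the original setup; here is the same idea sketched in Python. The fallback order (develop, then master) is an assumption, and the list of available branches would have to come from somewhere such as the dependent project's remote, which is not shown.

```python
from typing import Iterable, Sequence


def pick_branch(current: str, available: Iterable[str],
                fallbacks: Sequence[str] = ("develop", "master")) -> str:
    """Choose which branch of a dependent project to check out.

    Prefer a branch with the same name as the one being built; otherwise
    try each fallback in order (an assumed policy).
    """
    branches = set(available)
    for candidate in (current, *fallbacks):
        if candidate in branches:
            return candidate
    raise LookupError(f"no suitable branch for {current!r}")


print(pick_branch("feature/login", ["master", "develop", "feature/login"]))  # feature/login
print(pick_branch("feature/login", ["master", "develop"]))                   # develop
```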
I would envision it working like this:
git clone -b develop <my_config_repo> ../my-config-project-dir
and in the .gitlab-ci.yml:
!include ../my-config-project-dir/build-defs/build-config-1.yml
I'm kinda against including from the local filesystem, mostly because it's outside of version control, but also because it can't be parsed until a job is running, which limits what you could include. For example, you could never specify "only" in a local file, because that is evaluated by the main GitLab process before the runner sees the job.
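To make the ordering problem concrete, here is a deliberately simplified model of pipeline creation on the GitLab server: jobs are selected by their "only" list before anything is sent to a runner. Real "only" also accepts keywords and regular expressions, which this sketch ignores; jobs_for_ref is a hypothetical name.

```python
from typing import Dict, List


def jobs_for_ref(config: Dict[str, dict], ref: str) -> List[str]:
    """Simplified server-side job selection for a pushed ref.

    A job with an `only` list runs only when the ref is in that list;
    jobs without `only` always run.
    """
    selected = []
    for name, job in config.items():
        only = job.get("only")
        if only is None or ref in only:
            selected.append(name)
    return selected


# The server only sees what is in the repository's .gitlab-ci.yml.
server_config = {
    "test": {"script": ["make test"]},
    "deploy": {"script": ["make deploy"], "only": ["master"]},
}
print(jobs_for_ref(server_config, "feature-x"))  # ['test']
```

A file included from the runner's filesystem would only be read once a job is already running, after this selection has happened, so any "only" it contained could never affect which jobs exist.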
But it brings up another interesting topic: shared scripts that really simplify .gitlab-ci.yml. We've tended to have things like prepare_ci.sh checked into a project's repo, and I really wish those could be shared somehow. Git submodules come to mind, but I still wish there were a better way to compartmentalize CI configuration separately from a project's code. Plugins are the current hope there.
Yep, that issue is basically the same request. It should probably be moved to this repo, though, since it's not just a runner issue.
Perhaps an alternative solution (for users who want it) is to host your .gitlab-ci.yml in a different project altogether? In your project settings you would simply choose a separate project (and branch or tag) for your config file, rather than pulling it from the local repo.
This could perhaps be combined with including other snippets from the local filesystem or local repo, to pick up project-specific settings?
I strongly disagree with breaking YAML to support this. In my opinion, there is no reason the YAML spec needs to be broken to provide this feature. The suggestions in gitlab-org/gitlab-ci-multi-runner#1258 all keep the file parseable as YAML, meaning its core syntax can be validated by local tools, editors with code colouring won't break, and IDEs that parse YAML won't break either (unless you use a map merge with an alias from your import; see my next comment on gitlab-org/gitlab-ci-multi-runner#1258).
This is greatly simplified now that pipeline runs have permissions based on the user that triggered them. This means reading another repo's configuration is reasonable as long as permissions are managed well on the projects.
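A simplified sketch of the check this makes possible: before fetching an included file, verify that the user who triggered the pipeline can read the project it lives in. The Project model (public flag plus a member set) is a hypothetical stand-in; real GitLab visibility levels, roles, and group membership are richer.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Project:
    path: str
    public: bool = False
    members: Set[str] = field(default_factory=set)


def can_include(user: str, project: Project) -> bool:
    """May the pipeline's triggering user read `project`?

    Public projects are readable by anyone; private ones only by members.
    """
    return project.public or user in project.members


templates = Project("platform/ci-templates", members={"alice"})
print(can_include("alice", templates))  # True
print(can_include("bob", templates))    # False
```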
@sspreitzer No, no one is working on it. I just marked it up-for-grabs in case someone (you?) wants to pick it up.
And yeah, that "includes" syntax looks good. We can start by implementing it only at the top level and see how far that gets us. Good use of YAML templates should make the "Mix-and-match raw" version unnecessary.
If the community wants to contribute this, we'll consider it for CE. Otherwise, there's some discussion about making it EE-only, as it's a convenience rather than a necessity.
I still need to look into it a little more; I've been focusing on another merge request I have open. Fetching the files (at least by URL) should be relatively easy, and getting the files from a branch/repo shouldn't be too bad.
The tricky part here is actually merging templates across the YAML files. Some of that is a feature of YAML itself. It's much easier to source the files and do a hash merge, but that precludes using the built-in YAML templating. I'm going to dig around and see how other people solve this. Maybe RailsConfig handles this? I'm not sure whether they actually merge files or do a hash merge.
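A minimal sketch of the "source the files and do a hash merge" option, operating on already-parsed YAML. Because merging happens after parsing, anchors and aliases (& and *) in one file cannot refer to another, which is exactly the limitation mentioned above. The rule that lists are replaced rather than appended is an assumption; deep_merge is a hypothetical helper.

```python
from typing import Any, Dict


def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively merge two parsed YAML mappings.

    Keys in `override` win. When both sides hold a mapping, they are merged
    recursively instead of replaced. Lists and scalars are replaced
    wholesale. Neither input is mutated.
    """
    result = dict(base)
    for key, value in override.items():
        if isinstance(result.get(key), dict) and isinstance(value, dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = value
    return result


included = {"variables": {"LANG": "C", "RAILS_ENV": "test"},
            "stages": ["build", "test"]}
local = {"variables": {"RAILS_ENV": "ci"}, "stages": ["test"]}
print(deep_merge(included, local))
# {'variables': {'LANG': 'C', 'RAILS_ENV': 'ci'}, 'stages': ['test']}
```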
From an implementation standpoint, it looks like this will require changes in BOTH GitLab and gitlab-ci-multi-runner. (Someone please correct me if I'm wrong.) I'm going to do some more digging into that project and see if I can come up with an approach.
includes:
  arbitrary_g_name:
    path: group/repository/directory/file.ext
    tag: latest
  arbitrary_u_name:
    path: user/repository/directory/file.ext
    # this would hit master
  arbitrary_site_name:
    url: https://server/url/raw
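A sketch of how the named "includes" entries (path plus optional tag, or url) might be turned into fetch specifications after parsing. The "exactly one of path or url" rule and the master default for untagged paths follow the example; IncludeSpec and parse_includes are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class IncludeSpec:
    name: str
    path: Optional[str]  # repository path: namespace/repo/dir/file
    url: Optional[str]   # raw URL for remote files
    ref: Optional[str]   # tag/branch for path entries; None for URLs


def parse_includes(includes: Dict[str, dict],
                   default_ref: str = "master") -> List[IncludeSpec]:
    """Turn the proposed `includes:` mapping into fetch specs."""
    specs = []
    for name, entry in includes.items():
        path, url = entry.get("path"), entry.get("url")
        if (path is None) == (url is None):
            raise ValueError(f"{name}: specify exactly one of path or url")
        # Untagged repository paths fall back to the default branch.
        ref = entry.get("tag", default_ref) if path is not None else None
        specs.append(IncludeSpec(name, path, url, ref))
    return specs


includes = {
    "arbitrary_g_name": {"path": "group/repository/directory/file.ext", "tag": "latest"},
    "arbitrary_u_name": {"path": "user/repository/directory/file.ext"},
    "arbitrary_site_name": {"url": "https://server/url/raw"},
}
for spec in parse_includes(includes):
    print(spec.name, spec.path or spec.url, spec.ref)
```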
@alexives I've been working on an external tool to implement this feature (as mentioned in https://gitlab.com/gitlab-org/gitlab-ce/issues/29042#note_31050552), and have run into some of the same issues that I'm guessing you've seen. There's an aspect to this that I think needs to be considered, perhaps especially for enterprise use: the ability to review the generated CI pipeline before submitting it. The tool I've been working on assembles the .gitlab-ci.yml file pre-commit, which allows for such a review before the pipeline is dispatched to the CI workers.
I think there are definitely some issues a tool that runs completely out of band can hit, namely that people need to remember to run it. But I also think there are benefits to being able to review the output of a tool that will control how the code is built and potentially deployed.
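One way to soften the "people need to remember to run it" problem while keeping the review step is a pre-commit check that regenerates the file and diffs it against the committed copy. This sketch only covers the comparison, using the standard library's difflib; how the file is assembled and how the hook is installed are out of scope, and check_generated is a hypothetical name.

```python
import difflib
from typing import List


def check_generated(committed: str, generated: str,
                    name: str = ".gitlab-ci.yml") -> List[str]:
    """Compare the committed CI file with freshly assembled output.

    Returns unified-diff lines; an empty list means the committed file is
    up to date. A hook could print the diff for review and reject the
    commit when it is non-empty.
    """
    return list(difflib.unified_diff(
        committed.splitlines(keepends=True),
        generated.splitlines(keepends=True),
        fromfile=f"{name} (committed)",
        tofile=f"{name} (generated)",
    ))


committed = "stages:\n  - build\n"
generated = "stages:\n  - build\n  - test\n"
diff = check_generated(committed, generated)
print("".join(diff) if diff else "up to date")
```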
Unfortunately, if I'm not mistaken, the & and * (anchor and alias) syntax is handled by the YAML parser itself, so if this is implemented on top of parsed YAML (it doesn't make sense to tinker with the YAML parser in Ruby), references between files using that syntax would not be possible. It would be really nice, I agree, but the filter and merge options in my proposal would hopefully allow you to achieve the same things.
I definitely don't have time right now, but if this is still unimplemented in a couple of months, I'll see if I can make some time to have a look at this and possibly help with getting this as a CE feature.
My general thoughts about the process at the moment (having not dug into the code for this yet):
1. Parse the .gitlab-ci.yml in the project.
2. Inspect the includes, check permissions, and fetch the required files for processing.
3. Repeat these two steps recursively, depth-first (so that included files can include other files), with permissions always based on the project that is being built.
4. As the recursion unwinds, merge each included file into its parent using the rules specified in the include, in order to build an "effective .gitlab-ci.yml".
5. Cache the "effective .gitlab-ci.yml" for each commit.
6. When running a job, provide this "effective .gitlab-ci.yml" to the
With caching, it would be nice to be able to view the "effective .gitlab-ci.yml" in the web UI, and also to have an option to trigger a rebuild of it.
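The process outlined above can be sketched as a depth-first resolver with a per-commit cache. Assumptions: files arrive already parsed via a fetch callable (where permission checks would live), the including file's keys override its includes at the top level only (a deep merge could be swapped in), and include cycles are rejected. All names are hypothetical.

```python
from typing import Any, Callable, Dict, List, Optional

Config = Dict[str, Any]


def effective_config(path: str, fetch: Callable[[str], Config],
                     _stack: Optional[List[str]] = None) -> Config:
    """Build the effective config depth-first, resolving includes first."""
    stack = _stack or []
    if path in stack:
        raise ValueError("include cycle: " + " -> ".join(stack + [path]))
    raw = dict(fetch(path))          # copy so the fetched data is not mutated
    includes = raw.pop("include", [])
    if isinstance(includes, str):
        includes = [includes]
    merged: Config = {}
    for child in includes:
        merged.update(effective_config(child, fetch, stack + [path]))
    merged.update(raw)               # the including file wins over its includes
    return merged


_cache: Dict[str, Config] = {}


def cached_effective_config(sha: str, fetch: Callable[[str], Config],
                            rebuild: bool = False) -> Config:
    """Cache the effective config per commit; `rebuild` forces regeneration."""
    if rebuild or sha not in _cache:
        _cache[sha] = effective_config(".gitlab-ci.yml", fetch)
    return _cache[sha]


files = {
    ".gitlab-ci.yml": {"include": ["ci/base.yml"], "test": {"script": ["rspec"]}},
    "ci/base.yml": {"include": "ci/common.yml", "stages": ["build", "test"]},
    "ci/common.yml": {"image": "ruby:2.4", "stages": ["test"]},
}
print(cached_effective_config("abc123", files.__getitem__))
# {'image': 'ruby:2.4', 'stages': ['build', 'test'], 'test': {'script': ['rspec']}}
```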
I like the idea of the mix-and-match approach. But in some cases it would also make sense to remove or override details. For example, we have tonnes of projects, and we have now evaluated a code analysis tool. It would be pretty cool to add it as a new stage: the code analysis stage would be placed in the parent, and every child would automatically get the new stage as well.
But some of the projects don't need this stage because the analyzer doesn't support their language.
Some projects simply don't need to be analyzed.
A feature like overriding, known from programming languages, would solve the problem.
Overriding:
The result would be: ONE analysis stage has passed, recognizable to the developer by its default name, but with a different implementation.
Removing:
The result would be: NO analysis stage has passed, or it is marked as removed.
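A sketch of the overriding and removing semantics, applied to inherited jobs. Overriding keeps the job name and swaps the implementation; removal uses a YAML null (None after parsing) as the marker. That null-means-remove convention is a hypothetical choice for illustration, not an existing GitLab feature, and "analyzer --lang java" is a placeholder command.

```python
from typing import Any, Dict, Optional


def apply_child(parent: Dict[str, Any],
                child: Dict[str, Optional[dict]]) -> Dict[str, Any]:
    """Apply a child project's job overrides to the inherited parent jobs.

    - Same job name with a new definition: override (name kept).
    - Same job name set to None: remove the inherited job.
    """
    result = dict(parent)
    for name, job in child.items():
        if job is None:
            result.pop(name, None)
        else:
            result[name] = job
    return result


parent = {
    "build": {"stage": "build", "script": ["make"]},
    "code_analysis": {"stage": "analyze", "script": ["analyzer --lang java"]},
}
go_project = {"code_analysis": {"stage": "analyze", "script": ["go vet ./..."]}}
docs_project = {"code_analysis": None}

print(apply_child(parent, go_project)["code_analysis"]["script"])  # ['go vet ./...']
print("code_analysis" in apply_child(parent, docs_project))        # False
```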
Our security team would like this feature so they can provide code scanning (for vulnerabilities, plain-text passwords, etc.) in the CI pipeline. It would additionally need either a way to force-inject a job or a report showing which repositories do not include this code scanning job.
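The reporting half of that request is straightforward once each repository's effective configuration is available. This sketch assumes a mapping from repository path to parsed config has already been collected (not shown); the job name "security_scan" and the repository names are hypothetical.

```python
from typing import Dict, List


def missing_required_job(configs: Dict[str, dict], job: str) -> List[str]:
    """List repositories whose effective CI config lacks `job`."""
    return sorted(repo for repo, cfg in configs.items() if job not in cfg)


configs = {
    "team-a/api": {"test": {}, "security_scan": {}},
    "team-b/web": {"test": {}},
    "team-c/cli": {},
}
print(missing_required_job(configs, "security_scan"))  # ['team-b/web', 'team-c/cli']
```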
The problem Jendrik points out, that the tool may not support the repository's language, seems complicated as well. Could there perhaps be a way to add jobs to the pipeline from within a job?
It could then perhaps behave like this:
1. Scan for general bad practices (e.g. plain-text passwords in code).
2. Add jobs for the identified scenarios (e.g. if the project is Ruby, add a Ruby code scanner to the pipeline).
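The decision logic of those two steps can be sketched as follows; how the generated jobs would actually be injected into a running pipeline is the open question and is not shown. Language detection by marker file is an assumption, "scan-for-secrets" is a placeholder command, and the marker-to-scanner table is illustrative only.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping: marker file -> (job name, scanner command).
SCANNERS: Dict[str, Tuple[str, List[str]]] = {
    "Gemfile": ("ruby_scan", ["brakeman"]),
    "package.json": ("node_scan", ["npm audit"]),
}


def plan_scan_jobs(repo_files: List[str]) -> Dict[str, dict]:
    """Step 1: always add a generic scan. Step 2: add one job per detected language."""
    # "scan-for-secrets" is a placeholder command, not a real tool.
    jobs: Dict[str, dict] = {"generic_scan": {"script": ["scan-for-secrets"]}}
    present = set(repo_files)
    for marker, (job_name, script) in SCANNERS.items():
        if marker in present:
            jobs[job_name] = {"script": list(script)}
    return jobs


print(sorted(plan_scan_jobs(["Gemfile", "app.rb"])))  # ['generic_scan', 'ruby_scan']
```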
I would like to offer my use case for consideration, for which I'd also need some kind of include statement.
Currently, we maintain a separate repository of build scripts (not .gitlab-ci.yml snippets!) and include it in our projects as a git submodule at .gitlab-ci/. So our main scripts look a bit like this:
The advantage is that we can avoid cramming fully fledged bash scripts into YAML, and we can also implement a lot of additional logic that would be difficult to replicate in every single project we have. For example, our Docker build includes advanced Docker caching, such as load/save and --cache-from, as well as failure handling. We are very averse to copy-and-paste, which we already do a lot with GitLab CI (YAML snippets would help us avoid it).
Our current struggle is the relatively long repository clone in every build step. I would like to use GIT_STRATEGY: none for several build steps; however, this would mean that my build scripts would need to move back into .gitlab-ci.yml, since there might be no .gitlab-ci/ directory to source from.
A proper include statement would generate an "effective .gitlab-ci.yml", as mentioned before, which would work perfectly even without a clone. Personally, I'd be fine with including only from the local repository, since I can handle every other case with submodules or subtrees and still keep the pipeline simple enough. AFAIK there is no workaround for building a .gitlab-ci.yml from snippets and scripts without a clone of the current repository.