Currently, running a build is a matter of generating a shell script and injecting it into a build context, where it is run by a chosen shell. There are a number of problems with this approach, primarily around detecting the shell to use, correctly generating the shell script itself, and testing the generated scripts. Fundamentally, it mixes code with data in a hard-to-work-with manner.
I propose that we remove the shells entirely, and make it a requirement that either the gitlab-runner or gitlab-runner-helper binaries are in the build context. The build steps that are currently generated shell scripts can be turned into internal commands, in the same way that download cache or upload artifacts works today. Data can be injected into the build context in the same way the mixed data+code is injected now.
Restoring the strict separation between code and data will improve input validation, and gitlab-runner has a good cross-platform runtime embedded in it which can be used instead of the existing ShellWriter implementations, so extending the functionality of these scripts becomes easier.
Testing the scripts also becomes much easier, as the commands can be invoked directly with permutations of data, rather than a mixed data+code script file needing to be generated for each test. Coverage for this part of the runner's functionality is poor, so this would be welcomed.
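To make the idea concrete, here is a minimal sketch of a build step that receives its input purely as data. It is not the runner's actual code: `CloneStep`, its JSON field names, `ParseCloneStep` and `GitArgs` are all hypothetical names chosen for illustration. The point is only that validation and testing operate on a plain struct, so permutations of input can be fed in directly.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// CloneStep is a hypothetical data payload for one build step.
// The field names are assumptions, not the runner's real schema.
type CloneStep struct {
	RepoURL string `json:"repo_url"`
	Ref     string `json:"ref"`
	Dir     string `json:"dir"`
}

// ParseCloneStep decodes the payload and validates it before any
// command is run, so bad input is rejected as data.
func ParseCloneStep(raw []byte) (CloneStep, error) {
	var s CloneStep
	if err := json.Unmarshal(raw, &s); err != nil {
		return s, err
	}
	if s.RepoURL == "" || s.Ref == "" || s.Dir == "" {
		return s, errors.New("repo_url, ref and dir are required")
	}
	if strings.HasPrefix(s.Ref, "-") {
		return s, errors.New("ref must not start with '-'")
	}
	return s, nil
}

// GitArgs returns the argument vector for git. Nothing is
// interpolated into a script, so each value stays one argument.
func (s CloneStep) GitArgs() []string {
	return []string{"clone", "--branch", s.Ref, "--", s.RepoURL, s.Dir}
}

func main() {
	// Permutations of data can be exercised directly, without
	// generating a script file per case.
	payloads := []string{
		`{"repo_url":"https://example.com/group/project.git","ref":"main","dir":"build"}`,
		`{"repo_url":"https://example.com/group/project.git","ref":"--upload-pack=x","dir":"build"}`,
		`{"ref":"main"}`,
	}
	for _, p := range payloads {
		step, err := ParseCloneStep([]byte(p))
		if err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Println("git", step.GitArgs())
	}
}
```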
It would still be different, because they still require some detection, and adding a requirement to have our own bash seems problematic for simple cases. Ideally I would ditch the helper and use standard tools for pretty much everything, or download them when needed, to reduce the dependency on having the gitlab-runner binary. So basically the other way around. You can pretty much assume that a standard set of tools is available (curl, zip or tar, and their equivalents on PowerShell). In my opinion, having a dependency on gitlab-runner-helper makes our lives much harder, especially when trying to convince others that they have to distribute it, when all of that can be achieved with bash and PowerShell scripts. If we ditched Windows Batch, this would make our lives easier :)
I'm a developer who builds products that are supposed to build on Linux and Windows.
I have recently been severely affected by issues like #1523 (closed). I also have no idea how I'm going to build a single git repo with a single .gitlab-ci.yml file that will build on a Linux runner and a Windows runner.
I also feel a little nervous about things like #1523 (closed) being reintroduced, because the shell integration for Windows runners appears to be somewhat of a "rare case", and 90% of the energy is going into Linux.
The architecture of the runner (generate and execute on cmd.exe on windows, generate and execute on bash on linux) is fundamentally difficult to trust for a developer like me, who cares actually more about Windows, but also cares about Linux a bit, and who actually has projects he wants to have Gitlab CI build for both Linux and Windows. I can see me having a Linux Runner Environment, and a Windows Runner Environment, and trying to build on both those places.
While this may not seem related to Nick's idea, I think my situation is related.
You see, I want to use $VAR on both Windows and Linux in .gitlab-ci.yml, not $VAR on bash and %VAR% for environment variables. I fundamentally actually don't want cmd.exe and its glitches involved in gitlab in ANY way.
For a useful Gitlab runner on Windows, CMD.EXE should not be invoked, and no temporary batch file should be generated.
Instead, a process should be launched directly, without any batch file, perhaps through a library like os/exec, with its output captured and the process monitored. I would further like to have the Gitlab Runner watch my process, limit its total execution time, and perhaps its total memory, and if limits are exceeded, I'd like it killed.
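As a rough illustration of what I mean, here is a minimal Go sketch using `os/exec` with a context deadline. `RunLimited` and `defaultLimit` are names I made up for the example; this is not how the runner works today. It only covers the execution-time limit: a memory limit needs platform-specific mechanisms and is left out, and child processes spawned by the command are not handled.

```go
package main

import (
	"bytes"
	"context"
	"errors"
	"fmt"
	"os/exec"
	"time"
)

// defaultLimit is an arbitrary example value, not a runner setting.
const defaultLimit = 10 * time.Second

// RunLimited starts name with args directly (no shell, no batch file),
// captures stdout and stderr together, and kills the process if it
// runs longer than limit.
func RunLimited(limit time.Duration, name string, args ...string) (string, error) {
	ctx, cancel := context.WithTimeout(context.Background(), limit)
	defer cancel()

	var out bytes.Buffer
	cmd := exec.CommandContext(ctx, name, args...)
	cmd.Stdout = &out
	cmd.Stderr = &out

	err := cmd.Run()
	if errors.Is(ctx.Err(), context.DeadlineExceeded) {
		// CommandContext has already killed the process at this point.
		return out.String(), fmt.Errorf("killed after %s: %w", limit, context.DeadlineExceeded)
	}
	return out.String(), err
}

func main() {
	// Any executable on PATH works here; "go version" is just an example.
	out, err := RunLimited(defaultLimit, "go", "version")
	fmt.Printf("output=%q err=%v\n", out, err)
}
```

Because the arguments are passed as a slice, no cmd.exe quoting rules are involved at any point.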
Secondly, for a useful CI experience for projects that target Linux AND Windows, and should build on both, I want gitlab-ci YAML file to be ALWAYS using a POSIX-like syntax, and I want to have gitlab shell runners on both windows and linux run against the same git repo.
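For the variable syntax, Go's standard library already provides POSIX-style `$VAR` and `${VAR}` expansion that behaves the same on every platform. The sketch below is only an assumption about how a runner could apply it to an argument list; `ExpandArgs` is a hypothetical helper and the variable values are made up.

```go
package main

import (
	"fmt"
	"os"
)

// ExpandArgs replaces $VAR and ${VAR} in each argument using vars.
// It relies on os.Expand, so the syntax is identical on Windows,
// Linux and macOS and no shell is involved. Unknown variables
// expand to the empty string.
func ExpandArgs(args []string, vars map[string]string) []string {
	out := make([]string, len(args))
	for i, a := range args {
		out[i] = os.Expand(a, func(name string) string { return vars[name] })
	}
	return out
}

func main() {
	vars := map[string]string{"CI_PROJECT_DIR": "builds/demo", "TARGET": "release"}
	fmt.Println(ExpandArgs([]string{"make", "-C", "$CI_PROJECT_DIR", "${TARGET}"}, vars))
	// prints: [make -C builds/demo release]
}
```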
Also I would like to have some verbose logging for os/exec and output capture, when gitlab CI has an internal failure, so I can see exactly what it did when it falls over like #1523 (closed) and other similar issues.
Perhaps it seems to a Unix-centric user of Gitlab CI Multi Runner that invoking a shell is better. To me, Gitlab CI is awesome, but the dependency on the cmd shell syntax on Windows is a liability. I actually don't want that.
So Nick's idea is a good idea, and a necessary one because of multi-platform concerns.
I've been contemplating making a windows-shell-posix-prototype using a fork of the main repo, if anyone is curious what I mean, and thinks it's a good idea, or thinks it's a bad idea, let me know.
Our main problem right now is not script generation; it is basically the lack of a single point of truth. We do have built-in integration tests, but they are only executed on Linux boxes. Because of that, we don't reliably test all the scenarios that our users can hit, and problems like this get introduced. You are also correct that my main focus is not Windows, because on a daily basis we use Runner with Docker on Linux. This makes it harder for us to catch such problems. I still hold strongly to the opinion that this kind of problem should not happen, but the truth is that it will. We can only invest our time in trying to make sure it does not happen in the future. So I would say: it seems tempting to rewrite everything :) but right now the better way is probably to invest in something that we will need anyway: integration tests running continuously in our development cycle on all supported platforms, covering as many workflows as possible. Only then will we actually see where we should spend more time on improvements, and possibly fix more problems of this kind.
I like any proposal, because it allows me to learn about problems and ideas for solving them. However, before making such changes, I also want an educated, data-driven view of why this is better than the current solution, and at this point I don't see it yet :)
What I can say is that we will spend time making sure that these errors are solved and that integration testing is in place, so that we do not make these mistakes again.
I would love to help you fix that test problem. I want Gitlab CI Runner to be trustworthy on Windows boxen. I also want Windows CI Runners to be easier to install; for that, I'm planning to contribute a Windows installer, unless you guys already have that in the works.
I see in the docs that "Bash" IS supported by gitlab-ci-runner. How do I use the Bash shell on Windows, with gitlab-ci-multi-runner? I can't find the docs.
Using CMD as the default on Windows sucks. Even changing the DEFAULT to PowerShell would be a huge step forward. As a Windows expert (25 years), in my humble and nuanced opinion Windows BATCH files are a piece of and nobody should be using them. Certainly having your tool generate and execute .BAT/.CMD syntax is awful. PowerShell would be a better default, the next time you can make such a breaking change (say, 2.0?). All Windows server and client systems from Windows 7 onwards have PowerShell already.
@ayufan @warren.postma I'm starting to throw some proof-of-concept together in !317 (closed). Not feature-complete yet, but the direction is clear, I hope. Code speaks louder than words ;).
Having both shells/abstract_shell.go and commands/helpers/native_shell/*.go at the same time duplicates logic, which makes future changes harder, so overall I'd rather have one approach or the other than both.
Dynamically generating scripts with interpolated data that need to work in a wide range of contexts is a source of many of our current problems. It's an inherently error-prone process. Having integration tests would let us catch these errors more frequently, but they'd still be occurring at a high rate. To me, this approach eliminates whole classes of error entirely, and makes the remainder easier to test and reason about.
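A small sketch of the class of error I mean. `ScriptLine` and `Argv` are hypothetical helpers written for this comment, not code from the runner; the branch name is a made-up hostile input.

```go
package main

import "fmt"

// ScriptLine builds a shell line by interpolating data into code.
// This is the error-prone approach: the value becomes part of the script.
func ScriptLine(branch string) string {
	return fmt.Sprintf("git checkout %s", branch)
}

// Argv keeps the value as data: it is one element of the argument
// vector, whatever characters it contains.
func Argv(branch string) []string {
	return []string{"git", "checkout", branch}
}

func main() {
	branch := "main; echo injected"

	// A shell would parse this line as two separate commands.
	fmt.Println(ScriptLine(branch))
	// prints: git checkout main; echo injected

	// Here the whole value remains a single argument.
	fmt.Printf("%q\n", Argv(branch))
	// prints: ["git" "checkout" "main; echo injected"]
}
```

Escaping can fix the first form for one shell, but the escaping rules differ between bash, cmd and PowerShell, which is where the per-shell bugs come from. The second form has no escaping step to get wrong.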
Continuing to require an agent (gitlab-runner-helper) in some environments isn't a significant cost, and it would become almost free if we automatically injected it into those environments in the same way we currently inject the build scripts (or config, with this MR), instead of requiring users to install it.
I'm actually writing my own "tooling injection" scripts in python right now, and I have a bit of a bootstrap problem since I can't require a shell to exist, for example. I can do this to require sencha tooling:
But there is no way in .gitlab-ci.yml to say "this .gitlab-ci.yml needs bash", or it needs cmd, or powershell. It's just up to the person setting up the Gitlab environment to make a choice and stick with that one choice forever, or else face the pain and suffering of a mixed shell environment.
@warren.postma it wouldn't be too difficult to have .gitlab-ci.yml specify a particular script type to use. Might be worth opening a separate issue for that one though.
That would be useful. But I've been thinking about it, and perhaps I might want ONE git repo to be buildable on TWO or more different shells and platforms. What then?
In such a case, I would like a .gitlab-ci-bash.yml and a .gitlab-ci-cmd.yml, and then my builder would find the correct commands instead of the other way around.
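Something like this lookup is what I have in mind, sketched in Go. `ConfigFor` is a hypothetical function; nothing like it exists in the runner as far as I know, and the fallback to the plain .gitlab-ci.yml is my own assumption.

```go
package main

import (
	"fmt"
	"os"
)

// ConfigFor returns the per-shell config file name (for example
// .gitlab-ci-bash.yml) when exists reports it present, and falls
// back to .gitlab-ci.yml otherwise. The fallback is an assumption.
func ConfigFor(shell string, exists func(path string) bool) string {
	candidate := ".gitlab-ci-" + shell + ".yml"
	if exists(candidate) {
		return candidate
	}
	return ".gitlab-ci.yml"
}

// fileExists checks the working directory on the real filesystem.
func fileExists(path string) bool {
	_, err := os.Stat(path)
	return err == nil
}

func main() {
	// The shell name would come from the runner's own configuration.
	fmt.Println(ConfigFor("bash", fileExists))
}
```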
Imagine some day I even want to have a "mac build", a "Linux build" and a "windows build" for ONE project, how can I even do that in gitlab right now since Bash is usable on two and Cmd is usable on a third?
I have this crazy idea that I'm trying internally where I create a "call.sh" which I place in the path, on Linux and Mac boxen, and I restrict my .gitlab-ci.yml to things in this form:
@warren.postma when I say "to have .gitlab-ci.yml specify a particular script type to use", I mean that the specification could be per-job. So you'd have something like this:
@nick.thomas : That would be great! I would love to have jobs like "test_chrome_linux" that run on bash, on linux runners, and "test_safari_mac" that runs on bash, on mac os runners.