Now that we can collect all of our dependencies and their licenses into one file, we should check this file to ensure there are no problematic licenses. This check should be a CI test that fails if any problematic license is found.
This would reduce the manual effort required by the build team, as well as return valuable feedback to the developer as soon as they make their first commit with a new dependency. They can then try to resolve the license issue or go down a different path.
Next, we should improve this check so that it goes through the newly created dependency_licenses.json file. Further, we should create a whitelist/blacklist for both licenses and individual software. Once those lists exist, we can abort the CI run when a violation is found.
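A minimal sketch of such a check in Python, under stated assumptions: the layout of dependency_licenses.json is not shown in this thread, so the sketch assumes a flat mapping of dependency name to license string. The blacklist contents and the names `LICENSE_BLACKLIST`, `find_offenses` and `check_file` are hypothetical.

```python
import json

# Hypothetical blacklist; the real list is not given in this thread.
LICENSE_BLACKLIST = {"AGPL-3.0", "Unknown"}


def find_offenses(dependencies, blacklist=LICENSE_BLACKLIST):
    """Return sorted (name, license) pairs whose license is blacklisted."""
    return sorted(
        (name, license_name)
        for name, license_name in dependencies.items()
        if license_name in blacklist
    )


def check_file(path):
    """Return a process exit code: 0 if clean, 1 if any offense was found."""
    with open(path) as handle:
        # Assumed shape: {"dependency name": "license name", ...}
        dependencies = json.load(handle)
    offenses = find_offenses(dependencies)
    for name, license_name in offenses:
        print(f"Problematic license: {name} ({license_name})")
    return 1 if offenses else 0
```

A CI job could end with `sys.exit(check_file("dependency_licenses.json"))`, so that a nonzero exit code fails the job and the developer sees the offending dependencies in the log.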
Thanks @balasankarc for working on this already. To make sure I understand, that MR will be checking the JSON file for licenses and then failing the CI test if we find a license we cannot use?
@balasankarc @joshlambert We need more flexible logic than that. We should have a blacklist/whitelist per license, but we should also have a software whitelist/blacklist that overrides the former.
@marin Do you have an example of these whitelists/blacklists? I assume the first one is similar to the one we already have: "Licenses X, Y and Z are blacklisted and licenses M, N and O are whitelisted". What does the second list ("software whitelist/blacklist") do? Should blacklisted software abort the build even if it has a good license?
@ibaum I believe our initial reasoning was that it is upstream's duty to ensure that their products are license compliant - https://gitlab.com/gitlab-org/omnibus-gitlab/issues/1904#note_23263249 . We may be able to figure out a logic to iteratively run license_finder for all our dependencies, but I am not sure if we should be doing it or upstreams.
My honest opinion is that we can ask upstream to include a CSV file whenever they release a new version. GitLab CE/EE already does that, so other teams can too. It is far easier than writing more and more code on our side, which we would then have to maintain. @gitlab-build-team Opinions?
> What does the second list ("software whitelist/blacklist") do? Should blacklisted software abort the build even if it has a good license?
Correct. Similarly, whitelisted software does not abort the run even if its license is on the blacklist. For example, we want to ship git, which is GPL-licensed and therefore on the blacklist, but a clause in its license allows us to ship it.
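The override rule described above can be sketched as a small decision function. This is only an illustration: the names `Verdict` and `evaluate` are hypothetical, and two behaviors are assumptions not settled in this thread, namely that a software blacklist entry wins if a package appears on both software lists, and that a dependency covered by no list is reported as undecided rather than allowed or aborted.

```python
from enum import Enum


class Verdict(Enum):
    ALLOW = "allow"
    ABORT = "abort"
    UNDECIDED = "undecided"  # covered by no list; the thread leaves this open


def evaluate(name, license_name, software_whitelist, software_blacklist,
             license_whitelist, license_blacklist):
    """Decide the fate of one dependency; software lists override license lists."""
    # Software lists are consulted first, so they take precedence.
    # Assumption: the blacklist wins if a package is on both software lists.
    if name in software_blacklist:
        return Verdict.ABORT
    if name in software_whitelist:
        return Verdict.ALLOW
    # Only when the software itself is unlisted does the license decide.
    if license_name in license_blacklist:
        return Verdict.ABORT
    if license_name in license_whitelist:
        return Verdict.ALLOW
    return Verdict.UNDECIDED
```

With `"git"` on the software whitelist and `"GPL"` on the license blacklist, `evaluate("git", "GPL", ...)` returns `Verdict.ALLOW`, which matches the git example above.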
@joshlambert can we maybe ship this as a feature in GitLab? Many enterprises would love to have a license check, even when we can't ship a perfect first iteration of this.
The initial MR for using dependency_licenses.json is in. The next iteration is possibly aborting the build on bad licenses or software. I've opened a WIP MR at !1599 (merged) (for now it is unoptimized, crude code). The logic follows.
@marin Am I right about [3,3] position? Or should that also abort a build?
Update: Checking the latest EE nightly, these are the offenses I got. All of them have an unknown license. I assume they are NPM libraries that the frontend uses and that license_finder somehow failed on them.