Fair usage and retention policy for CI
CI accounts for a large part of the growth on GitLab.com. We have a number of issues that point to different parts of the story; one example is https://gitlab.com/gitlab-com/infrastructure/issues/1279. This growth is not a big problem as long as we have some way of controlling it.
In discussions with different people, we considered various measures (e.g. https://gitlab.com/gitlab-org/gitlab-ce/issues/23366) to make growth more controlled. The latest one is the introduction of shared runner minutes for private projects, which will be enabled quite soon.
This issue is meant to start a discussion on a broader approach that would define our "limits", or fair usage, for public, private free, and private paid projects. Right now it seems that in most cases we apply a band-aid and then introduce some limits, or consider introducing them. I think there may be a better way of doing that.
I created this issue because in a few months we will have a problem with the Container Registry: disk usage and egress traffic. Having policies in place before that happens would make it easier to plan and engineer what needs to happen. It would also allow us to introduce soft limits (alerting) before actually implementing hard limits.
Types of users to consider:
- Public: right now unlimited,
- Private free: will be limited by shared runner minutes,
- Private paid: will probably be mostly unlimited.
Subsystems that make the biggest difference to the cost and scalability of CI:
- Shared runners compute: this is already addressed by shared runner minutes,
- Artifacts: we will move the storage problem away by using object storage, but we do not yet limit unsustainable growth. 9.0 introduces a default expiration date; we had some discussion of the default for GitLab.com here: https://gitlab.com/gitlab-com/infrastructure/issues/1279#note_24869631,
- Container registry: currently unlimited, and it costs a lot in storage and egress traffic. We are waiting to gather data about the current cost of running the GitLab.com Container Registry,
- Egress traffic from builds: we were recently hit by an abuser who was generating a lot of outgoing traffic. With proper monitoring we could detect that, but a more formalised rule, such as "you can use 10GB of egress for your builds" (similar to shared runner minutes), would be an interesting approach.
Example:
- Public: unlimited build minutes, 10GB of artifacts storage, 100GB of egress (maybe just alerting), 20GB of container registry storage,
- Private free: ...,
- Private paid: ....
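
To make the soft limit (alerting) versus hard limit idea more concrete, here is a minimal Python sketch of how the example "Public" policy above could be expressed and evaluated. This is only an illustration, not an existing GitLab API: the names `Limit`, `Status`, `PUBLIC_POLICY` and `evaluate` are hypothetical, and the 80% soft threshold is an assumption, not something decided in this issue. The private free and private paid tiers are left out because their values are not defined yet.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional

GB = 1024 ** 3  # bytes in one gigabyte (binary)


class Status(Enum):
    OK = "ok"
    SOFT_LIMIT = "soft_limit"  # alert only, nothing is blocked
    HARD_LIMIT = "hard_limit"  # usage should be blocked


@dataclass(frozen=True)
class Limit:
    """Hypothetical quota for one resource."""
    hard: Optional[int]        # None means unlimited
    soft_ratio: float = 0.8    # assumed: alert at 80% of the hard limit
    enforce: bool = True       # False means "maybe just alerting"

    def check(self, used: int) -> Status:
        if self.hard is None:
            return Status.OK
        if used >= self.hard:
            # Non-enforced limits never block; they only raise an alert.
            return Status.HARD_LIMIT if self.enforce else Status.SOFT_LIMIT
        if used >= self.hard * self.soft_ratio:
            return Status.SOFT_LIMIT
        return Status.OK


# Values taken from the "Public" example above.
PUBLIC_POLICY: Dict[str, Limit] = {
    "build_minutes": Limit(hard=None),
    "artifacts_bytes": Limit(hard=10 * GB),
    "egress_bytes": Limit(hard=100 * GB, enforce=False),
    "registry_bytes": Limit(hard=20 * GB),
}


def evaluate(policy: Dict[str, Limit], usage: Dict[str, int]) -> Dict[str, Status]:
    """Return the status of every resource in the policy for the given usage."""
    return {name: limit.check(usage.get(name, 0)) for name, limit in policy.items()}


if __name__ == "__main__":
    usage = {"artifacts_bytes": 9 * GB, "egress_bytes": 150 * GB}
    for name, status in evaluate(PUBLIC_POLICY, usage).items():
        print(f"{name}: {status.value}")
```

Keeping `enforce` separate from the threshold would let us roll out a limit in alerting-only mode first and flip it to a hard limit later without changing the numbers.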