Fix concurrent access on builds/register

Kamil Trzciński requested to merge fix-builds-processing into master

What does this MR do?

This solves the problems that lead to long build queues, as described in these issues: https://gitlab.com/gitlab-com/infrastructure/issues/1242, https://gitlab.com/gitlab-com/infrastructure/issues/1244, https://gitlab.com/gitlab-com/infrastructure/issues/1238. We discovered the cause by looking at the requests from a specific IP over the last 15 minutes: roughly 90% of them ended in 409 Conflict. This seems to be caused by the added latency. When multiple runners, or even a single runner, ask for the same set of builds, only the first request can be fulfilled. Because fetching a build from the database takes a non-zero amount of time, multiple connections end up fighting for the same resource, and the current implementation returns 409 in that case, as we have seen. Because we fire multiple requests, we are effectively fighting with ourselves for the resource (a job).

Instead of returning 409 after the first conflict, this Merge Request iterates over the list of builds to find the next match. The cost of iterating is much lower than the cost of returning to the runner and starting again from scratch.
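
To illustrate the difference, here is a minimal in-memory sketch in Python. It is not the code changed by this Merge Request: `BuildQueue`, `try_claim`, `register_old` and `register_new` are hypothetical names, and the lock-protected dictionary only stands in for the atomic "claim this build" step in the database. Raising `Conflict` only when every candidate was already taken is also an assumption of the sketch.

```python
import threading


class Conflict(Exception):
    """Stands in for an HTTP 409 response in this sketch."""


class BuildQueue:
    """Hypothetical in-memory stand-in for the table of builds."""

    def __init__(self, build_ids):
        self._lock = threading.Lock()
        self._status = {build_id: "pending" for build_id in build_ids}

    def pending(self):
        # Snapshot of pending builds; it may be stale by the time it is used.
        with self._lock:
            return [b for b, s in self._status.items() if s == "pending"]

    def try_claim(self, build_id):
        # Atomic compare-and-set: succeeds only if the build is still pending.
        with self._lock:
            if self._status.get(build_id) != "pending":
                return False
            self._status[build_id] = "running"
            return True


def register_old(queue, candidates):
    """Old behaviour: give up with a conflict as soon as the first claim fails."""
    if not candidates:
        return None
    if queue.try_claim(candidates[0]):
        return candidates[0]
    raise Conflict()


def register_new(queue, candidates):
    """New behaviour: on a conflict, move on to the next candidate."""
    saw_conflict = False
    for build_id in candidates:
        if queue.try_claim(build_id):
            return build_id
        saw_conflict = True
    if saw_conflict:
        # Assumption: a conflict is reported only when every candidate was taken.
        raise Conflict()
    return None
```

With two requests working from the same stale snapshot, `register_old` lets the first request claim the first build and raises `Conflict` for the second, while `register_new` lets the second request fall through to the next pending build.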

The expected result is a drop in 409 responses to something like 3-5% of requests, instead of the current 90%.

Are there points in the code the reviewer needs to double check?

Screenshots (if relevant)

Does this MR meet the acceptance criteria?

What are the relevant issue numbers?
