Add index for ci_builds that makes shared runners query to run 100x faster.
What does this MR do?
This reduces the time for run SQL query for Shared Runners by over 100x.
We reduced the number of times that this query is executed (by checking a Redis value), but it did not change the performance of it when it needs to be.
By adding a simple index:
- On
staging
I saw a drop from ~140ms to ~1ms (100x). - On
gitlab.com
I saw a drop from ~800ms to ~5ms (160x).
Before adding indexes (on staging)
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=169425.06..169425.08 rows=5 width=1392) (actual time=128.389..128.395 rows=20 loops=1)
Sort Key: (COALESCE(project_builds.running_builds, '0'::bigint)), ci_builds.id
Sort Method: quicksort Memory: 46kB
-> Hash Left Join (cost=9813.71..169425.01 rows=5 width=1392) (actual time=127.774..128.346 rows=20 loops=1)
Hash Cond: (ci_builds.gl_project_id = project_builds.gl_project_id)
-> Nested Loop Left Join (cost=1.42..159612.68 rows=5 width=1384) (actual time=0.290..0.855 rows=20 loops=1)
Filter: ((project_features.builds_access_level IS NULL) OR (project_features.builds_access_level > 0))
-> Nested Loop (cost=0.99..159568.24 rows=6 width=1384) (actual time=0.280..0.746 rows=20 loops=1)
-> Index Scan using index_ci_builds_on_type_status on ci_builds (cost=0.56..57320.34 rows=4241 width=1384) (actual time=0.027..0.146 rows=133 loops=1)
Index Cond: (((type)::text = 'Ci::Build'::text) AND ((status)::text = 'pending'::text))
Filter: (runner_id IS NULL)
-> Index Scan using projects_pkey on projects (cost=0.43..24.10 rows=1 width=4) (actual time=0.003..0.003 rows=0 loops=133)
Index Cond: (id = ci_builds.gl_project_id)
Filter: (pending_delete AND shared_runners_enabled AND ((visibility_level = 20) OR ((SubPlan 1) = 1)))
Rows Removed by Filter: 0
SubPlan 1
-> Nested Loop Left Join (cost=0.58..16.63 rows=1 width=4) (actual time=0.005..0.005 rows=1 loops=20)
Join Filter: (namespace_statistics.namespace_id = namespaces.id)
Filter: ((COALESCE(namespaces.shared_runners_minutes_limit, 0) = 0) OR (COALESCE(namespace_statistics.shared_runners_seconds, 0) < (COALESCE(namespaces.shared_runners_minutes_limit, 0) * 60)))
-> Index Scan using namespaces_pkey on namespaces (cost=0.43..8.45 rows=1 width=8) (actual time=0.004..0.004 rows=1 loops=20)
Index Cond: (id = projects.namespace_id)
Filter: (deleted_at IS NULL)
-> Index Scan using index_namespace_statistics_on_namespace_id on namespace_statistics (cost=0.15..8.17 rows=1 width=8) (actual time=0.001..0.001 rows=0 loops=15)
Index Cond: (namespace_id = projects.namespace_id)
-> Index Scan using index_project_features_on_project_id on project_features (cost=0.43..7.39 rows=1 width=8) (actual time=0.004..0.004 rows=1 loops=20)
Index Cond: (ci_builds.gl_project_id = project_id)
-> Hash (cost=9812.05..9812.05 rows=20 width=12) (actual time=127.476..127.476 rows=0 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 8kB
-> Subquery Scan on project_builds (cost=9811.50..9812.05 rows=20 width=12) (actual time=127.475..127.475 rows=0 loops=1)
-> GroupAggregate (cost=9811.50..9811.85 rows=20 width=12) (actual time=127.475..127.475 rows=0 loops=1)
Group Key: ci_builds_1.gl_project_id
-> Sort (cost=9811.50..9811.55 rows=20 width=4) (actual time=127.473..127.473 rows=0 loops=1)
Sort Key: ci_builds_1.gl_project_id
Sort Method: quicksort Memory: 25kB
-> Nested Loop (cost=561.91..9811.06 rows=20 width=4) (actual time=127.470..127.470 rows=0 loops=1)
-> Seq Scan on ci_runners (cost=0.00..7483.04 rows=4 width=4) (actual time=1.263..7.666 rows=4 loops=1)
Filter: is_shared
Rows Removed by Filter: 14100
-> Bitmap Heap Scan on ci_builds ci_builds_1 (cost=561.91..581.96 rows=5 width=8) (actual time=29.944..29.944 rows=0 loops=4)
Recheck Cond: ((runner_id = ci_runners.id) AND ((status)::text = 'running'::text))
Filter: ((type)::text = 'Ci::Build'::text)
-> BitmapAnd (cost=561.91..561.91 rows=5 width=0) (actual time=29.878..29.878 rows=0 loops=4)
-> Bitmap Index Scan on index_ci_builds_on_runner_id (cost=0.00..76.24 rows=2107 width=0) (actual time=17.926..17.926 rows=133862 loops=4)
Index Cond: (runner_id = ci_runners.id)
-> Bitmap Index Scan on index_ci_builds_on_status (cost=0.00..480.37 rows=21324 width=0) (actual time=4.113..4.113 rows=21768 loops=4)
Index Cond: ((status)::text = 'running'::text)
Planning time: 0.973 ms
Execution time: 128.534 ms
(48 rows)
After adding indexes (on staging)
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=116869.00..116869.02 rows=5 width=1392) (actual time=0.895..0.900 rows=20 loops=1)
Sort Key: (COALESCE(project_builds.running_builds, '0'::bigint)), ci_builds.id
Sort Method: quicksort Memory: 46kB
-> Hash Left Join (cost=101.70..116868.95 rows=5 width=1392) (actual time=0.322..0.849 rows=20 loops=1)
Hash Cond: (ci_builds.gl_project_id = project_builds.gl_project_id)
-> Nested Loop Left Join (cost=1.42..116768.64 rows=5 width=1384) (actual time=0.265..0.781 rows=20 loops=1)
Filter: ((project_features.builds_access_level IS NULL) OR (project_features.builds_access_level > 0))
-> Nested Loop (cost=0.99..116724.19 rows=6 width=1384) (actual time=0.255..0.690 rows=20 loops=1)
-> Index Scan using index_ci_builds_on_status_and_type_and_runner_id on ci_builds (cost=0.56..14476.30 rows=4241 width=1384) (actual time=0.037..0.186 rows=133 loops=1)
Index Cond: (((status)::text = 'pending'::text) AND ((type)::text = 'Ci::Build'::text) AND (runner_id IS NULL))
-> Index Scan using projects_pkey on projects (cost=0.43..24.10 rows=1 width=4) (actual time=0.003..0.003 rows=0 loops=133)
Index Cond: (id = ci_builds.gl_project_id)
Filter: (pending_delete AND shared_runners_enabled AND ((visibility_level = 20) OR ((SubPlan 1) = 1)))
Rows Removed by Filter: 0
SubPlan 1
-> Nested Loop Left Join (cost=0.58..16.63 rows=1 width=4) (actual time=0.004..0.005 rows=1 loops=20)
Join Filter: (namespace_statistics.namespace_id = namespaces.id)
Filter: ((COALESCE(namespaces.shared_runners_minutes_limit, 0) = 0) OR (COALESCE(namespace_statistics.shared_runners_seconds, 0) < (COALESCE(namespaces.shared_runners_minutes_limit, 0) * 60)))
-> Index Scan using namespaces_pkey on namespaces (cost=0.43..8.45 rows=1 width=8) (actual time=0.003..0.003 rows=1 loops=20)
Index Cond: (id = projects.namespace_id)
Filter: (deleted_at IS NULL)
-> Index Scan using index_namespace_statistics_on_namespace_id on namespace_statistics (cost=0.15..8.17 rows=1 width=8) (actual time=0.001..0.001 rows=0 loops=15)
Index Cond: (namespace_id = projects.namespace_id)
-> Index Scan using index_project_features_on_project_id on project_features (cost=0.43..7.39 rows=1 width=8) (actual time=0.003..0.003 rows=1 loops=20)
Index Cond: (ci_builds.gl_project_id = project_id)
-> Hash (cost=100.03..100.03 rows=20 width=12) (actual time=0.046..0.046 rows=0 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 8kB
-> Subquery Scan on project_builds (cost=99.48..100.03 rows=20 width=12) (actual time=0.044..0.044 rows=0 loops=1)
-> GroupAggregate (cost=99.48..99.83 rows=20 width=12) (actual time=0.044..0.044 rows=0 loops=1)
Group Key: ci_builds_1.gl_project_id
-> Sort (cost=99.48..99.53 rows=20 width=4) (actual time=0.043..0.043 rows=0 loops=1)
Sort Key: ci_builds_1.gl_project_id
Sort Method: quicksort Memory: 25kB
-> Nested Loop (cost=0.85..99.05 rows=20 width=4) (actual time=0.037..0.037 rows=0 loops=1)
-> Index Scan using index_ci_runners_on_is_shared on ci_runners (cost=0.29..10.36 rows=4 width=4) (actual time=0.008..0.015 rows=4 loops=1)
Index Cond: (is_shared = true)
Filter: is_shared
-> Index Scan using index_ci_builds_on_status_and_type_and_runner_id on ci_builds ci_builds_1 (cost=0.56..22.12 rows=5 width=8) (actual time=0.005..0.005 rows=0 loops=4)
Index Cond: (((status)::text = 'running'::text) AND ((type)::text = 'Ci::Build'::text) AND (runner_id = ci_runners.id))
Planning time: 1.124 ms
Execution time: 1.123 ms
(41 rows)
Details
This is how the query is being constructed: https://gitlab.com/gitlab-org/gitlab-ee/blob/master/app/services/ci/register_build_service.rb#L41. The culprit here seems to be: Ci::Build.pending.unstarted
which fails to run on an index. All other subqueries and joins are optimized for that condition, that we have only a few builds that are currently pending. So all other subqueries and joins are very fast if we have a filtered data set. Without the index we fail to fulfill that assumption.
This is the query that is being executed (includes a filter for build minutes):
SELECT "ci_builds".* FROM "ci_builds"
INNER JOIN "projects" ON "projects"."id" = "ci_builds"."gl_project_id" AND "projects"."pending_delete" = true
LEFT JOIN project_features ON ci_builds.gl_project_id = project_features.project_id
LEFT JOIN (
SELECT "ci_builds"."gl_project_id", count(*) AS running_builds FROM "ci_builds"
WHERE "ci_builds"."type" IN ('Ci::Build')
AND "ci_builds"."status" = 'running' AND "ci_builds"."runner_id" IN (
SELECT "ci_runners"."id" FROM "ci_runners" WHERE "ci_runners"."is_shared" = 't')
GROUP BY "ci_builds"."gl_project_id"
) AS project_builds ON ci_builds.gl_project_id=project_builds.gl_project_id
WHERE "ci_builds"."type" IN ('Ci::Build') AND "ci_builds"."status" = 'pending'
AND "ci_builds"."runner_id" IS NULL AND "projects"."shared_runners_enabled" = true
AND (project_features.builds_access_level IS NULL or project_features.builds_access_level > 0)
AND (projects.visibility_level=20 OR (
SELECT 1 FROM "namespaces"
LEFT JOIN namespace_statistics ON namespace_statistics.namespace_id = namespaces.id
WHERE "namespaces"."deleted_at" IS NULL AND (namespaces.id = projects.namespace_id)
AND (COALESCE(namespaces.shared_runners_minutes_limit, 0, 0) = 0
OR COALESCE(namespace_statistics.shared_runners_seconds, 0) < COALESCE(namespaces.shared_runners_minutes_limit, 0, 0) * 60))=1
)
ORDER BY COALESCE(project_builds.running_builds, 0) ASC, ci_builds.id ASC;