Skip to content

Add index for ci_builds that makes shared runners query to run 100x faster.

Kamil Trzcińśki requested to merge add-index-on-ci-builds into master

What does this MR do?

This reduces the time for run SQL query for Shared Runners by over 100x.

We reduced the number of times that this query is executed (by checking a Redis value), but it did not change the performance of it when it needs to be.

By adding a simple index:

  • On staging I saw a drop from ~140ms to ~1ms (100x).
  • On gitlab.com I saw a drop from ~800ms to ~5ms (160x).

Before adding indexes (on staging)

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=169425.06..169425.08 rows=5 width=1392) (actual time=128.389..128.395 rows=20 loops=1)
   Sort Key: (COALESCE(project_builds.running_builds, '0'::bigint)), ci_builds.id
   Sort Method: quicksort  Memory: 46kB
   ->  Hash Left Join  (cost=9813.71..169425.01 rows=5 width=1392) (actual time=127.774..128.346 rows=20 loops=1)
         Hash Cond: (ci_builds.gl_project_id = project_builds.gl_project_id)
         ->  Nested Loop Left Join  (cost=1.42..159612.68 rows=5 width=1384) (actual time=0.290..0.855 rows=20 loops=1)
               Filter: ((project_features.builds_access_level IS NULL) OR (project_features.builds_access_level > 0))
               ->  Nested Loop  (cost=0.99..159568.24 rows=6 width=1384) (actual time=0.280..0.746 rows=20 loops=1)
                     ->  Index Scan using index_ci_builds_on_type_status on ci_builds  (cost=0.56..57320.34 rows=4241 width=1384) (actual time=0.027..0.146 rows=133 loops=1)
                           Index Cond: (((type)::text = 'Ci::Build'::text) AND ((status)::text = 'pending'::text))
                           Filter: (runner_id IS NULL)
                     ->  Index Scan using projects_pkey on projects  (cost=0.43..24.10 rows=1 width=4) (actual time=0.003..0.003 rows=0 loops=133)
                           Index Cond: (id = ci_builds.gl_project_id)
                           Filter: (pending_delete AND shared_runners_enabled AND ((visibility_level = 20) OR ((SubPlan 1) = 1)))
                           Rows Removed by Filter: 0
                           SubPlan 1
                             ->  Nested Loop Left Join  (cost=0.58..16.63 rows=1 width=4) (actual time=0.005..0.005 rows=1 loops=20)
                                   Join Filter: (namespace_statistics.namespace_id = namespaces.id)
                                   Filter: ((COALESCE(namespaces.shared_runners_minutes_limit, 0) = 0) OR (COALESCE(namespace_statistics.shared_runners_seconds, 0) < (COALESCE(namespaces.shared_runners_minutes_limit, 0) * 60)))
                                   ->  Index Scan using namespaces_pkey on namespaces  (cost=0.43..8.45 rows=1 width=8) (actual time=0.004..0.004 rows=1 loops=20)
                                         Index Cond: (id = projects.namespace_id)
                                         Filter: (deleted_at IS NULL)
                                   ->  Index Scan using index_namespace_statistics_on_namespace_id on namespace_statistics  (cost=0.15..8.17 rows=1 width=8) (actual time=0.001..0.001 rows=0 loops=15)
                                         Index Cond: (namespace_id = projects.namespace_id)
               ->  Index Scan using index_project_features_on_project_id on project_features  (cost=0.43..7.39 rows=1 width=8) (actual time=0.004..0.004 rows=1 loops=20)
                     Index Cond: (ci_builds.gl_project_id = project_id)
         ->  Hash  (cost=9812.05..9812.05 rows=20 width=12) (actual time=127.476..127.476 rows=0 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 8kB
               ->  Subquery Scan on project_builds  (cost=9811.50..9812.05 rows=20 width=12) (actual time=127.475..127.475 rows=0 loops=1)
                     ->  GroupAggregate  (cost=9811.50..9811.85 rows=20 width=12) (actual time=127.475..127.475 rows=0 loops=1)
                           Group Key: ci_builds_1.gl_project_id
                           ->  Sort  (cost=9811.50..9811.55 rows=20 width=4) (actual time=127.473..127.473 rows=0 loops=1)
                                 Sort Key: ci_builds_1.gl_project_id
                                 Sort Method: quicksort  Memory: 25kB
                                 ->  Nested Loop  (cost=561.91..9811.06 rows=20 width=4) (actual time=127.470..127.470 rows=0 loops=1)
                                       ->  Seq Scan on ci_runners  (cost=0.00..7483.04 rows=4 width=4) (actual time=1.263..7.666 rows=4 loops=1)
                                             Filter: is_shared
                                             Rows Removed by Filter: 14100
                                       ->  Bitmap Heap Scan on ci_builds ci_builds_1  (cost=561.91..581.96 rows=5 width=8) (actual time=29.944..29.944 rows=0 loops=4)
                                             Recheck Cond: ((runner_id = ci_runners.id) AND ((status)::text = 'running'::text))
                                             Filter: ((type)::text = 'Ci::Build'::text)
                                             ->  BitmapAnd  (cost=561.91..561.91 rows=5 width=0) (actual time=29.878..29.878 rows=0 loops=4)
                                                   ->  Bitmap Index Scan on index_ci_builds_on_runner_id  (cost=0.00..76.24 rows=2107 width=0) (actual time=17.926..17.926 rows=133862 loops=4)
                                                         Index Cond: (runner_id = ci_runners.id)
                                                   ->  Bitmap Index Scan on index_ci_builds_on_status  (cost=0.00..480.37 rows=21324 width=0) (actual time=4.113..4.113 rows=21768 loops=4)
                                                         Index Cond: ((status)::text = 'running'::text)
 Planning time: 0.973 ms
 Execution time: 128.534 ms
(48 rows)

After adding indexes (on staging)

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=116869.00..116869.02 rows=5 width=1392) (actual time=0.895..0.900 rows=20 loops=1)
   Sort Key: (COALESCE(project_builds.running_builds, '0'::bigint)), ci_builds.id
   Sort Method: quicksort  Memory: 46kB
   ->  Hash Left Join  (cost=101.70..116868.95 rows=5 width=1392) (actual time=0.322..0.849 rows=20 loops=1)
         Hash Cond: (ci_builds.gl_project_id = project_builds.gl_project_id)
         ->  Nested Loop Left Join  (cost=1.42..116768.64 rows=5 width=1384) (actual time=0.265..0.781 rows=20 loops=1)
               Filter: ((project_features.builds_access_level IS NULL) OR (project_features.builds_access_level > 0))
               ->  Nested Loop  (cost=0.99..116724.19 rows=6 width=1384) (actual time=0.255..0.690 rows=20 loops=1)
                     ->  Index Scan using index_ci_builds_on_status_and_type_and_runner_id on ci_builds  (cost=0.56..14476.30 rows=4241 width=1384) (actual time=0.037..0.186 rows=133 loops=1)
                           Index Cond: (((status)::text = 'pending'::text) AND ((type)::text = 'Ci::Build'::text) AND (runner_id IS NULL))
                     ->  Index Scan using projects_pkey on projects  (cost=0.43..24.10 rows=1 width=4) (actual time=0.003..0.003 rows=0 loops=133)
                           Index Cond: (id = ci_builds.gl_project_id)
                           Filter: (pending_delete AND shared_runners_enabled AND ((visibility_level = 20) OR ((SubPlan 1) = 1)))
                           Rows Removed by Filter: 0
                           SubPlan 1
                             ->  Nested Loop Left Join  (cost=0.58..16.63 rows=1 width=4) (actual time=0.004..0.005 rows=1 loops=20)
                                   Join Filter: (namespace_statistics.namespace_id = namespaces.id)
                                   Filter: ((COALESCE(namespaces.shared_runners_minutes_limit, 0) = 0) OR (COALESCE(namespace_statistics.shared_runners_seconds, 0) < (COALESCE(namespaces.shared_runners_minutes_limit, 0) * 60)))
                                   ->  Index Scan using namespaces_pkey on namespaces  (cost=0.43..8.45 rows=1 width=8) (actual time=0.003..0.003 rows=1 loops=20)
                                         Index Cond: (id = projects.namespace_id)
                                         Filter: (deleted_at IS NULL)
                                   ->  Index Scan using index_namespace_statistics_on_namespace_id on namespace_statistics  (cost=0.15..8.17 rows=1 width=8) (actual time=0.001..0.001 rows=0 loops=15)
                                         Index Cond: (namespace_id = projects.namespace_id)
               ->  Index Scan using index_project_features_on_project_id on project_features  (cost=0.43..7.39 rows=1 width=8) (actual time=0.003..0.003 rows=1 loops=20)
                     Index Cond: (ci_builds.gl_project_id = project_id)
         ->  Hash  (cost=100.03..100.03 rows=20 width=12) (actual time=0.046..0.046 rows=0 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 8kB
               ->  Subquery Scan on project_builds  (cost=99.48..100.03 rows=20 width=12) (actual time=0.044..0.044 rows=0 loops=1)
                     ->  GroupAggregate  (cost=99.48..99.83 rows=20 width=12) (actual time=0.044..0.044 rows=0 loops=1)
                           Group Key: ci_builds_1.gl_project_id
                           ->  Sort  (cost=99.48..99.53 rows=20 width=4) (actual time=0.043..0.043 rows=0 loops=1)
                                 Sort Key: ci_builds_1.gl_project_id
                                 Sort Method: quicksort  Memory: 25kB
                                 ->  Nested Loop  (cost=0.85..99.05 rows=20 width=4) (actual time=0.037..0.037 rows=0 loops=1)
                                       ->  Index Scan using index_ci_runners_on_is_shared on ci_runners  (cost=0.29..10.36 rows=4 width=4) (actual time=0.008..0.015 rows=4 loops=1)
                                             Index Cond: (is_shared = true)
                                             Filter: is_shared
                                       ->  Index Scan using index_ci_builds_on_status_and_type_and_runner_id on ci_builds ci_builds_1  (cost=0.56..22.12 rows=5 width=8) (actual time=0.005..0.005 rows=0 loops=4)
                                             Index Cond: (((status)::text = 'running'::text) AND ((type)::text = 'Ci::Build'::text) AND (runner_id = ci_runners.id))
 Planning time: 1.124 ms
 Execution time: 1.123 ms
(41 rows)

Details

This is how the query is being constructed: https://gitlab.com/gitlab-org/gitlab-ee/blob/master/app/services/ci/register_build_service.rb#L41. The culprit here seems to be: Ci::Build.pending.unstarted which fails to run on an index. All other subqueries and joins are optimized for that condition, that we have only a few builds that are currently pending. So all other subqueries and joins are very fast if we have a filtered data set. Without the index we fail to fulfill that assumption.

This is the query that is being executed (includes a filter for build minutes):

SELECT "ci_builds".* FROM "ci_builds" 
	INNER JOIN "projects" ON "projects"."id" = "ci_builds"."gl_project_id" AND "projects"."pending_delete" = true 
	LEFT JOIN project_features ON ci_builds.gl_project_id = project_features.project_id 
	LEFT JOIN (
		SELECT "ci_builds"."gl_project_id", count(*) AS running_builds FROM "ci_builds"
		WHERE "ci_builds"."type" IN ('Ci::Build')
		AND "ci_builds"."status" = 'running' AND "ci_builds"."runner_id" IN (
			SELECT "ci_runners"."id" FROM "ci_runners" WHERE "ci_runners"."is_shared" = 't')
		GROUP BY "ci_builds"."gl_project_id"
	) AS project_builds ON ci_builds.gl_project_id=project_builds.gl_project_id
	WHERE "ci_builds"."type" IN ('Ci::Build') AND "ci_builds"."status" = 'pending'
		AND "ci_builds"."runner_id" IS NULL AND "projects"."shared_runners_enabled" = true
		AND (project_features.builds_access_level IS NULL or project_features.builds_access_level > 0)
		AND (projects.visibility_level=20 OR (
			SELECT 1 FROM "namespaces"
			LEFT JOIN namespace_statistics ON namespace_statistics.namespace_id = namespaces.id
			WHERE "namespaces"."deleted_at" IS NULL AND (namespaces.id = projects.namespace_id)
			AND (COALESCE(namespaces.shared_runners_minutes_limit, 0, 0) = 0
				OR COALESCE(namespace_statistics.shared_runners_seconds, 0) < COALESCE(namespaces.shared_runners_minutes_limit, 0, 0) * 60))=1
		)
	ORDER BY COALESCE(project_builds.running_builds, 0) ASC, ci_builds.id ASC;

cc @stanhu @yorickpeterse

Merge request reports