SQL query used for getting issues/MRs by labels is overly complex
When filtering MR or issues by labels and a state you'll end up with a query along the lines of the following:
SELECT merge_requests.*
FROM merge_requests
INNER JOIN label_links ON label_links.target_id = merge_requests.id AND label_links.target_type = 'MergeRequest'
INNER JOIN labels ON labels.id = label_links.label_id
WHERE merge_requests.deleted_at IS NULL
AND merge_requests.state IN ('opened')
AND labels.title = 'bug'
AND labels.id IN (
SELECT labels.id
FROM labels
WHERE (
labels.id IN (
SELECT labels.id
FROM labels
WHERE labels.group_id IN (
SELECT projects.namespace_id
FROM projects
INNER JOIN namespaces ON namespaces.id = projects.namespace_id
WHERE projects.id = 13083
AND namespaces.type = 'Group'
)
UNION
SELECT labels.id
FROM labels
WHERE labels.project_id IN (
SELECT projects.id
FROM projects
WHERE projects.id = 13083
)
)
)
ORDER BY labels.title ASC
)
AND merge_requests.target_project_id = 13083
ORDER BY merge_requests.id DESC;
Running a query such as this will produce plan:
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=197763.46..197763.48 rows=6 width=614) (actual time=39.005..39.014 rows=18 loops=1)
Sort Key: merge_requests.id DESC
Sort Method: quicksort Memory: 55kB
-> Hash Semi Join (cost=148815.25..197763.39 rows=6 width=614) (actual time=8.539..38.921 rows=18 loops=1)
Hash Cond: (labels.id = "ANY_subquery".id)
-> Nested Loop (cost=454.45..45333.49 rows=11 width=622) (actual time=5.415..35.437 rows=18 loops=1)
-> Nested Loop (cost=454.02..45259.83 rows=97 width=618) (actual time=5.038..30.897 rows=1262 loops=1)
-> Bitmap Heap Scan on merge_requests (cost=453.59..40207.03 rows=1152 width=614) (actual time=5.019..24.644 rows=521 loops=1)
Recheck Cond: (target_project_id = 13083)
Filter: ((deleted_at IS NULL) AND ((state)::text = 'opened'::text))
Rows Removed by Filter: 13348
Heap Blocks: exact=11772
-> Bitmap Index Scan on index_merge_requests_on_target_project_id_and_iid (cost=0.00..453.30 rows=22782 width=0) (actual time=2.358..2.358 rows=13997 loops=1)
Index Cond: (target_project_id = 13083)
-> Index Scan using index_label_links_on_target_id_and_target_type on label_links (cost=0.43..4.38 rows=1 width=8) (actual time=0.007..0.009 rows=2 loops=521)
Index Cond: ((target_id = merge_requests.id) AND ((target_type)::text = 'MergeRequest'::text))
-> Index Scan using labels_pkey on labels (cost=0.43..0.75 rows=1 width=4) (actual time=0.003..0.003 rows=0 loops=1262)
Index Cond: (id = label_links.label_id)
Filter: ((title)::text = 'bug'::text)
Rows Removed by Filter: 1
-> Hash (cost=131279.47..131279.47 rows=1041147 width=4) (actual time=2.389..2.389 rows=199 loops=1)
Buckets: 524288 Batches: 4 Memory Usage: 4098kB
-> Subquery Scan on "ANY_subquery" (cost=118265.13..131279.47 rows=1041147 width=4) (actual time=2.049..2.187 rows=199 loops=1)
-> Sort (cost=118265.13..120868.00 rows=1041147 width=13) (actual time=2.048..2.109 rows=199 loops=1)
Sort Key: labels_1.title
Sort Method: quicksort Memory: 36kB
-> Nested Loop (cost=467.30..1493.83 rows=1041147 width=13) (actual time=0.621..1.797 rows=199 loops=1)
-> HashAggregate (cost=466.88..469.18 rows=230 width=4) (actual time=0.614..0.687 rows=199 loops=1)
Group Key: labels_2.id
-> Append (cost=9.34..466.30 rows=230 width=4) (actual time=0.049..0.533 rows=199 loops=1)
-> Nested Loop (cost=9.34..55.05 rows=22 width=4) (actual time=0.049..0.167 rows=56 loops=1)
-> HashAggregate (cost=8.91..8.92 rows=1 width=8) (actual time=0.037..0.037 rows=1 loops=1)
Group Key: projects.namespace_id
-> Nested Loop (cost=0.86..8.91 rows=1 width=8) (actual time=0.021..0.032 rows=1 loops=1)
-> Index Scan using projects_pkey on projects (cost=0.43..4.45 rows=1 width=4) (actual time=0.013..0.023 rows=1 loops=1)
Index Cond: (id = 13083)
-> Index Scan using namespaces_pkey on namespaces (cost=0.43..4.45 rows=1 width=4) (actual time=0.007..0.008 rows=1 loops=1)
Index Cond: (id = projects.namespace_id)
Filter: ((type)::text = 'Group'::text)
-> Index Scan using index_labels_on_group_id_and_project_id_and_title on labels labels_2 (cost=0.43..45.91 rows=22 width=8) (actual time=0.008..0.084 rows=56 loops=1)
Index Cond: (group_id = namespaces.id)
-> Nested Loop (cost=0.86..408.95 rows=208 width=4) (actual time=0.018..0.293 rows=143 loops=1)
-> Index Only Scan using projects_pkey on projects projects_1 (cost=0.43..4.45 rows=1 width=4) (actual time=0.005..0.010 rows=1 loops=1)
Index Cond: (id = 13083)
Heap Fetches: 4
-> Index Scan using index_labels_on_project_id on labels labels_3 (cost=0.43..402.43 rows=208 width=8) (actual time=0.012..0.216 rows=143 loops=1)
Index Cond: (project_id = 13083)
-> Index Scan using labels_pkey on labels labels_1 (cost=0.43..4.44 rows=1 width=13) (actual time=0.004..0.004 rows=1 loops=199)
Index Cond: (id = labels_2.id)
Planning time: 1.996 ms
Execution time: 39.172 ms
The exact query will differ a bit depending on your access level and filters, but this is the general gist of it.
This query is way too complex and can be reduced down to the following:
EXPLAIN ANALYZE
SELECT merge_requests.*
FROM merge_requests
WHERE merge_requests.deleted_at IS NULL
AND merge_requests.state IN ('opened')
AND EXISTS (
SELECT true
FROM label_links
INNER JOIN labels ON labels.id = label_links.label_id
WHERE label_links.target_id = merge_requests.id
AND label_links.target_type = 'MergeRequest'
AND labels.title = 'bug'
)
AND merge_requests.target_project_id = 13083
ORDER BY merge_requests.id DESC;
This produces plan:
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=46134.76..46134.78 rows=11 width=614) (actual time=37.964..37.978 rows=18 loops=1)
Sort Key: merge_requests.id DESC
Sort Method: quicksort Memory: 55kB
-> Nested Loop Semi Join (cost=454.45..46134.57 rows=11 width=614) (actual time=7.802..37.909 rows=18 loops=1)
-> Bitmap Heap Scan on merge_requests (cost=453.59..40207.03 rows=1152 width=614) (actual time=4.904..28.120 rows=521 loops=1)
Recheck Cond: (target_project_id = 13083)
Filter: ((deleted_at IS NULL) AND ((state)::text = 'opened'::text))
Rows Removed by Filter: 13348
Heap Blocks: exact=11778
-> Bitmap Index Scan on index_merge_requests_on_target_project_id_and_iid (cost=0.00..453.30 rows=22782 width=0) (actual time=2.574..2.574 rows=14003 loops=1)
Index Cond: (target_project_id = 13083)
-> Nested Loop (cost=0.86..5.14 rows=1 width=4) (actual time=0.018..0.018 rows=0 loops=521)
-> Index Scan using index_label_links_on_target_id_and_target_type on label_links (cost=0.43..4.38 rows=1 width=8) (actual time=0.007..0.009 rows=2 loops=521)
Index Cond: ((target_id = merge_requests.id) AND ((target_type)::text = 'MergeRequest'::text))
-> Index Scan using labels_pkey on labels (cost=0.43..0.75 rows=1 width=4) (actual time=0.003..0.003 rows=0 loops=1234)
Index Cond: (id = label_links.label_id)
Filter: ((title)::text = 'bug'::text)
Rows Removed by Filter: 1
Planning time: 0.648 ms
Execution time: 38.061 ms
(20 rows)
The timings are roughly similar, but the plan is much smaller.
This query can most likely be further improved by indexing (target_project_id, state)
on merge_requests
since state
is used in a Filter above and not an index condition, and said filter happens to filter out 13 341 rows.
Long story short: the sub queries and UNION are as far as I can tell redundant.