SQL query used for getting issues/MRs by labels is overly complex

When filtering MR or issues by labels and a state you'll end up with a query along the lines of the following:

SELECT merge_requests.*
FROM merge_requests
INNER JOIN label_links ON label_links.target_id = merge_requests.id AND label_links.target_type = 'MergeRequest'
INNER JOIN labels ON labels.id = label_links.label_id
WHERE merge_requests.deleted_at IS NULL
AND merge_requests.state IN ('opened')
AND labels.title = 'bug'
AND labels.id IN (
    SELECT labels.id
    FROM labels
    WHERE (
        labels.id IN (
            SELECT labels.id
            FROM labels
            WHERE labels.group_id IN (
                SELECT projects.namespace_id
                FROM projects
                INNER JOIN namespaces ON namespaces.id = projects.namespace_id
                WHERE projects.id = 13083
                AND namespaces.type = 'Group'
            )

            UNION

            SELECT labels.id
            FROM labels
            WHERE labels.project_id IN (
                SELECT projects.id
                FROM projects
                WHERE projects.id = 13083
            )
        )
    )
    ORDER BY labels.title ASC
)
AND merge_requests.target_project_id = 13083
ORDER BY merge_requests.id DESC;

Running a query such as this will produce plan:

                                                                                                         QUERY PLAN                                                                                                         
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=197763.46..197763.48 rows=6 width=614) (actual time=39.005..39.014 rows=18 loops=1)
   Sort Key: merge_requests.id DESC
   Sort Method: quicksort  Memory: 55kB
   ->  Hash Semi Join  (cost=148815.25..197763.39 rows=6 width=614) (actual time=8.539..38.921 rows=18 loops=1)
         Hash Cond: (labels.id = "ANY_subquery".id)
         ->  Nested Loop  (cost=454.45..45333.49 rows=11 width=622) (actual time=5.415..35.437 rows=18 loops=1)
               ->  Nested Loop  (cost=454.02..45259.83 rows=97 width=618) (actual time=5.038..30.897 rows=1262 loops=1)
                     ->  Bitmap Heap Scan on merge_requests  (cost=453.59..40207.03 rows=1152 width=614) (actual time=5.019..24.644 rows=521 loops=1)
                           Recheck Cond: (target_project_id = 13083)
                           Filter: ((deleted_at IS NULL) AND ((state)::text = 'opened'::text))
                           Rows Removed by Filter: 13348
                           Heap Blocks: exact=11772
                           ->  Bitmap Index Scan on index_merge_requests_on_target_project_id_and_iid  (cost=0.00..453.30 rows=22782 width=0) (actual time=2.358..2.358 rows=13997 loops=1)
                                 Index Cond: (target_project_id = 13083)
                     ->  Index Scan using index_label_links_on_target_id_and_target_type on label_links  (cost=0.43..4.38 rows=1 width=8) (actual time=0.007..0.009 rows=2 loops=521)
                           Index Cond: ((target_id = merge_requests.id) AND ((target_type)::text = 'MergeRequest'::text))
               ->  Index Scan using labels_pkey on labels  (cost=0.43..0.75 rows=1 width=4) (actual time=0.003..0.003 rows=0 loops=1262)
                     Index Cond: (id = label_links.label_id)
                     Filter: ((title)::text = 'bug'::text)
                     Rows Removed by Filter: 1
         ->  Hash  (cost=131279.47..131279.47 rows=1041147 width=4) (actual time=2.389..2.389 rows=199 loops=1)
               Buckets: 524288  Batches: 4  Memory Usage: 4098kB
               ->  Subquery Scan on "ANY_subquery"  (cost=118265.13..131279.47 rows=1041147 width=4) (actual time=2.049..2.187 rows=199 loops=1)
                     ->  Sort  (cost=118265.13..120868.00 rows=1041147 width=13) (actual time=2.048..2.109 rows=199 loops=1)
                           Sort Key: labels_1.title
                           Sort Method: quicksort  Memory: 36kB
                           ->  Nested Loop  (cost=467.30..1493.83 rows=1041147 width=13) (actual time=0.621..1.797 rows=199 loops=1)
                                 ->  HashAggregate  (cost=466.88..469.18 rows=230 width=4) (actual time=0.614..0.687 rows=199 loops=1)
                                       Group Key: labels_2.id
                                       ->  Append  (cost=9.34..466.30 rows=230 width=4) (actual time=0.049..0.533 rows=199 loops=1)
                                             ->  Nested Loop  (cost=9.34..55.05 rows=22 width=4) (actual time=0.049..0.167 rows=56 loops=1)
                                                   ->  HashAggregate  (cost=8.91..8.92 rows=1 width=8) (actual time=0.037..0.037 rows=1 loops=1)
                                                         Group Key: projects.namespace_id
                                                         ->  Nested Loop  (cost=0.86..8.91 rows=1 width=8) (actual time=0.021..0.032 rows=1 loops=1)
                                                               ->  Index Scan using projects_pkey on projects  (cost=0.43..4.45 rows=1 width=4) (actual time=0.013..0.023 rows=1 loops=1)
                                                                     Index Cond: (id = 13083)
                                                               ->  Index Scan using namespaces_pkey on namespaces  (cost=0.43..4.45 rows=1 width=4) (actual time=0.007..0.008 rows=1 loops=1)
                                                                     Index Cond: (id = projects.namespace_id)
                                                                     Filter: ((type)::text = 'Group'::text)
                                                   ->  Index Scan using index_labels_on_group_id_and_project_id_and_title on labels labels_2  (cost=0.43..45.91 rows=22 width=8) (actual time=0.008..0.084 rows=56 loops=1)
                                                         Index Cond: (group_id = namespaces.id)
                                             ->  Nested Loop  (cost=0.86..408.95 rows=208 width=4) (actual time=0.018..0.293 rows=143 loops=1)
                                                   ->  Index Only Scan using projects_pkey on projects projects_1  (cost=0.43..4.45 rows=1 width=4) (actual time=0.005..0.010 rows=1 loops=1)
                                                         Index Cond: (id = 13083)
                                                         Heap Fetches: 4
                                                   ->  Index Scan using index_labels_on_project_id on labels labels_3  (cost=0.43..402.43 rows=208 width=8) (actual time=0.012..0.216 rows=143 loops=1)
                                                         Index Cond: (project_id = 13083)
                                 ->  Index Scan using labels_pkey on labels labels_1  (cost=0.43..4.44 rows=1 width=13) (actual time=0.004..0.004 rows=1 loops=199)
                                       Index Cond: (id = labels_2.id)
 Planning time: 1.996 ms
 Execution time: 39.172 ms

The exact query will differ a bit depending on your access level and filters, but this is the general gist of it.

This query is way too complex and can be reduced down to the following:

EXPLAIN ANALYZE
SELECT merge_requests.*
FROM merge_requests
WHERE merge_requests.deleted_at IS NULL
AND merge_requests.state IN ('opened')
AND EXISTS (
    SELECT true
    FROM label_links
    INNER JOIN labels ON labels.id = label_links.label_id
    WHERE label_links.target_id = merge_requests.id
    AND label_links.target_type = 'MergeRequest'
    AND labels.title = 'bug'
)
AND merge_requests.target_project_id = 13083
ORDER BY merge_requests.id DESC;

This produces plan:

                                                                                   QUERY PLAN                                                                                   
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=46134.76..46134.78 rows=11 width=614) (actual time=37.964..37.978 rows=18 loops=1)
   Sort Key: merge_requests.id DESC
   Sort Method: quicksort  Memory: 55kB
   ->  Nested Loop Semi Join  (cost=454.45..46134.57 rows=11 width=614) (actual time=7.802..37.909 rows=18 loops=1)
         ->  Bitmap Heap Scan on merge_requests  (cost=453.59..40207.03 rows=1152 width=614) (actual time=4.904..28.120 rows=521 loops=1)
               Recheck Cond: (target_project_id = 13083)
               Filter: ((deleted_at IS NULL) AND ((state)::text = 'opened'::text))
               Rows Removed by Filter: 13348
               Heap Blocks: exact=11778
               ->  Bitmap Index Scan on index_merge_requests_on_target_project_id_and_iid  (cost=0.00..453.30 rows=22782 width=0) (actual time=2.574..2.574 rows=14003 loops=1)
                     Index Cond: (target_project_id = 13083)
         ->  Nested Loop  (cost=0.86..5.14 rows=1 width=4) (actual time=0.018..0.018 rows=0 loops=521)
               ->  Index Scan using index_label_links_on_target_id_and_target_type on label_links  (cost=0.43..4.38 rows=1 width=8) (actual time=0.007..0.009 rows=2 loops=521)
                     Index Cond: ((target_id = merge_requests.id) AND ((target_type)::text = 'MergeRequest'::text))
               ->  Index Scan using labels_pkey on labels  (cost=0.43..0.75 rows=1 width=4) (actual time=0.003..0.003 rows=0 loops=1234)
                     Index Cond: (id = label_links.label_id)
                     Filter: ((title)::text = 'bug'::text)
                     Rows Removed by Filter: 1
 Planning time: 0.648 ms
 Execution time: 38.061 ms
(20 rows)

The timings are roughly similar, but the plan is much smaller.

This query can most likely be further improved by indexing (target_project_id, state) on merge_requests since state is used in a Filter above and not an index condition, and said filter happens to filter out 13 341 rows.

Long story short: the sub queries and UNION are as far as I can tell redundant.

Admin message

Admin message

SQL query used for getting issues/MRs by labels is overly complex