Trending projects query performs sequence scans
The controller method Explore::ProjectsController#trending
uses the following query for counting statistics (as per New Relic):
SELECT COUNT(*) AS count_all, projects.id AS projects_id
FROM ?
INNER JOIN ? ON ?.? = ?.?
WHERE ?.? IN (?, ?)
AND (notes.created_at > ?)
AND ?.? = ?
GROUP BY projects.id
ORDER BY count(notes.id) DESC
LIMIT ?
OFFSET ?
Replacing the placeholders with actual values can yield a query such as the following:
SELECT COUNT(*) AS count_all, projects.id AS projects_id
FROM "projects"
INNER JOIN "notes" ON "notes"."project_id" = "projects"."id"
WHERE "projects"."visibility_level" IN (0, 20)
AND (notes.created_at > '2014-01-01')
AND "projects"."archived" is false
GROUP BY projects.id
ORDER BY count(notes.id) DESC
LIMIT 100
OFFSET 0;
If we run this through EXPLAIN ANALYZE
we get the following query plan:
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=203.89..203.90 rows=6 width=8) (actual time=3.821..3.826 rows=6 loops=1)
-> Sort (cost=203.89..203.90 rows=6 width=8) (actual time=3.818..3.821 rows=6 loops=1)
Sort Key: (count(notes.id))
Sort Method: quicksort Memory: 25kB
-> GroupAggregate (cost=33.47..203.81 rows=6 width=8) (actual time=0.097..3.796 rows=6 loops=1)
Group Key: projects.id
-> Merge Join (cost=33.47..184.20 rows=2607 width=8) (actual time=0.037..3.332 rows=3506 loops=1)
Merge Cond: (notes.project_id = projects.id)
-> Index Scan using index_notes_on_project_id on notes (cost=0.28..396.88 rows=3910 width=8) (actual time=0.007..2.283 rows=3910 loops=1)
Filter: (created_at > '2014-01-01 00:00:00'::timestamp without time zone)
-> Sort (cost=33.19..33.21 rows=6 width=4) (actual time=0.028..0.200 rows=671 loops=1)
Sort Key: projects.id
Sort Method: quicksort Memory: 25kB
-> Seq Scan on projects (cost=0.00..33.11 rows=6 width=4) (actual time=0.004..0.020 rows=6 loops=1)
Filter: ((archived IS FALSE) AND (visibility_level = ANY ('{0,20}'::integer[])))
Rows Removed by Filter: 3
Planning time: 0.271 ms
Execution time: 3.886 ms
(18 rows)
This query performs a regular sequence scan on the "projects" table due to a lack of indexes on the following columns:
- projects.visibility_level
- projects.archived
New Relic in turn shows numerous transactions where running this query took 3-5 seconds, for example:
Adding indexes on these columns will remove the need for full sequence scans, in turn allowing PostgreSQL to perform bitmap heap scans. These aren't as fast as full index scans, but they can be way faster than regular sequence scans.