The top query of sequential i/o is:
SELECT "labels".* FROM "labels" WHERE "labels"."template" = 't' ORDER BY "labels"."title" ASC
We have zero rows in gitlab.com where template
but even on a site which uses templates heavily this will be a small portion of the labels table so this query is doing a lot of i/o for nothing. This MR creates a partial index WHERE template
which will be empty or tiny and make this query go from being a top i/o consumer to being nearly free.
== 20170927122209 AddPartialIndexForLabelsTemplate: migrating =================
-- transaction_open?()
-> 0.0000s
-- execute("SET statement_timeout TO 0")
-> 0.0004s
-- add_index("labels", ["template"], {:where=>"template", :algorithm=>:concurrently})
-> 0.0241s
== 20170927122209 AddPartialIndexForLabelsTemplate: migrated (0.0249s) ========
In staging:
gitlabhq_production=# explain analyze SELECT "labels".* FROM "labels" WHERE "labels"."template" = 't' ORDER BY "labels"."title" ASC;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------
Sort (cost=76263.82..76263.82 rows=1 width=127) (actual time=271.830..271.830 rows=0 loops=1)
Sort Key: title
Sort Method: quicksort Memory: 25kB
-> Seq Scan on labels (cost=0.00..76263.81 rows=1 width=127) (actual time=271.799..271.799 rows=0 loops=1)
Filter: template
Rows Removed by Filter: 1772576
Planning time: 0.094 ms
Execution time: 271.879 ms
(8 rows)
gitlabhq_production=# begin;
BEGIN
gitlabhq_production=# create index tmp on labels (template) where (template);
CREATE INDEX
gitlabhq_production=# explain analyze SELECT "labels".* FROM "labels" WHERE "labels"."template" = 't' ORDER BY "labels"."title" ASC;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
Sort (cost=2.15..2.15 rows=1 width=127) (actual time=0.011..0.011 rows=0 loops=1)
Sort Key: title
Sort Method: quicksort Memory: 25kB
-> Index Scan using tmp on labels (cost=0.12..2.14 rows=1 width=127) (actual time=0.002..0.002 rows=0 loops=1)
Planning time: 0.361 ms
Execution time: 0.041 ms
(6 rows)
gitlabhq_production=# abort;
ROLLBACK
Database Checklist
When adding migrations:
-
Updated db/schema.rb
-
Added a down
method so the migration can be reverted -
Added the output of the migration(s) to the MR body -
Added the execution time of the migration(s) to the MR body -
Added tests for the migration in spec/migrations
if necessary (e.g. when migrating data) -
Made sure the migration won't interfere with a running GitLab cluster, for example by disabling transactions for long running migrations
When adding or modifying queries to improve performance:
-
Included the raw SQL queries of the relevant queries -
Included the output of EXPLAIN ANALYZE
and execution timings of the relevant queries -
Added tests for the relevant changes
When adding indexes:
-
Described the need for these indexes in the MR body -
Made sure existing indexes can not be reused instead
When adding foreign keys to existing tables:
-
Included a migration to remove orphaned rows in the source table -
Removed any instances of dependent: ...
that may no longer be necessary
When adding tables:
-
Ordered columns based on their type sizes in descending order -
Added foreign keys if necessary -
Added indexes if necessary
When removing columns, tables, indexes or other structures:
-
Removed these in a post-deployment migration -
Made sure the application no longer uses (or ignores) these structures
General Checklist
-
Changelog entry added, if necessary -
Documentation created/updated -
API support added -
Tests added for this feature/bug - Review
-
Has been reviewed by Database
-
-
Conform by the merge request performance guides -
Conform by the style guides -
Squashed related commits together