Skip to content

Add push_rules is_sample partial index

Gregory Stark requested to merge add-pushrules-issample-partial-index2 into master

Adds a an index on push_rules.is_sample where is_sample to address https://gitlab.com/gitlab-org/gitlab-ee/issues/3425. This index is already present in production.

This index is needed to avoid sequential scans on one of the top consumers of I/O in the database. This is similar to the labels.template case where we have a few (or in the case of gitlab.com, zero) special "template" or "sample" records in an otherwise large table and are doing a huge sequential scan to pick out the few template rows. I would prefer to store these in a separate table for various reasons but as described in the issue that isn't happening soon so to stop the pain create an index today.

== 20170920091408 AddIndexForPushrulesIsSample: migrating =====================
-- index_exists?(:push_rules, :is_sample)
   -> 0.0016s
-- transaction_open?()
   -> 0.0000s
-- execute("SET statement_timeout TO 0")
   -> 0.0002s
-- add_index(:push_rules, :is_sample, {:where=>"is_sample", :algorithm=>:concurrently})
   -> 0.0593s
== 20170920091408 AddIndexForPushrulesIsSample: migrated (0.0613s) ============

On staging (the index was already created in production):

gitlabhq_production=# explain analyze SELECT  "push_rules".* FROM "push_rules" WHERE "push_rules"."is_sample" = 't' LIMIT 1;
                                                    QUERY PLAN                                                    
------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..49222.89 rows=1 width=193) (actual time=20.464..20.465 rows=1 loops=1)
   ->  Seq Scan on push_rules  (cost=0.00..49222.89 rows=1 width=193) (actual time=20.461..20.461 rows=1 loops=1)
         Filter: is_sample
         Rows Removed by Filter: 173287
 Planning time: 0.055 ms
 Execution time: 20.488 ms
(6 rows)

gitlabhq_production=# begin;
BEGIN

gitlabhq_production=# create index push_rules_on_is_sample on push_rules (is_sample) where is_sample;
CREATE INDEX

gitlabhq_production=# explain analyze SELECT  "push_rules".* FROM "push_rules" WHERE "push_rules"."is_sample" = 't' LIMIT 1;
                                                                 QUERY PLAN                                                                 
--------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.12..2.14 rows=1 width=193) (actual time=0.019..0.019 rows=1 loops=1)
   ->  Index Scan using push_rules_on_is_sample on push_rules  (cost=0.12..2.14 rows=1 width=193) (actual time=0.017..0.017 rows=1 loops=1)
 Planning time: 0.156 ms
 Execution time: 0.044 ms
(4 rows)

Database Checklist

When adding migrations:

  • Updated db/schema.rb
  • Added a down method so the migration can be reverted
  • Added the output of the migration(s) to the MR body
  • Added the execution time of the migration(s) to the MR body
  • Added tests for the migration in spec/migrations if necessary (e.g. when migrating data)
  • Made sure the migration won't interfere with a running GitLab cluster, for example by disabling transactions for long running migrations

When adding indexes:

  • Described the need for these indexes in the MR body
  • Made sure existing indexes can not be reused instead

When removing columns, tables, indexes or other structures:

  • Removed these in a post-deployment migration
  • Made sure the application no longer uses (or ignores) these structures

General Checklist

Edited by yorickpeterse-staging

Merge request reports