Add mirror related indexes

Open username-removed-117638 requested to merge add-mirror-related-indexes into master

These indexes will be needed because of the two SLAs that were set in https://gitlab.com/gitlab-org/gitlab-ee/issues/3641#note_42534211 which can be seen in more detail here: https://gitlab.com/gitlab-cookbooks/gitlab-prometheus/merge_requests/319 and here: https://gitlab.com/gitlab-cookbooks/gitlab-prometheus/merge_requests/320

Before on staging:

                                            QUERY PLAN                                             
---------------------------------------------------------------------------------------------------
 Aggregate  (cost=87173.87..87173.88 rows=1 width=8)
   ->  Nested Loop  (cost=0.43..87173.86 rows=1 width=0)
         ->  Seq Scan on project_mirror_data  (cost=0.00..1594.16 rows=23895 width=4)
               Filter: (next_execution_timestamp <= (now() - '00:10:00'::interval))
         ->  Index Scan using projects_pkey on projects  (cost=0.43..3.57 rows=1 width=4)
               Index Cond: (id = project_mirror_data.project_id)
               Filter: (mirror AND ((import_status)::text <> ALL ('{scheduled,started}'::text[])))

After on staging:

                                            QUERY PLAN                                             
---------------------------------------------------------------------------------------------------
 Aggregate  (cost=7.34..7.35 rows=1 width=8)
   ->  Nested Loop  (cost=0.41..7.34 rows=1 width=0)
         ->  Index Scan using stark_tmp4 on projects  (cost=0.12..3.02 rows=1 width=4)
               Filter: ((import_status)::text <> ALL ('{scheduled,started}'::text[]))
         ->  Index Scan using index_project_mirror_data_on_project_id on project_mirror_data  (cost=0.29..4.31 rows=1 width=4)
               Index Cond: (project_id = projects.id)
               Filter: (next_execution_timestamp <= (now() - '00:10:00'::interval))

One of these may also address one of the heavy hitter queries from production:

SELECT COUNT(*) FROM "projects" INNER JOIN "project_mirror_data" ON "project_mirror_data"."project_id" = "projects"."id" WHERE "projects"."mirror" = ? AND ("projects"."import_status" NOT IN (?,?)) AND (next_execution_timestamp <= ?);

However testing this on staging has proven fruitless as all mirroring is turned off on staging so all the next_execution_timestamp fields are set to old values.

== 20171006114203 AddIndexToProjectMirrorDataNextExecutionTimestamp: migrating 
-- transaction_open?()
   -> 0.0000s
-- execute("SET statement_timeout TO 0")
   -> 0.0004s
-- add_index(:project_mirror_data, :next_execution_timestamp, {:algorithm=>:concurrently})
   -> 0.0438s
== 20171006114203 AddIndexToProjectMirrorDataNextExecutionTimestamp: migrated (0.0445s) 

== 20171006121407 AddIndexToProjectMirrorLastUpdateAt: migrating ==============
-- transaction_open?()
   -> 0.0000s
-- execute("SET statement_timeout TO 0")
   -> 0.0003s
-- add_index(:projects, [:mirror, :mirror_last_update_at], {:where=>"mirror", :algorithm=>:concurrently})
   -> 0.0127s
== 20171006121407 AddIndexToProjectMirrorLastUpdateAt: migrated (0.0132s) =====

Remove this section and replace it with a description of your MR. Also follow the checklist below and check off any tasks that are done. If a certain task can not be done you should explain so in the MR body. You are free to remove any sections that do not apply to your MR.

When gathering statistics (e.g. the output of EXPLAIN ANALYZE) you should make sure your database has enough data. Having around 10 000 rows in the tables being queries should provide a reasonable estimate of how a query will behave. Also make sure that PostgreSQL uses the following settings:

  • random_page_cost: 1
  • work_mem: 16MB
  • maintenance_work_mem: at least 64MB
  • shared_buffers: at least 256MB

If you have access to GitLab.com's staging environment you should also run your measurements there, and include the results in this MR.

Database Checklist

When adding migrations:

  • Updated db/schema.rb
  • Added a down method so the migration can be reverted
  • Added the output of the migration(s) to the MR body
  • Added the execution time of the migration(s) to the MR body
  • Added tests for the migration in spec/migrations if necessary (e.g. when migrating data)
  • Made sure the migration won't interfere with a running GitLab cluster, for example by disabling transactions for long running migrations

When adding or modifying queries to improve performance:

  • Included the raw SQL queries of the relevant queries
  • Included the output of EXPLAIN ANALYZE and execution timings of the relevant queries
  • Added tests for the relevant changes

When adding indexes:

  • Described the need for these indexes in the MR body
  • Made sure existing indexes can not be reused instead

When adding foreign keys to existing tables:

  • Included a migration to remove orphaned rows in the source table
  • Removed any instances of dependent: ... that may no longer be necessary

When adding tables:

  • Ordered columns based on their type sizes in descending order
  • Added foreign keys if necessary
  • Added indexes if necessary

When removing columns, tables, indexes or other structures:

  • Removed these in a post-deployment migration
  • Made sure the application no longer uses (or ignores) these structures

General Checklist

Edited by Gregory Stark