This is probably the single worst performance regression in 9.0. Previously we would perform 0 sequence scans on routes. Since the RC2 deploy we now perform up to 1 billion per minute:
This is absolutely bananas and needs to be resolved as soon as possible (= before 9.0 stable).
Designs
Child items
...
Show closed items
Linked items
0
Link issues together to show that they're related or that one is blocking others.
Learn more.
@dzaporozhets There's a difference between "we use this a lot" and "we perform sequence scans a lot". Sequence scans mean PostgreSQL will iterate over every single row in the table to find its data, skipping any indexes. This is terrible for performance. We need to make sure that whatever queries were added use the right indexes.
https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/10004 adds this scope to the todos page, which didn't have it before, but it seems like that will still do a sequential scan (it will just do one instead of hundreds), so an index might be needed there.
gitlabhq_production=# create index concurrently index_routes_on_path_text_pattern_ops ON routes (path varchar_pattern_ops);CREATE INDEXgitlabhq_production=# explain analyze select * from routes where path like 'gitlab-org/%'; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------------- Index Scan using index_routes_on_path_text_pattern_ops on routes (cost=0.43..8.45 rows=326 width=73) (actual time=0.018..0.150 rows=81 loops=1) Index Cond: (((path)::text ~>=~ 'gitlab-org/'::text) AND ((path)::text ~<~ 'gitlab-org0'::text)) Filter: ((path)::text ~~ 'gitlab-org/%'::text) Planning time: 0.348 ms Execution time: 0.193 ms(5 rows)
That is much better.
This basically leaves us with two options:
We add the index (this has to be added on top of the existing normal path index), and deal with all the pain that is maintaining PostgreSQL specific code (e.g. setting up a new DB from scratch should include this index)
We find a way to modify the code so that it doesn't have to use a LIKE
We find a way to modify the code so that it doesn't have to use a LIKE
@yorickpeterse I believe the use of path like 'gitlab-org/%' is the only easy way to get list of descendants for group and avoid recursive lookup (considering nested groups).
Oddly enough I'm not seeing a reduction in response timings with this index in place, and a very minor reduction in SQL timings. I wonder if PostgreSQL hasn't fully optimised queries just yet.
Add a migration that adds this index concurrently, but only for PostgreSQL. I believe we can set the operator class using the opclasses: option
Ensure the index is created when setting up a new DB. This is done by loading the migration into lib/tasks/migrate/setup_postgresql.rake and manually migrating it
For MySQL we have a hack that ignores the opclasses: option, so everything else should just work
This is where the slow query logs might come in handy. We should pull up the pgbadger output in a graphical format so everyone can see the biggest culprits.
@stanhu Ironically I ran pgbadger earlier today, and it did not spit out any routes queries. Instead most of the queries where queries involving project_authorizations and massively nested sub-queries/unions.
One more problem I see there(probably minor) is that we allow "_" in a path. If you use it in LIKE the index won't be used as it's a special character "any character".
Looking at the 99th percentile of SQL timings there is a drop, but it's too early to tell if this is related or just coincidence; timings over the past 24 hours go up and down in waves.
:vertical_traffic_light::vertical_traffic_light::vertical_traffic_light: Once we properly solve and deploy this we need to remove the index index_routes_on_path_text_pattern_ops. The migration will re-use this index name and check for existence, removing the need for this.
Also, I'm moving this to 9.0 since we need this fixed in 9.0 as this is a pretty serious regression.