Many indexes are unused according to PostgreSQL and may be removed
One can run the following query to get an overview of all indexes and some usage statistics (see https://www.postgresql.org/docs/9.2/static/monitoring-stats.html section "Table 27-6. pg_stat_all_indexes View"):
SELECT relname as table_name, indexrelname as index_name, idx_scan, idx_tup_read, idx_tup_fetch, pg_size_pretty(pg_relation_size(indexrelname::regclass))
FROM pg_stat_all_indexes
WHERE schemaname = 'public'
AND "idx_scan" = 0
ORDER BY pg_relation_size(indexrelname::regclass) desc;
This outputs the contents of log.txt
Some of these indexes are quite weird. For example, the index index_services_on_created_at_and_id
is never used and indexes services
on (created_at, id)
. Since id
is already indexed (it being a primary key) the combination is completely useless. At most this should be a regular single column index, if it's needed at all.
Indexes To Verify
Some of these may be EE specific, this list also excludes primary key indexes:
-
ci_taggings_idx: the entire ci_taggings
table appears to be unused, so in theory the whole table can be removed -
index_audit_events_on_author_id: used by AuditEvent.user
, but since this runs a separate query this would not use the index -
index_audit_events_on_type: the column in question does not appear to be used for queries, can be removed -
index_ci_builds_on_erased_by_id: column not used for querying, can be removed -
index_ci_builds_on_project_id_and_commit_id: columns are already indexed individually, can be removed -
index_ci_builds_on_type: column does not appear to be used, can be removed -
index_ci_commits_on_project_id: not used in any queries, can be removed -
index_ci_commits_on_project_id_and_committed_at: not used, can be removed -
index_ci_commits_on_project_id_and_committed_at_and_id: not used, can be removed -
index_ci_commits_on_project_id_and_sha: not used, can be removed -
index_ci_commits_on_sha: not used, can be removed -
index_ci_events_on_created_at: table not in use, can be removed -
index_ci_events_on_is_admin: table not in use, can be removed -
index_ci_events_on_project_id: table not in use, can be removed -
index_ci_jobs_on_deleted_at: unused, can be removed -
index_ci_jobs_on_project_id: unused, can be removed -
index_ci_projects_on_gitlab_id: unused, can be removed -
index_ci_projects_on_shared_runners_enabled: only used in a query that instead uses a composite index, can be removed -
index_ci_services_on_project_id: table appears to be unused, can be removed -
index_ci_sessions_on_session_id: table appears to be unused, can be removed -
index_ci_sessions_on_updated_at: table appears to be unused, can be removed -
index_ci_taggings_on_taggable_id_and_taggable_type_and_context: table unused, can be removed -
index_ci_tags_on_name: table unused, can be removed -
index_ci_triggers_on_deleted_at: unused, can be removed -
index_identities_on_created_at_and_id: unused, can be removed -
index_issues_on_title: never used as title searches use %QUERY%
which never uses an index, can be removed -
index_keys_on_created_at_and_id: can be removed -
index_members_on_created_at_and_id: can be removed -
index_members_on_type: not used for filtering, can be removed -
index_merge_requests_on_deleted_at: can be removed -
index_milestones_on_created_at_and_id: can be removed -
index_namespaces_on_visibility_level: no filtering happens using namespaces.visibility_level
, can be removed -
index_projects_on_builds_enabled: can be removed -
index_projects_on_builds_enabled_and_shared_runners_enabled: unused due to other columns/indexes already filtering enough data -
index_services_on_category: never used as the query ( services.where(category: :ci)
) already filters enough data using theproject_id
index/column -
index_services_on_created_at_and_id: why does this even exist, can be removed -
index_services_on_default: not used due to project_id
already filtering enough data -
index_snippets_on_created_at: seriously...can be removed -
index_snippets_on_created_at_and_id: I guess if one index isn't enough we'll create two? can be removed -
index_todos_on_state: unused, most likely due to the filter for user_id
already filtering enough rows -
index_web_hooks_on_created_at_and_id: can be removed
EE specific:
-
index_geo_nodes_on_geo_node_key_id: not used in queries, can be removed -
index_geo_nodes_on_primary: used in lib/gitlab/geo.rb
in EE, may be used given enough rows ingeo_nodes
. Let's keep this one around -
index_namespaces_on_last_ldap_sync_at: column removed in an 8.11 migration