Add foreign keys to various tables that point to the "projects" table
This adds foreign keys to various tables that have `project_id` columns referring to the `projects` table. All these foreign keys have an `ON DELETE CASCADE` clause set, making it much easier and faster to remove data associated with a project (while also enforcing consistency). The MR includes a rather big migration to do all of this without requiring downtime and while making sure no orphaned data exists.
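To illustrate what `ON DELETE CASCADE` buys us, here is a minimal Python sketch. It uses an in-memory SQLite database rather than the PostgreSQL/Rails setup this MR targets, and the `issues` table is a hypothetical stand-in for any table with a `project_id` column.

```python
import sqlite3


def cascade_demo():
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE projects (id INTEGER PRIMARY KEY)")
    # Hypothetical child table; the cascade clause is the part that matters.
    conn.execute(
        "CREATE TABLE issues ("
        " id INTEGER PRIMARY KEY,"
        " project_id INTEGER NOT NULL"
        " REFERENCES projects (id) ON DELETE CASCADE)"
    )
    conn.execute("INSERT INTO projects (id) VALUES (1), (2)")
    conn.execute("INSERT INTO issues (project_id) VALUES (1), (1), (2)")

    # Deleting the project row also removes its issues in the same statement,
    # without the application having to walk the associations.
    conn.execute("DELETE FROM projects WHERE id = 1")

    remaining = conn.execute("SELECT project_id FROM issues").fetchall()
    conn.close()
    return remaining
```

Calling `cascade_demo()` leaves only the rows that belong to project 2, which is the behaviour the database gives us for free once the foreign keys are in place.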
Some associations are still removed by Rails. For example, LFS objects are still removed one by one, as for every row we also need to remove data on the file system and there's no easy way of doing this in bulk. The same applies to CI artifacts and traces, which need to be migrated directory-wise first (taken care of in https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/11641).
The EE version of this MR (to deal with EE code such as ElasticSearch) can be found here: https://gitlab.com/gitlab-org/gitlab-ee/merge_requests/2223
Related issues/MRs:
- https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/6292
- https://gitlab.com/gitlab-org/gitlab-ce/issues/27998
Migration Timings
| Migration | Time on Staging |
|---|---|
| ProjectForeignKeysWithCascadingDeletes | 60 minutes at least |
| CorrectProtectedBranchesForeignKeys | 1.6 seconds |
| AddForeignKeyForMergeRequestDiffs | 60 seconds |
The migration `ProjectForeignKeysWithCascadingDeletes` had to be run 3 times, as the first time it did not take care of orphans in the `protected_branch_push_access_levels` table, leading to it failing when it tried to remove orphans from `protected_branches`. The second time it failed because a table had orphans again that were added after the last removal. The 3rd time it took 30 minutes to complete.
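The ordering problem from the first run can be sketched as follows. This is a Python/SQLite illustration, not the actual migration code: the schema is heavily simplified, the `protected_branch_id` column and the existing foreign key on it are assumptions, and `remove_orphans` is a hypothetical helper. It shows one plausible reading of the failure: orphaned `protected_branches` rows cannot be deleted while rows in `protected_branch_push_access_levels` still point at them.

```python
import sqlite3

ORPHANED_BRANCHES = (
    "SELECT id FROM protected_branches"
    " WHERE project_id NOT IN (SELECT id FROM projects)"
)


def remove_orphans(conn):
    # Children first: drop access levels that hang off orphaned branches...
    conn.execute(
        "DELETE FROM protected_branch_push_access_levels"
        " WHERE protected_branch_id IN (" + ORPHANED_BRANCHES + ")"
    )
    # ...then the orphaned branches themselves.
    conn.execute(
        "DELETE FROM protected_branches"
        " WHERE project_id NOT IN (SELECT id FROM projects)"
    )


def orphan_demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE projects (id INTEGER PRIMARY KEY)")
    # No foreign key on project_id yet, so orphans can exist here.
    conn.execute(
        "CREATE TABLE protected_branches ("
        " id INTEGER PRIMARY KEY, project_id INTEGER)"
    )
    # Assumed pre-existing foreign key without a cascade.
    conn.execute(
        "CREATE TABLE protected_branch_push_access_levels ("
        " id INTEGER PRIMARY KEY,"
        " protected_branch_id INTEGER NOT NULL"
        " REFERENCES protected_branches (id))"
    )
    conn.execute("INSERT INTO projects (id) VALUES (1)")
    # Branch 11 points at project 999, which does not exist.
    conn.execute(
        "INSERT INTO protected_branches (id, project_id)"
        " VALUES (10, 1), (11, 999)"
    )
    conn.execute(
        "INSERT INTO protected_branch_push_access_levels"
        " (protected_branch_id) VALUES (10), (11)"
    )

    # Naive attempt: delete orphaned branches without touching their children.
    try:
        conn.execute(
            "DELETE FROM protected_branches"
            " WHERE project_id NOT IN (SELECT id FROM projects)"
        )
        naive_failed = False
    except sqlite3.IntegrityError:
        naive_failed = True

    remove_orphans(conn)
    branches = conn.execute(
        "SELECT id FROM protected_branches ORDER BY id"
    ).fetchall()
    levels = conn.execute(
        "SELECT protected_branch_id"
        " FROM protected_branch_push_access_levels ORDER BY id"
    ).fetchall()
    conn.close()
    return naive_failed, branches, levels
```

The second failure (new orphans appearing between the cleanup and adding the key) is not covered by this sketch; it only shows why the cleanup has to proceed from the leaf tables upwards.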
Activity
mentioned in merge request !8196 (merged)
added 1571 commits
- 2a31c799...32da7602 - 1570 commits from branch master
- fe75bd91 - Add many foreign keys to the projects table
changed milestone to %9.1
added 1779 commits
- e10d7c1a...aa8260d2 - 1778 commits from branch master
- 3f4d722c - Add many foreign keys to the projects table
mentioned in commit 47237ad7
mentioned in commit 1b620a4d
mentioned in commit 962bf01e
mentioned in issue #31259 (closed)
changed milestone to %9.2
added availability ~18308 labels
To recap, this is currently blocked by CI builds having data on the file system in multiple places. To allow PostgreSQL to remove `ci_builds` rows we need to be able to remove these build files (e.g. traces) without having to rely on DB rows (as these are removed at this point). The easiest way to do so is to store all these files in a directory scoped per project ID; that way we can just nuke the entire directory in one go. This however requires that we first move all existing files into the right place.
Removing CI builds one by one in Rails is still not an option, as this can have a negative impact on both performance and availability (e.g. try removing 20 000 rows that way).
In other words, we need a file structure that looks like this:
    shared/
      ci/
        123/
          artifacts/
          traces/
          kittens/
        13083/
          artifacts/
            123.txt
          traces/
            456.txt
          kittens/
In this setup removing the files is just a matter of `rm -rf shared/ci/13083`, with `13083` being the ID of the project to remove.
Looking at the code and @ayufan's suggestion above, I think we're not blocked; instead we can use globs and run something like this:
    rm -rf shared/artifacts/*/project-id-or-ci-id-here
    rm -rf shared/builds/*/project-id-or-ci-id-here
This should take care of removing the trace data, the artifacts file, and the artifacts metadata.
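For reference, the glob-based removal above could look roughly like this in Python. This is only a sketch: `remove_project_files` is a hypothetical helper, the `2017_05` intermediate directory in the demo is an invented placeholder for whatever sits between `shared/artifacts` and the ID, and it assumes every match is a directory.

```python
import glob
import os
import shutil
import tempfile


def remove_project_files(shared_root, project_id):
    """Remove shared/{artifacts,builds}/*/<project_id> and return what was removed."""
    removed = []
    for kind in ("artifacts", "builds"):
        # Equivalent of: rm -rf shared/<kind>/*/<project_id>
        pattern = os.path.join(shared_root, kind, "*", str(project_id))
        for path in glob.glob(pattern):
            shutil.rmtree(path)  # assumes the match is a directory
            removed.append(path)
    return sorted(removed)


def demo():
    # Build a throwaway layout with two projects, then remove one of them.
    with tempfile.TemporaryDirectory() as root:
        for kind in ("artifacts", "builds"):
            for project_id in ("123", "13083"):
                os.makedirs(os.path.join(root, kind, "2017_05", project_id))

        remove_project_files(root, 13083)

        leftover = sorted(
            os.path.relpath(path, root)
            for path in glob.glob(os.path.join(root, "*", "*", "*"))
        )
    return leftover
```

Running `demo()` leaves only the directories of project 123 behind, and no database rows are consulted along the way, which is the property we need once PostgreSQL removes the `ci_builds` rows itself.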
mentioned in issue gitlab-com/infrastructure#1732 (closed)