Precalculate authorized projects and store these in the database
Currently we calculate the projects one has access to on the fly. On the average page this data can be requested many times. The query for this is non trivial and can take quite a bit of time. We can alleviate this by storing the list of authorized projects in a table and refreshing this whenever a user's list of authorized projects changes.
Table Structure
The table name would be authorized_projects
. The structure for this table would be as follows:
-
id
(primary key) -
user_id
integer -
project_id
integer -
access_level
integer
The following indexes would be present:
-
(user_id, project_id)
UNIQUE
The following foreign keys would be present:
-
user_id
pointing tousers.id
, with a cascading delete -
project_id
pointing toprojects.id
, with a cascading delete
Here the cascading delete is put in place to ensure that rows are removed whenever their corresponding rows in users
and/or projects
are removed. This way we don't end up with orphaned data.
The access level is stored so the code can filter out projects where you don't have a certain access level.
Refreshing
In general the procedure for refreshing this table is quite simple. Whenever you are granted access to a project or access has been removed we need to recalculate the projects for the user. This process works as follows:
- Start a transaction with "serializable" isolation (
transaction(isolation: :serializable) do ... end
) - Delete the existing entries in the
authorized_projects
table for a given user - Run a query to get the up to date list of authorized projects
- Insert this data into the
authorized_projects
table - Commit the transaction
If the transaction fails due to a serialization error (this raises ActiveRecord::StatementInvalid
) this means somebody else tries to update the data for this user. In this case the procedure should be tried again until it succeeds. Since we always use the "live set" of authorized projects we'll eventually succeed in populating the table with the correct list of projects. This can be implemented using something like:
loop do
transaction(isolation: :serializable) do
begin
update data here
break
rescue ActiveRecord::StatementInvalid
end
end
end
Updating of the authorized projects should be handled by a Sidekiq worker. This worker simply takes the ID of a user for which to update the projects, then goes about its business. This is important because:
- Getting the authorized projects can be slow
- This allows us to re-use the code for cases such as the one described below
It's OK if it takes a few minutes to grant a user access to a project.
Now for the tricky bit. EE allows one to share projects with groups. Whenever projects are added in this setup we don't have a specific user for which we need to change the data, instead we have a large list of users. In this case we would need to iterate over all group members and run the above setup for every user in the group. This is a rather expensive process so we re-use the above worker. For every user we schedule a job using this worker, allowing this data to be calculated in parallel. This essentially leads to the following pseudo code:
group.share_project(project)
group.members.pluck(:id).each do |member_id|
AuthorizedProjectsWorker.perform(member_id)
end
Application Usage
For the most part we can fairly easily use this new setup. To do so we'd have to rename User#authorized_projects
to User#raw_authorized_projects
, then implement User#authorized_projects
as a method. This method by default calls super
(the Rails association). When an access level is specified it applies something like where(access_level: access_level)
to the result of super
. This keeps the method backwards compatible while still being able to benefit from the above setup.
For a similar setup see https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/6749.