
Speed up DeclarativePolicy::Runner#steps_by_score

username-removed-443319 requested to merge declarative-policy-optimisations into master

First, the setup:

https://gitlab.com/gitlab-org/gitlab-ce/issues/4058 has a lot of participants and subscribers (348). We spend a good chunk of the load time just figuring out which of those can view the issue.

(https://gitlab.com/gitlab-org/gitlab-ce/issues/23206, in 10.2, will make that async, but it won't stop being slow.)

To recreate this locally, I created an issue with a lot of participants, all from mentioning the issue in their own private project:

issue.participants.count
#=> 1387

Benchmark.realtime { 10.times { Ability.users_that_can_read_project(issue.participants, issue.project) } }
#=> 11.749618000001647
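
For reference, the seeding looked roughly like this. It's a sketch rather than the exact script: the attribute values are illustrative and the service calls may differ between GitLab versions, but the idea is that each new user mentions the target issue from an issue in their own private project, which leaves a cross-reference system note on the target issue and makes them a participant.

# Rough seeding sketch (Rails console). Illustrative only: IDs, counts and
# attributes are placeholders, not the exact script behind the numbers above.
target = Issue.find(1) # the issue being load-tested

1_000.times do |i|
  user = User.create!(
    username: "mention-user-#{i}",
    email: "mention-user-#{i}@example.com",
    name: "Mention User #{i}",
    password: '12345678',
    password_confirmation: '12345678',
    confirmed_at: Time.current
  )

  project = Projects::CreateService.new(
    user,
    name: "mention-project-#{i}",
    path: "mention-project-#{i}",
    visibility_level: Gitlab::VisibilityLevel::PRIVATE
  ).execute

  # Creating the issue through the service processes the reference in the
  # description, which should add the author as a participant on `target`.
  Issues::CreateService.new(
    project,
    user,
    title: "Mention #{i}",
    description: "cc #{target.to_reference(full: true)}"
  ).execute
end

target.participants.count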

Looking at the line profile for the related issues issue, I can see that there's a lot of time going to DeclarativePolicy::Runner#steps_by_score:

(Screenshot: line profile showing a lot of time spent in DeclarativePolicy::Runner#steps_by_score.)

The call counts are very high, as you can see. d7717192 changes this in two ways:

  1. It performs fewer iterations through the outer loop by ignoring steps that would enable an already-enabled rule.
  2. It stops iterating through the inner loop once we get to a free step (score of zero). This is less useless than it sounds: in this case, a lot of steps have a zero score because they are cached, especially once we get to the later users. (A sketch covering both changes follows this list.)
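
Roughly, the shape of steps_by_score after both changes is the following. This is a simplified sketch rather than the exact diff in d7717192; names like remaining_steps and Step#score follow the runner, but the details are condensed:

# Simplified sketch of DeclarativePolicy::Runner#steps_by_score after the change.
# Illustrative only; the real method has more bookkeeping around the step sets.
def steps_by_score
  remaining_steps = Set.new(@steps)
  remaining_preventers = Set.new(@steps.reject(&:enable?))

  loop do
    # Change 1: once the ability is already enabled, a step that would enable
    # it again can't change the outcome, so only prevent-steps are considered.
    remaining_steps = remaining_preventers if @state.enabled?

    return if remaining_steps.empty?

    lowest_score = Float::INFINITY
    next_step = nil

    remaining_steps.each do |step|
      score = step.score

      if score < lowest_score
        next_step = step
        lowest_score = score
      end

      # Change 2: a score of zero is as cheap as a step can get (typically a
      # cached condition), so there's no point scanning the remaining steps.
      break if lowest_score.zero?
    end

    remaining_steps.delete(next_step)
    remaining_preventers.delete(next_step)

    yield next_step, lowest_score
  end
end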

Locally, the call count for the outer loop just about halved as a result of these two changes.

The second thing I fixed, although it's much more minor, is this:

(Screenshot: line profile of the cached_pass? check.)

We actually don't need to compute cached_pass? for every single rule: once we get a non-true value, we can return that value.
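
In sketch form, for a rule that combines sub-rules with AND (the same idea applies to the other combinators), the check becomes an early return. This is illustrative, not the literal diff:

# Sketch of the short-circuiting cached check for an AND-style rule.
def cached_pass?(context)
  @rules.each do |rule|
    pass = rule.cached_pass?(context)

    # A false value means the combined rule fails; nil means there is no cached
    # answer. Either way the combined result can no longer be a cached pass,
    # so stop asking the remaining sub-rules.
    return pass unless pass == true
  end

  true
end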

In my local instance, cached_pass? was called 66,572 times in total before, and 'only' 9,713 after.

End result:

Benchmark.realtime { 10.times { Ability.users_that_can_read_project(issue.participants, issue.project) } }
#=> 9.623671000008471

Which is quite underwhelming. But hey, it's a start.

Discarded ideas:

  1. Memoise a condition's score, and explicitly set it to zero when running the condition. This doesn't work because multiple condition instances can share the same cache key, and all should be scored as zero once that happens.
  2. Quit the re-sorting and re-scoring in steps_by_score and just do it once at the start. Turns out the repeated re-scoring is actually very useful!
  3. Cache the runner steps globally, and just flatten them once for a single policy, as in this case we're running :read_project for a bunch of users. This doesn't work because there is a lot of implicit state bundled along in the steps themselves 😞
  4. Just give up and optimise Ability.users_that_can_read_project explicitly for public projects. I don't want to do this because then we get into a whole dance about special cases that need to be kept in sync with the policies. I'd rather find a way to express a simplified 'batch mode' for policies like this, that goes beyond just subject scope.
  5. Some ideas that were so dumb I won't even mention them!
