WIP: Rewrite the GitHub importer to perform work in parallel and greatly improve performance
What does this MR do?
This MR rewrites the GitHub importer from scratch so it's much faster and performs work in parallel. This MR is still a WIP; I'll update the body properly once we get closer to a final state.
TODO
- Use a separate class for importing issue comments
- Import issue comments using a worker (1 job per comment)
- Issue and comment workers should reschedule themselves in the future if we hit a rate limit. The reschedule time will simply be the reset time of the rate limit (so if our rate limit resets in 10 seconds, we schedule jobs for 10 seconds in the future)
- Make the parallel importing schedule a job to check for progress, instead of blocking the thread in a sleep call
- Make sure any stuck importer jobs don't mess with a running GitHub import
- Import releases
- Test all the things
- Add tests for User.by_any_email
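The rate-limit rescheduling item in the TODO list could look roughly like the sketch below. It is only an illustration: `RateLimitError`, `FakeQueue`, and `ImportCommentWorker` are hypothetical names, not classes from this MR, and the queue is a plain Ruby stand-in that merely records what would be scheduled. With Sidekiq, the equivalent call would presumably be the worker's own `perform_in`.

```ruby
# Hypothetical sketch; none of these class names come from the MR itself.

# Raised by an (assumed) API client when the GitHub rate limit is exhausted.
class RateLimitError < StandardError
  attr_reader :reset_at

  # reset_at: the Time at which the rate limit resets.
  def initialize(reset_at)
    super("rate limit reached, resets at #{reset_at}")
    @reset_at = reset_at
  end
end

# Stand-in for a background job queue; it only records scheduled jobs.
class FakeQueue
  attr_reader :scheduled

  def initialize
    @scheduled = []
  end

  # Records a job that should run `delay` seconds from now.
  def perform_in(delay, *args)
    @scheduled << [delay, args]
  end
end

class ImportCommentWorker
  # clock is injectable so the delay calculation is easy to test.
  def initialize(queue, clock: -> { Time.now })
    @queue = queue
    @clock = clock
  end

  # Imports one comment via the given block. If the block hits the rate
  # limit, the same job is rescheduled for the moment the limit resets,
  # instead of blocking the thread until then.
  def perform(comment_id)
    yield comment_id
    :imported
  rescue RateLimitError => error
    # Never schedule in the past if the reset time has already elapsed.
    delay = [error.reset_at - @clock.call, 0].max
    @queue.perform_in(delay, comment_id)
    :rescheduled
  end
end

# Usage: the limit resets in 10 seconds, so the job is queued 10 seconds out.
now = Time.at(1_000)
queue = FakeQueue.new
worker = ImportCommentWorker.new(queue, clock: -> { now })
worker.perform(42) { |_id| raise RateLimitError.new(now + 10) }
```

Rescheduling rather than sleeping keeps the worker thread free for other jobs while the limit is exhausted, which is the same motivation as the progress-check item above.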
Does this MR meet the acceptance criteria?
- Changelog entry added, if necessary
- Documentation created/updated
- API support added
- Tests added for this feature/bug
- Review
  - Has been reviewed by Backend
  - Has been reviewed by Database
- Conforms to the merge request performance guides
- Conforms to the style guides
- Squashed related commits together
Edited by yorickpeterse-staging