
WIP: Rewrite the GitHub importer to perform work in parallel and greatly improve performance

yorickpeterse-staging requested to merge github-importer-refactor into master

What does this MR do?

This MR rewrites the GitHub importer from scratch so that it performs work in parallel and is much faster. This MR is still a WIP; I'll update the body properly once we get closer to a final state.


  • Use a separate class for importing issue comments
  • Import issue comments using a worker (1 job per comment)
  • Issue and comment workers should reschedule themselves if we hit a rate limit. The reschedule time is simply the reset time of the rate limit, so if our rate limit resets in 10 seconds, we schedule the jobs 10 seconds in the future.
  • Make the parallel importing schedule a job to check for progress, instead of blocking the thread in a sleep call
  • Make sure any stuck importer jobs don't mess with a running GitHub import
  • Import releases
  • Test all the things
  • Add tests for User.by_any_email
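
To make the rescheduling and progress-check items above more concrete, here is a minimal Ruby sketch. It is not the code in this MR: `FakeQueue`, `RateLimitError`, `import_comment`, `check_progress`, and the 30 second interval are all hypothetical names and values chosen for illustration, and the queue only records jobs instead of running them (a real setup would presumably use Sidekiq).

```ruby
# Hypothetical stand-in for a job queue. Jobs are recorded, not executed.
class FakeQueue
  attr_reader :scheduled

  def initialize
    @scheduled = []
  end

  # Records a job that should run `delay` seconds from now.
  def perform_in(delay, job_name, *args)
    @scheduled << { delay: delay, job: job_name, args: args }
  end
end

# Hypothetical error raised by an API client when the rate limit is exhausted.
class RateLimitError < StandardError
  attr_reader :resets_at

  def initialize(resets_at)
    super('rate limit reached')
    @resets_at = resets_at
  end
end

# Seconds until the rate limit resets, rounded up and never negative.
def seconds_until_reset(resets_at, now)
  [(resets_at - now).ceil, 0].max
end

# Imports one comment by yielding its ID to the caller's block. If the block
# hits the rate limit, the job is rescheduled for the moment the limit resets.
def import_comment(queue, comment_id, now: Time.now)
  yield comment_id
  :imported
rescue RateLimitError => error
  delay = seconds_until_reset(error.resets_at, now)
  queue.perform_in(delay, :import_comment, comment_id)
  :rescheduled
end

# Checks progress once. If jobs remain, another check is scheduled for later,
# so no thread sits blocked in a sleep call.
def check_progress(queue, remaining_jobs, interval: 30)
  return :finished if remaining_jobs.zero?

  queue.perform_in(interval, :check_progress)
  :waiting
end
```

With this sketch, a comment job that fails with a limit resetting in 10 seconds ends up in `queue.scheduled` with a delay of 10, and `check_progress` returns right away in both cases rather than holding a worker thread.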

Does this MR meet the acceptance criteria?

Edited by yorickpeterse-staging
