Skip to content
Snippets Groups Projects

Refactoring rake task to import GitHub repositories

Merged username-removed-283999 requested to merge fix/github-importer into master
All threads resolved!

What does this MR do?

  • Add client for the GitHub API
  • Fix memory leak on GitHub importer
  • Fix error when listing pull requests/issues comments for large repositories
  • Refactor current rake task to use the new client

Next steps - %9.2

  • Refactor the GitHub importer to use the new GitHub client
  • Remove the octokit gem
  • Remove old code

Are there points in the code the reviewer needs to double check?

The Github::Import class.

Does this MR meet the acceptance criteria?

Merge request reports

Loading
Loading

Activity

Filter activity
  • Approvals
  • Assignees & reviewers
  • Comments (from bots)
  • Comments (from users)
  • Commits & branches
  • Edits
  • Labels
  • Lock status
  • Mentions
  • Merge request status
  • Tracking
  • James Lopez
  • James Lopez
  • James Lopez
  • James Lopez
  • James Lopez
  • James Lopez
  • James Lopez
  • James Lopez
  • James Lopez
  • James Lopez
  • James Lopez
  • James Lopez
  • James Lopez
  • James Lopez
  • James Lopez
  • James Lopez
  • James Lopez
  • James Lopez
  • James Lopez
  • James Lopez
  • @dbalexandre wow a lot of improvements here! Thanks for this! I've left a few comments/questions and also another comment here https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/10695#note_27867907

    The only thing that makes me a bit nervous is that there are no specs at all giving the amount of new code/classes. But if this works fine as a temporary solution for https://gitlab.com/gitlab-com/infrastructure/issues/1539 that could do :)

  • @dbalexandre looks like @jameslopez already did a great first review; I just added some small comments :slight_smile:

  • @dbalexandre does this replace octokit? Wondering if you should make a gem :thinking:

    @jameslopez The plan is to replace the octokit gem on %9.2. I'm not sure if it's worth to extract a gem. This code is very specific for our needs. I also think this could make the process to improve the importer slower since we need to bump the gem version every time. Wdyt?

    Have you tried it by importing the project again on your DO instance? That would be the perfect test - so we know we won't encounter any issues with these changes :)

    I didn't have time to try this on my DO instance yet, but I left it running locally for few hours. I also imported some repositories that we have on GitHub with edge cases.

  • added 275 commits

    • 35812a4c...1884f650 - 238 commits from branch master
    • 7030eb32 - Add faraday gem
    • fc42f3df - Add basic client for the GitHub API
    • b43ecca9 - Add basic representations for the Github API results
    • f76363e4 - Add basic importer for GitHub repositories
    • 9bcb1b27 - Remove unused gems
    • 4f68dc4f - Bump oj gem to version 2.17.5
    • 0a52ae83 - Remove unused GitHub endpoint wrappers
    • 0b1d1931 - Add issue representation
    • 2c92cc52 - Add comment representation
    • 8538066e - Refactoring collection wrapper
    • 104144f3 - Refactoring client to not parse response body automatically
    • 4a3b895d - Refactoring Github import to avoid memory leak
    • 00912ed9 - Refactoring Github response
    • eb95f0e5 - Fix comment representation
    • c2607666 - Import pull requests comments
    • db322009 - Import issues comments
    • 33c8f315 - Apply labels to issues/merge requests
    • a32adb82 - Remove sensitive information
    • f35573f1 - Extract common attributes to Github::Representation::Base
    • 00e3d60c - Extract Github::Representation::Issuable
    • ac1634fa - Extract a method to import issues/pull requests comments
    • 782aab13 - Pass a options hash to Github::Client
    • 5691c9b0 - Does not remove the GitHub remote
    • 18144530 - Add a method to format issues/pull requests/comments body
    • a7cb336e - Use while instead of loop/break
    • 3aa89795 - Refactoring Github import
    • 3c0a713a - Import Github releases
    • bd9e5c5d - Clone GitHub wiki
    • 09a6d328 - Keep track of import errors
    • e50606cd - Refactor rake task to to import GitHub repositores
    • 2b7328c3 - Rename find_by_email/find_by_external_uid methods
    • 9bdde579 - Add Github::Representation::Base#id
    • c7935dcf - Does not freeze integer values
    • f73a0280 - Extract rate limit URL to a constant
    • 275f00ee - Refactoring Github::RateLimit
    • 1f498b73 - Use only one cache hash with with a hash initializer by default
    • 5c72ba0f - Finish the import process if some error occurs when fetching the repo

    Compare with previous version

  • added 1 commit

    • 22a33d82 - Avoid unnecessary use of Arel to find users by external uid

    Compare with previous version

  • added 1 commit

    • 05255631 - Cache labels at the same time we fetch them from the GH API

    Compare with previous version

  • added 2 commits

    • 5d106f25 - Use the base initiliazer for representations
    • 30794972 - Set timeout options to the Github::Client

    Compare with previous version

  • @jameslopez @smcgivern Thanks for the feedback. I made some proposed changes/commented where needed. Could you have a look?

  • added 2 commits

    • dd1157c8 - Use Class.new(SuperClass) to define an empty custom error class
    • aeb1684c - Fix small typo

    Compare with previous version

  • James Lopez
  • thanks @dbalexandre :heart:

    over to you @smcgivern

  • assigned to @smcgivern

  • @dbalexandre thanks! For rolling this out, do you think we should do a full test on your DO instance first, or are you happy for this to be in a patch release without having run through the full import in one go?

    Please also resolve discussions or move them to a new issue as appropriate:

    image

    :slight_smile:

  • @smcgivern @jameslopez Thanks for the review :hearts:

    For rolling this out, do you think we should do a full test on your DO instance first, or are you happy for this to be in a patch release without having run through the full import in one go?

    @smcgivern I have some discussions that I need to resolve today, then I'll run it for a few hours on my DO instance just to make sure that everything is ok.

  • @dbalexandre thanks, makes sense to me!

  • added 2 commits

    • 7df974c4 - Add blank line before the raise method on Github::Collection
    • 2f934ce2 - Remove the Github::Error base class

    Compare with previous version

  • added 3 commits

    • 000a723d - Fix small typo on GitHub::Import
    • 39ab842b - Create project repository only when it not exists
    • 44954c50 - Fix import of notes on Pull Request diff

    Compare with previous version

  • added 1 commit

    Compare with previous version

  • added 2 commits

    • f184cb43 - Fix undefined attribute params from import task
    • d082b789 - Add basic progress output to GitHub import

    Compare with previous version

  • username-removed-283999 resolved all discussions

    resolved all discussions

  • mentioned in issue #31425 (moved)

  • @jameslopez @smcgivern I left the importer running for a few hours on my DO instance. We can see some interesting numbers related to the memory usage on the screenshot:

    1

    @smcgivern Can you do a final review, please?

  • username-removed-443319 approved this merge request

    approved this merge request

  • mentioned in commit d255425b

  • @godfat @felipe_artur I know this is large, but we'd love to have this in a 9.1 patch if possible, as per gitlab-com/infrastructure#1539.

  • @smcgivern I don't see why not, but I am not sure when it should happen. Before or after security release? https://gitlab.com/gitlab-org/gitlab-ce/issues/31341#note_28296520

  • @godfat if you need to do a security release first, do that first :slight_smile:

  • Picked into 9-1-stable, will go into 9.1.2

  • mentioned in commit f886c6f7

  • mentioned in issue #27429 (moved)

  • mentioned in issue gitlab#9463

  • Please register or sign in to reply
    Loading