Refactoring rake task to import GitHub repositories
What does this MR do?
- Add client for the GitHub API
- Fix memory leak on GitHub importer
- Fix error when listing pull requests/issues comments for large repositories
- Refactor current rake task to use the new client
%9.2
Next steps -- Refactor the GitHub importer to use the new GitHub client
- Remove the
octokit
gem - Remove old code
Are there points in the code the reviewer needs to double check?
The Github::Import
class.
Does this MR meet the acceptance criteria?
-
Changelog entry added, if necessary -
Documentation created/updated -
API support added - Tests
-
Added for this feature/bug -
All builds are passing
-
-
Conform by the merge request performance guides -
Conform by the style guides -
Branch has no merge conflicts with master
(if it does - rebase it please) -
Squashed related commits together
Merge request reports
Activity
added 286 commits
-
b11f9849...b638e45c - 262 commits from branch
master
- 8a67ae6d - Add faraday gem
- 534295ce - Add basic client for the GitHub API
- 3ae47c4d - Add basic representations for the Github API results
- f2de7718 - Add basic importer for GitHub repositories
- d61b8954 - Remove unused gems
- b68a096c - Bump oj gem to version 2.17.5
- 720b535c - Remove unused GitHub endpoint wrappers
- d97c5bdc - Add issue representation
- e864f8c5 - Add comment representation
- a779cd53 - Refactoring collection wrapper
- cab82f77 - Refactoring client to not parse response body automatically
- 853c2e13 - Refactoring Github import to avoid memory leak
- a0c29fe1 - Refactoring Github response
- 401394b2 - Fix comment representation
- 4811d400 - Import pull requests comments
- dcacb066 - Import issues comments
- 2c869557 - Apply labels to issues/merge requests
- 986be2bb - Remove sensitive information
- 4e7abc0c - Extract common attributes to Github::Representation::Base
- 48b907eb - Extract Github::Representation::Issuable
- 73613ed2 - Extract a method to import issues/pull requests comments
- af1de5c2 - Pass a options hash to Github::Client
- 35e6c67f - Does not remove the GitHub remote
- 0af96ec7 - Add a method to format issues/pull requests/comments body
Toggle commit list-
b11f9849...b638e45c - 262 commits from branch
@smcgivern @rspeicher @jameslopez Could you review this?
assigned to @smcgivern
changed milestone to %9.1
@dbalexandre Did you make sure to reset the credential from https://gitlab.com/gitlab-org/gitlab-ce/commit/b11f9849c25b7f01a4d9f6769dcc9a8d5a500184?
And is this really intended for %9.1? Seems like a big change to go in so late.
@rspeicher Yep. I revoked it when I noticed that I pushed it by mistake. I agree that is a big change, but we need this for https://gitlab.com/gitlab-com/infrastructure/issues/1539. Because of this that I just changed the rake task to use this new code. Maybe we can do a hot-patch in the instance that we'll for or release this in the next patch release for %9.1
Edited by username-removed-283999@dbalexandre does this replace octokit? Wondering if you should make a gem
Have you tried it by importing the project again on your DO instance? That would be the perfect test - so we know we won't encounter any issues with these changes :)
- Resolved by username-removed-283999
- Resolved by username-removed-283999
- Resolved by username-removed-283999
- Resolved by username-removed-283999
- Resolved by username-removed-283999
- Resolved by username-removed-283999
- Resolved by username-removed-283999
- Resolved by username-removed-283999
- Resolved by username-removed-283999
- Resolved by username-removed-283999
- Resolved by username-removed-283999
- Resolved by username-removed-283999
- Resolved by username-removed-283999
- Resolved by username-removed-283999
- Resolved by username-removed-283999
- Resolved by username-removed-283999
- Resolved by username-removed-283999
- Resolved by username-removed-283999
- Resolved by James Lopez
- Resolved by James Lopez
- Resolved by username-removed-283999
- Resolved by James Lopez
- Resolved by username-removed-283999
- Resolved by username-removed-283999
- Resolved by username-removed-283999
- Resolved by username-removed-283999
- Resolved by username-removed-283999
@dbalexandre wow a lot of improvements here! Thanks for this! I've left a few comments/questions and also another comment here https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/10695#note_27867907
The only thing that makes me a bit nervous is that there are no specs at all giving the amount of new code/classes. But if this works fine as a temporary solution for https://gitlab.com/gitlab-com/infrastructure/issues/1539 that could do :)
- Resolved by username-removed-283999
- Resolved by username-removed-443319
- Resolved by username-removed-283999
- Resolved by username-removed-283999
- Resolved by username-removed-283999
- Resolved by username-removed-283999
@dbalexandre looks like @jameslopez already did a great first review; I just added some small comments
assigned to @dbalexandre
@dbalexandre does this replace octokit? Wondering if you should make a gem
@jameslopez The plan is to replace the octokit gem on %9.2. I'm not sure if it's worth to extract a gem. This code is very specific for our needs. I also think this could make the process to improve the importer slower since we need to bump the gem version every time. Wdyt?
Have you tried it by importing the project again on your DO instance? That would be the perfect test - so we know we won't encounter any issues with these changes :)
I didn't have time to try this on my DO instance yet, but I left it running locally for few hours. I also imported some repositories that we have on GitHub with edge cases.
added 275 commits
-
35812a4c...1884f650 - 238 commits from branch
master
- 7030eb32 - Add faraday gem
- fc42f3df - Add basic client for the GitHub API
- b43ecca9 - Add basic representations for the Github API results
- f76363e4 - Add basic importer for GitHub repositories
- 9bcb1b27 - Remove unused gems
- 4f68dc4f - Bump oj gem to version 2.17.5
- 0a52ae83 - Remove unused GitHub endpoint wrappers
- 0b1d1931 - Add issue representation
- 2c92cc52 - Add comment representation
- 8538066e - Refactoring collection wrapper
- 104144f3 - Refactoring client to not parse response body automatically
- 4a3b895d - Refactoring Github import to avoid memory leak
- 00912ed9 - Refactoring Github response
- eb95f0e5 - Fix comment representation
- c2607666 - Import pull requests comments
- db322009 - Import issues comments
- 33c8f315 - Apply labels to issues/merge requests
- a32adb82 - Remove sensitive information
- f35573f1 - Extract common attributes to Github::Representation::Base
- 00e3d60c - Extract Github::Representation::Issuable
- ac1634fa - Extract a method to import issues/pull requests comments
- 782aab13 - Pass a options hash to Github::Client
- 5691c9b0 - Does not remove the GitHub remote
- 18144530 - Add a method to format issues/pull requests/comments body
- a7cb336e - Use while instead of loop/break
- 3aa89795 - Refactoring Github import
- 3c0a713a - Import Github releases
- bd9e5c5d - Clone GitHub wiki
- 09a6d328 - Keep track of import errors
- e50606cd - Refactor rake task to to import GitHub repositores
- 2b7328c3 - Rename find_by_email/find_by_external_uid methods
- 9bdde579 - Add Github::Representation::Base#id
- c7935dcf - Does not freeze integer values
- f73a0280 - Extract rate limit URL to a constant
- 275f00ee - Refactoring Github::RateLimit
- 1f498b73 - Use only one cache hash with with a hash initializer by default
- 5c72ba0f - Finish the import process if some error occurs when fetching the repo
Toggle commit list-
35812a4c...1884f650 - 238 commits from branch
added 1 commit
- 22a33d82 - Avoid unnecessary use of Arel to find users by external uid
added 1 commit
- 05255631 - Cache labels at the same time we fetch them from the GH API
@jameslopez @smcgivern Thanks for the feedback. I made some proposed changes/commented where needed. Could you have a look?
- Resolved by username-removed-283999
- Resolved by username-removed-283999
thanks @dbalexandre
over to you @smcgivern
assigned to @smcgivern
- Resolved by username-removed-283999
@dbalexandre thanks! For rolling this out, do you think we should do a full test on your DO instance first, or are you happy for this to be in a patch release without having run through the full import in one go?
Please also resolve discussions or move them to a new issue as appropriate:
assigned to @dbalexandre
@smcgivern @jameslopez Thanks for the review
For rolling this out, do you think we should do a full test on your DO instance first, or are you happy for this to be in a patch release without having run through the full import in one go?
@smcgivern I have some discussions that I need to resolve today, then I'll run it for a few hours on my DO instance just to make sure that everything is ok.
@dbalexandre thanks, makes sense to me!
mentioned in issue #31425 (moved)
@jameslopez @smcgivern I left the importer running for a few hours on my DO instance. We can see some interesting numbers related to the memory usage on the screenshot:
@smcgivern Can you do a final review, please?
assigned to @smcgivern
mentioned in commit d255425b
@godfat @felipe_artur I know this is large, but we'd love to have this in a 9.1 patch if possible, as per gitlab-com/infrastructure#1539.
@smcgivern I don't see why not, but I am not sure when it should happen. Before or after security release? https://gitlab.com/gitlab-org/gitlab-ce/issues/31341#note_28296520
@godfat if you need to do a security release first, do that first
mentioned in commit f886c6f7
mentioned in issue #27429 (moved)
mentioned in issue gitlab#9463