Comments retrieval could be done a lot smarter during GitHub import
Currently, when importing a GitHub pull-request, we:
- Fetch the Pull Requests endpoint: https://gitlab.com/gitlab-org/gitlab-ce/blob/598b1a17a0442f0038e208f1abfc1198112282f3/lib/github/import.rb#L171
- Fetch the Review Comments endpoint: https://gitlab.com/gitlab-org/gitlab-ce/blob/598b1a17a0442f0038e208f1abfc1198112282f3/lib/github/import.rb#L205-207
- Fetch the Issue Comments endpoint: https://gitlab.com/gitlab-org/gitlab-ce/blob/598b1a17a0442f0038e208f1abfc1198112282f3/lib/github/import.rb#L209-211
There is a lot of possible improvements here:
- The Issue endpoint returns a
comments
field that can be used to checked if the issue (or PR since PR are issues) has any comments. That means that instead of unconditionally fetch comments for PRs, we can conditionally fetch for comments in#populate_issue
(since we fetch PRs too with this method). We already perform this check for issues: https://gitlab.com/gitlab-org/gitlab-ce/blob/598b1a17a0442f0038e208f1abfc1198112282f3/lib/github/import.rb#L267-270 - That said, we could even go further and fetch all the comments from the repo and loop over them: https://developer.github.com/v3/issues/comments/#list-comments-in-a-repository. That way we would avoid a lot of individual calls for comments. Imagine you have 1000 issues with 1 comment, you currently need to do 1000 requests to https://developer.github.com/v3/issues/comments/#list-comments-on-an-issue, but if you call https://developer.github.com/v3/issues/comments/#list-comments-in-a-repository instead, you only need 1000 /
per_page
, which would be 10 calls forper_page=100
! - In the same idea as the point above, we could fetch https://developer.github.com/v3/pulls/comments/#list-comments-in-a-repository to get all the Review Comments directly, and the benefit would be even greater here since we cannot check in advance if the PR has review comments. Imagine you have 1000 PRs, but only 300 have comments, you currently need to do 1000 requests to https://developer.github.com/v3/pulls/comments/#list-comments-on-a-pull-request, but if you call https://developer.github.com/v3/pulls/comments/#list-comments-in-a-repository instead, you only need 300 /
per_page
, which would be 3 calls forper_page=100
!
Note:
- To get the type (
issue
/merge_request
) and IID with the https://developer.github.com/v3/issues/comments/#list-comments-in-a-repository endpoint, we'll have to parse"html_url": "https://github.com/octocat/Hello-World/issues/1347#issuecomment-1"
since there's noissuable_type
/issuable_id
field. - To get the type IID with the https://developer.github.com/v3/pulls/comments/#list-comments-in-a-repository endpoint, we'll have to parse
"html_url": "https://github.com/octocat/Hello-World/pull/1#discussion-diff-1"
since there's noissuable_id
field.
/cc @jameslopez @godfat