Resolve "GitHub import should fetch 100 results per page to limit the chance of hitting the rate limit"

username-removed-128633 requested to merge 38198-fetch-github-api-per-100 into master

What does this MR do?

See commits for details.

When importing https://github.com/guard/guard/pulls?q=is%3Apr+sort%3Acreated-desc+is%3Aopen, before the change:

Fetched pull requests in  93.480000  12.430000 117.930000 (1051.885330)
Fetched issues in  62.570000   2.390000  64.960000 (1142.657154)
Import finished. Timings: 158.910000  15.060000 186.280000 (2208.732847)

After the change:

Fetched pull requests in  89.090000  12.090000 113.160000 (698.703639)
Fetched issues in  65.940000   2.540000  68.480000 (1134.679727)
Import finished. Timings: 158.230000  14.830000 185.350000 (1843.907216)

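That's a 17% improvement in total wall-clock time (2208.7 s to 1843.9 s), driven mostly by pull request fetching, which dropped from 1051.9 s to 698.7 s.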
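
The percentages can be checked from the wall-clock values (the numbers in parentheses) in the timings above. A minimal sketch, where `percent_faster` is a hypothetical helper written for this illustration and not part of the importer:

```ruby
# Hypothetical helper: relative reduction in elapsed time, as a percentage.
def percent_faster(before, after)
  (before - after) / before * 100.0
end

# Wall-clock ("real") seconds copied from the benchmark output above.
total = percent_faster(2208.732847, 1843.907216)
pulls = percent_faster(1051.885330, 698.703639)

puts format("Total import:  %.1f%% faster", total)
puts format("Pull requests: %.1f%% faster", pulls)
```

The issue fetching time is nearly unchanged between the two runs, so almost all of the saving comes from the pull request step.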

Are there points in the code the reviewer needs to double check?

From https://gitlab.com/gitlab-org/gitlab-ce/issues/38198#note_40975277:

We do need to check that this does not increase memory usage much, though.

Locally, memory usage went from 302 MB to a peak of 441 MB with this change. Before the change, it went from 301 MB to a peak of 432 MB, so the peak grew by about 9 MB.

Why was this MR needed?

To improve the performance of the GitHub import.
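
As a rough illustration of why a larger page size helps with rate limiting: GitHub's REST list endpoints return 30 items per page by default and accept up to 100 through the `per_page` parameter, so fewer requests are needed to list the same number of items. The sketch below only does the arithmetic. `requests_needed` is a hypothetical helper, and the item count of 1000 is an assumed example value, not a measurement from this import:

```ruby
# Hypothetical helper: number of paginated requests needed to list
# `total` items when each response carries at most `per_page` items.
def requests_needed(total, per_page)
  (total + per_page - 1) / per_page # integer ceiling division
end

items = 1000 # assumed example size, not taken from the import above

default_pages = requests_needed(items, 30)  # GitHub's default page size
large_pages   = requests_needed(items, 100) # GitHub's maximum page size

puts "per_page=30:  #{default_pages} requests"
puts "per_page=100: #{large_pages} requests"
```

Each request counts against the API rate limit, so cutting the number of list requests lowers the chance of the import having to wait for the limit to reset.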

Does this MR meet the acceptance criteria?

What are the relevant issue numbers?

Closes #38198 and implements the first (simplest) solution of #38200.