Add a GitLab Geo download scheduler for LFS files

Stan Hu requested to merge geo/lfs-download-scheduler into geo/backfilling

The download scheduler works as follows:

  1. Load a batch of IDs that we need to download from the primary (DB_RETRIEVE_BATCH) into a pending list.
  2. Schedule them so that at most MAX_CONCURRENT_DOWNLOADS are running at once.
  3. When a slot frees, schedule another download.
  4. When the pending list is drained, load another batch into memory and schedule those files, excluding any that are already in progress.
  5. Quit once all downloads have been scheduled or the scheduler has been running for more than an hour.
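
The steps above can be sketched as a single polling loop. This is a minimal illustration in Python, not the code in this merge request: `run_scheduler`, `load_batch`, `start_download`, and `finished_ids` are hypothetical names, the constant values are placeholders, and treating `DB_RETRIEVE_BATCH` as a batch size is an assumption.

```python
import time
from collections import deque

# Placeholder values; the real constants are defined in the merge request.
DB_RETRIEVE_BATCH = 1000
MAX_CONCURRENT_DOWNLOADS = 10
RUN_TIME = 60 * 60  # one hour, in seconds


def run_scheduler(load_batch, start_download, finished_ids,
                  batch_size=DB_RETRIEVE_BATCH,
                  max_concurrent=MAX_CONCURRENT_DOWNLOADS,
                  run_time=RUN_TIME,
                  clock=time.monotonic, sleep=time.sleep, poll_interval=1.0):
    """Schedule downloads in batches with a cap on concurrency.

    All three callables are hypothetical hooks supplied by the caller:
      load_batch(limit, exclude) -> list of IDs still to download, skipping `exclude`
      start_download(file_id)    -> kicks off an asynchronous download
      finished_ids()             -> set of IDs that completed since the last call
    Returns the IDs in the order they were scheduled.
    """
    started_at = clock()
    pending = deque()
    in_progress = set()
    scheduled = []

    # Step 5: stop once the time budget is used up.
    while clock() - started_at < run_time:
        # Step 3: release the slots of downloads that have completed.
        in_progress -= finished_ids()

        # Steps 1 and 4: refill the pending list, excluding in-progress IDs.
        if not pending:
            pending.extend(load_batch(batch_size, in_progress))
            if not pending:
                break  # Step 5: everything has been scheduled.

        # Step 2: fill free slots without exceeding the concurrency cap.
        while pending and len(in_progress) < max_concurrent:
            file_id = pending.popleft()
            start_download(file_id)
            in_progress.add(file_id)
            scheduled.append(file_id)

        sleep(poll_interval)

    return scheduled
```

Passing `clock` and `sleep` as parameters is only a convenience for exercising the loop without waiting; a real scheduler would more likely poll its job queue directly.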

This builds on top of @dbalexandre's work in the geo/backfilling branch (!1197, merged).
