
Add internal upload to external storage

Merged Kamil Trzciński requested to merge add-inline-upload-to-remote-storage into master
1 unresolved thread

This adds inline uploading capability to the multipart handler.

The GitLab API exposes two pre-signed URLs (for the S3 upload): 1. to upload the file, 2. to delete the file after the operation.

GitLab generates these URLs during the authorization step. Once the upload finishes, GitLab uses these URLs to "COPY" the file to its final path via the S3 API.

We still save the file to disk in order to generate metadata.

This solves the following problem: https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/9860#note_26534204.
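
For illustration, a minimal sketch of how such a pair of pre-signed URLs could be generated with aws-sdk-go. The bucket and key names are hypothetical, and GitLab actually produces these URLs on the Rails side; this only shows the S3 mechanism the description relies on:

```go
package main

import (
	"fmt"
	"time"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
)

func main() {
	svc := s3.New(session.Must(session.NewSession()))

	// Pre-signed PUT URL: lets the uploader write the file without
	// holding any S3 credentials. Bucket and key are hypothetical.
	putReq, _ := svc.PutObjectRequest(&s3.PutObjectInput{
		Bucket: aws.String("artifacts-bucket"),
		Key:    aws.String("tmp/upload-123"),
	})
	uploadURL, err := putReq.Presign(15 * time.Minute)
	if err != nil {
		panic(err)
	}

	// Pre-signed DELETE URL: used to remove the temporary object after
	// the server-side COPY to the final path has completed.
	delReq, _ := svc.DeleteObjectRequest(&s3.DeleteObjectInput{
		Bucket: aws.String("artifacts-bucket"),
		Key:    aws.String("tmp/upload-123"),
	})
	deleteURL, err := delReq.Presign(15 * time.Minute)
	if err != nil {
		panic(err)
	}

	fmt.Println(uploadURL)
	fmt.Println(deleteURL)
}
```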

Merge request reports


Activity

  • Author Maintainer

    Where exactly is the preauth API request made?

It is not made yet. It will be in the next MR, built on top of: https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/9860

  • added 3 commits


• Kamil Trzciński resolved all discussions

  • Author Maintainer

    @jacobvosmaer-gitlab @nick.thomas

I had to change the upload process to run after the download, as we do not know the Content-Length at that point and S3 does not accept requests without a Content-Length.

This makes it less robust, but it allows us to use multipart uploads in the future.
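
To make the constraint concrete, a minimal sketch (not the actual Workhorse code) of uploading a file that was first saved to disk; uploadURL stands for the hypothetical pre-signed URL. net/http does not infer Content-Length from an *os.File body, and S3 rejects single-part PUTs without one, so the size must be set explicitly after the download completes:

```go
package main

import (
	"fmt"
	"net/http"
	"os"
)

// putWithLength PUTs a local file to a pre-signed URL with an explicit
// Content-Length, which S3 requires for single-part uploads.
func putWithLength(uploadURL, path string) error {
	file, err := os.Open(path)
	if err != nil {
		return err
	}
	defer file.Close()

	fi, err := file.Stat()
	if err != nil {
		return err
	}

	req, err := http.NewRequest("PUT", uploadURL, file)
	if err != nil {
		return err
	}
	// Without this, the request would be sent chunked and S3 would
	// reject it — hence upload-after-download instead of streaming.
	req.ContentLength = fi.Size()

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("upload failed: %s", resp.Status)
	}
	return nil
}

func main() {
	// UPLOAD_URL and artifact.zip are placeholders for this sketch.
	if err := putWithLength(os.Getenv("UPLOAD_URL"), "artifact.zip"); err != nil {
		fmt.Println(err)
	}
}
```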

• Kamil Trzciński marked as a Work In Progress

  • Author Maintainer

    @jacobvosmaer-gitlab We will finish that for %9.2.

• Kamil Trzciński changed milestone to %9.2

• Kamil Trzciński added 49 commits

• Looks nice @ayufan. What about tests?

  • Author Maintainer

    Will be done in about 2h.

  • added 1 commit


  • added 1 commit


  • added 1 commit


• Kamil Trzciński unmarked as a Work In Progress

  • added 1 commit


  • added 2 commits

    • c1c610de - Revert objectStoreTimeout option
    • ee736e9d - Introduce Timeout support, but received with API call


  • added 1 commit

    • 553f2efc - Fix body testing for uploader


• Apart from that one test problem (I wonder if that is why the MR is red), this looks good to me. @ayufan please assign to Nick on the next round.

    @nick.thomas am I supposed to click the 'Approve' button now? :)

  • Author Maintainer

    @jacobvosmaer-gitlab Test is fixed.

• Jacob Vosmaer (GitLab) resolved all discussions

  • added 1 commit


  • @jacobvosmaer-gitlab yep, ideally the approver would always be different to the merger. So if you hit approve, and I merge, all is good.

    Of course, if I request changes, your approval disappears automatically on push...

• Nick Thomas resolved all discussions

• Nick Thomas
```diff
-	if err := zipMd.Wait(); err != nil {
-		if st, ok := helper.ExitStatus(err); ok && st == zipartifacts.StatusNotZip {
-			return nil
-		}
-		return err
+
+	if generatedMetadata {
+		// Pass metadata file path to Rails
+		writer.WriteField("metadata.path", a.metadataFile)
+		writer.WriteField("metadata.name", "metadata.gz")
 	}
 
-	// Pass metadata file path to Rails
-	writer.WriteField("metadata.path", a.metadataFile)
-	writer.WriteField("metadata.name", "metadata.gz")
+	if err := a.storeFile(formName, fileName, writer); err != nil {
```
    • I do have a problem with this approach in general, which is that it can double the amount of time an artifact upload takes, and we already have complaints about that.

The bytes must be streamed from the CI runner to GitLab, and then from GitLab to the object store, before the job is considered complete.

An alternative implementation would be to pass the upload URL all the way to the runner. It would stream the bytes directly to the object store. When the runner tells GitLab the job is completed, it could then enqueue a Sidekiq job to asynchronously download the ZIP file from the store and create the metadata.

      It's a big rethink of the architecture, so maybe we can consider it as a second iteration?

      Edited by Nick Thomas
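
As a loose sketch of the shape this second iteration might take (every name here is hypothetical; none of this exists in the current API), the job payload handed to the runner could carry the pre-signed URL directly:

```go
// Hypothetical additions to the job payload the runner receives.
// The runner would stream the artifact straight to UploadURL, and on
// job completion Rails would enqueue a Sidekiq job that downloads the
// ZIP from ObjectKey and generates the metadata asynchronously.
type ArtifactUploadInfo struct {
	UploadURL string `json:"upload_url"` // pre-signed PUT URL
	ObjectKey string `json:"object_key"` // where the object will live
}
```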
    • Author Maintainer

      @nick.thomas

I agree with you. I even tried to do that now, but I was hit by the missing Content-Length thingy.

Yes, this is the next iteration. It requires changes on the Runner side, so it cannot be done "now", as it would require a compatible runner too.

Having that split, and then exposing an API for getting /authorize out of Rails, the same way as Workhorse gets it, makes it possible to do this fully asynchronously.

But we still need to support older versions for some time; otherwise we would also be hit by the synchronous CarrierWave upload.

Edited by Kamil Trzciński
    • Author Maintainer

Thanks for asking about that :)

    • OK, can you get this into a new issue as a future enhancement, and make sure a note on the performance is included in the docs, wherever they are?

    • Author Maintainer

Yes, I will make sure of that.

    • Author Maintainer

One thing to note is that this implementation is best used with object storage that is local to the server, in the same zone. This avoids extra spending on egress traffic and reduces the time needed for the post-upload store into object storage.

      cc @axil

    • I'd suggest we not use this on GitLab.com until the second iteration ;)

    • Author Maintainer

@nick.thomas We will have a test drive on dev before enabling that. So, believe me, it is a 2-3 month process :)

    • I thought the point of this change was to avoid running out of NFS server disk space for artifact uploads. Isn't that orthogonal to how fast the uploads are?

  • assigned to @ayufan

  • added 1 commit


• Nick Thomas approved this merge request

• Nick Thomas enabled an automatic merge when the pipeline for 65215ee6 succeeds

  • OK, LGTM :thumbsup:

• Nick Thomas mentioned in commit 1390af67

  • merged
