Add internal upload to external storage
This adds inline upload capability to the multipart handler.

The GitLab API exposes two pre-signed URLs (for S3 upload):

1. one to upload the file,
2. one to delete the file after the operation.

GitLab generates these URLs when authorizing the upload. Once the upload finishes, GitLab takes these URLs and "COPY"s the file to its final path using the S3 API.

We still save the file to disk in order to generate the metadata.

This solves the problem described in https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/9860#note_26534204.
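As an illustration, here is a minimal sketch of how Workhorse might model such an authorize response; the struct and field names (`authorizeResponse`, `RemoteObject`, `StoreURL`, `DeleteURL`) are hypothetical, not the actual API:

```go
// Hypothetical shape of the pre-signed upload instructions returned
// by the GitLab authorize call. All names are illustrative only.
type remoteObject struct {
	// Pre-signed URL the file contents are PUT to.
	StoreURL string `json:"StoreURL"`
	// Pre-signed URL used to delete the temporary object after
	// GitLab has copied it to its final path.
	DeleteURL string `json:"DeleteURL"`
}

type authorizeResponse struct {
	// Local path where the file is saved so that metadata can
	// still be generated from it.
	TempPath string `json:"TempPath"`
	// Upload parameters grouped in a sub-struct rather than as
	// loose fields on the response.
	RemoteObject remoteObject `json:"RemoteObject"`
}
```

After the PUT to `StoreURL` succeeds, Rails would use the S3 COPY operation to move the object to its final location, and `DeleteURL` to clean up the temporary object.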
assigned to @jacobvosmaer-gitlab
@ayufan this makes sense to me, uploads are slow, workhorse can do them. Not sure what I think about io.Pipe() but I see how it is a quick solution to be able to present an io.Reader to the upload POST.
Upload parameters should go into a sub-struct instead of loose fields in the API response struct.
Where exactly is the preauth API request made?
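For context, a minimal sketch of the `io.Pipe()` pattern being discussed, with a hypothetical `putWithPipe` helper; it shows how a writer-driven producer can be presented as an `io.Reader` to the upload request:

```go
import (
	"fmt"
	"io"
	"net/http"
)

// putWithPipe PUTs whatever write() produces to url without buffering
// everything in memory: write() feeds the writer end of the pipe while
// the HTTP client consumes the reader end as the request body.
func putWithPipe(url string, write func(io.Writer) error) error {
	pr, pw := io.Pipe()
	go func() {
		// CloseWithError propagates a write failure to the reader
		// side, aborting the in-flight request body.
		pw.CloseWithError(write(pw))
	}()

	req, err := http.NewRequest("PUT", url, pr)
	if err != nil {
		return err
	}

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("PUT %s: %s", url, resp.Status)
	}
	return nil
}
```

The trade-off is that the request body has no known size, so it goes out with chunked transfer encoding; that is exactly the `Content-Length` problem discussed further down.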
- Resolved by Kamil Trzciński
assigned to @ayufan
- Resolved by Jacob Vosmaer (GitLab)
- Resolved by Kamil Trzciński
> Where exactly is the preauth API request made?
It is not made yet. It will be in the next MR, on top of https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/9860
@jacobvosmaer-gitlab @nick.thomas
I had to change the upload process to happen after the download, as we do not know the `Content-Length` at that moment, and S3 doesn't accept requests without a content length. It makes this less robust, but allows us to use multipart uploads in the future.
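A sketch of the resulting order, with a hypothetical `putFromDisk` helper: because the file has already been downloaded to disk, its size is known and the PUT can carry an explicit `Content-Length`:

```go
import (
	"fmt"
	"net/http"
	"os"
)

// putFromDisk uploads an already-downloaded file to the pre-signed
// URL. The file size is known, so the request carries an explicit
// Content-Length instead of using chunked encoding, which S3 rejects.
func putFromDisk(url, path string) error {
	file, err := os.Open(path)
	if err != nil {
		return err
	}
	defer file.Close()

	fi, err := file.Stat()
	if err != nil {
		return err
	}

	req, err := http.NewRequest("PUT", url, file)
	if err != nil {
		return err
	}
	req.ContentLength = fi.Size()

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("PUT %s: %s", url, resp.Status)
	}
	return nil
}
```

Since the file has to be written to disk anyway for metadata generation, this ordering does not add an extra copy of the data.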
assigned to @jacobvosmaer-gitlab
- Resolved by Jacob Vosmaer (GitLab)
- Resolved by Jacob Vosmaer (GitLab)
- Resolved by Jacob Vosmaer (GitLab)
assigned to @ayufan
@jacobvosmaer-gitlab We will finish that for %9.2.
changed milestone to %9.2
added 49 commits
assigned to @jacobvosmaer-gitlab
- Resolved by Jacob Vosmaer (GitLab)
- Resolved by Jacob Vosmaer (GitLab)
Looks nice @ayufan. What about tests?
assigned to @ayufan
assigned to @jacobvosmaer-gitlab
- Resolved by Jacob Vosmaer (GitLab)
- Resolved by Jacob Vosmaer (GitLab)
- Resolved by Jacob Vosmaer (GitLab)
- Resolved by Jacob Vosmaer (GitLab)
- Resolved by Jacob Vosmaer (GitLab)
- Resolved by Jacob Vosmaer (GitLab)
- Resolved by Jacob Vosmaer (GitLab)
- Resolved by Jacob Vosmaer (GitLab)
- Resolved by Jacob Vosmaer (GitLab)
- Resolved by Jacob Vosmaer (GitLab)
- Resolved by Jacob Vosmaer (GitLab)
- Resolved by Jacob Vosmaer (GitLab)
assigned to @ayufan
assigned to @jacobvosmaer-gitlab
Apart from that one test problem (I wonder if that is why the MR is red), this looks good to me. @ayufan please assign to Nick on the next round.
@nick.thomas am I supposed to click the 'Approve' button now? :)
@jacobvosmaer-gitlab Test is fixed.
assigned to @nick.thomas
@jacobvosmaer-gitlab yep, ideally the approver would always be different to the merger. So if you hit approve, and I merge, all is good.
Of course, if I request changes, your approval disappears automatically on push...
- Resolved by Nick Thomas
- Resolved by Kamil Trzciński
- Resolved by Nick Thomas
- Resolved by Nick Thomas
- Resolved by Nick Thomas
```diff
 	if err := zipMd.Wait(); err != nil {
 		if st, ok := helper.ExitStatus(err); ok && st == zipartifacts.StatusNotZip {
 			return nil
 		}
 		return err
 	}
 
-	// Pass metadata file path to Rails
-	writer.WriteField("metadata.path", a.metadataFile)
-	writer.WriteField("metadata.name", "metadata.gz")
+	if generatedMetadata {
+		// Pass metadata file path to Rails
+		writer.WriteField("metadata.path", a.metadataFile)
+		writer.WriteField("metadata.name", "metadata.gz")
+	}
+
+	if err := a.storeFile(formName, fileName, writer); err != nil {
```

I do have a problem with this approach in general, which is that it can double the amount of time an artifact upload takes, and we already have complaints about that.
The bytes must be streamed from the CI runner to GitLab, and then from GitLab to the object store, before the job is considered complete.

An alternative implementation would be to pass the upload URL all the way to the runner. It would stream the bytes directly to the object store. When the runner tells GitLab the job is completed, it could then enqueue a Sidekiq job to asynchronously download the ZIP file from the store and create the metadata.
It's a big rethink of the architecture, so maybe we can consider it as a second iteration?
Edited by Nick Thomas

I agree with you. I even tried to do that now, but I was hit by the missing `Content-Length` thingy.

Yes. This is the next iteration. It requires changes on the Runner side, so it cannot be done "now", as it would require a compatible Runner too.
Having that split, and then exposing an API for getting `/authorize` out of Rails, the same way as Workhorse gets it, makes it possible to do this fully asynchronously. But we still need to keep support for older versions for some time; otherwise we would also be hit by the synchronous CarrierWave upload.

Edited by Kamil Trzciński
One thing to note is that this implementation is best used with object storage that is local to the server, in the same zone. This:

- avoids extra spending on egress traffic,
- and reduces the amount of time needed for the post-upload store in object storage.
cc @axil
@nick.thomas We will have a test drive on dev before enabling that. So, believe me, it is a 2-3 month process :)
assigned to @ayufan
enabled an automatic merge when the pipeline for 65215ee6 succeeds
mentioned in commit 1390af67