Currently, to check the size of artifacts, we need to inspect the filesystem.
I think it would be wise to store this information in the database.
It would simplify any statistics, metrics, or calculations for projects.
It would allow us to implement a per-project quota for artifacts if needed.
It would allow us to do soft monitoring of artifacts overuse.
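A minimal sketch of the per-project statistics and soft quota check this would enable, assuming the size is stored in a hypothetical nullable `artifacts_size` column (in bytes) on `ci_builds`. It uses Python's `sqlite3` only to stay self-contained; it is not the project's actual code, and `project_artifacts_size`/`over_quota` are illustrative names.

```python
import sqlite3

# Hypothetical schema: ci_builds carries a nullable artifacts_size (bytes);
# builds without artifacts keep it NULL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ci_builds ("
    "id INTEGER PRIMARY KEY, project_id INTEGER NOT NULL, artifacts_size INTEGER)"
)
conn.executemany(
    "INSERT INTO ci_builds (project_id, artifacts_size) VALUES (?, ?)",
    [(1, 1000), (1, 2500), (1, None), (2, 700)],
)


def project_artifacts_size(conn, project_id):
    """Total artifacts bytes for a project; SUM skips NULL (no artifacts) rows."""
    row = conn.execute(
        "SELECT COALESCE(SUM(artifacts_size), 0) FROM ci_builds WHERE project_id = ?",
        (project_id,),
    ).fetchone()
    return row[0]


def over_quota(conn, project_id, quota_bytes):
    """Soft check against a hypothetical per-project quota; no filesystem access needed."""
    return project_artifacts_size(conn, project_id) > quota_bytes


print(project_artifacts_size(conn, 1))  # 3500
print(over_quota(conn, 1, 3000))  # True
```

The point is that both numbers come from a single aggregate query instead of walking the artifacts directory.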
@ayufan As far as I remember, this was the initial plan when we designed the artifacts browser, but the argument against it was that it would then be wise to create a separate table for artifacts (and we didn't want to do that). Currently we have several fields in the CI build that, in my opinion, justify creating a separate table: a reference to the artifacts archive, a reference to the artifacts metadata, the artifacts expiration date, etc. Yet not every build has artifacts. If we are also going to store the artifacts size, I think we should create a separate table in the database. What do you think?
@grzesiek I'm not sure. It would be justified if we were to have multiple artifacts per build. This concept was evaluated to some extent by @markpundsack, but I'm still not confident that we need to introduce this level of complexity.
On the other hand, it makes sense to split all artifacts-handling code into a separate model to simplify the implementation. So maybe your proposal makes sense.
However, from the implementation perspective:
adding a database column is half a day of work, adds exactly what we need, and doesn't introduce additional complexity;
adding a separate table, migrating all data, and refactoring all dependent code is about a week of work for something that we may only need in the future.
@ayufan I agree that it would be easier to add a column. But is this a good solution, given that our current schema violates database normalization principles, especially third normal form, in how we handle artifacts data in the ci_builds table?
I like this idea. I'd just be biased toward doing the simplest change now and refactoring later if and when needed. For me, there's enough uncertainty around how we want artifacts to work in the future that refactoring now seems premature.
@ayufan @grzesiek Just to clarify: the artifacts size is already in artifacts_metadata, and we want to save it in the database? And the choice is between a single new column in ci_builds, or a new table like ci_artifacts with a ci_artifacts.ci_build_id column referring to the build each artifact belongs to?
I think we could still begin with a new ci_artifacts table and move fields over one by one. For now we move the size, and in the future we could move more along the way. This way we start from a better approach yet still deliver quickly. What do you think?
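A minimal sketch of the separate-table option, assuming a hypothetical `ci_artifacts` table that starts with only a `size` column plus the `ci_build_id` reference described above. SQLite via Python's `sqlite3` is used only for self-containment; column names beyond `ci_build_id` are assumptions.

```python
import sqlite3

# Hypothetical schema: one ci_artifacts row per build that actually has artifacts.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when enabled
conn.execute("CREATE TABLE ci_builds (id INTEGER PRIMARY KEY, project_id INTEGER NOT NULL)")
conn.execute(
    """CREATE TABLE ci_artifacts (
           id INTEGER PRIMARY KEY,
           ci_build_id INTEGER NOT NULL UNIQUE REFERENCES ci_builds(id),
           size INTEGER NOT NULL
       )"""
)

conn.executemany(
    "INSERT INTO ci_builds (id, project_id) VALUES (?, ?)", [(1, 1), (2, 1), (3, 2)]
)
# Build 3 has no artifacts, so it simply has no ci_artifacts row (no NULL columns).
conn.executemany(
    "INSERT INTO ci_artifacts (ci_build_id, size) VALUES (?, ?)", [(1, 1000), (2, 2500)]
)


def project_artifacts_size(conn, project_id):
    """Total artifacts bytes for a project, joining artifacts back to their builds."""
    row = conn.execute(
        """SELECT COALESCE(SUM(a.size), 0)
           FROM ci_artifacts AS a
           JOIN ci_builds AS b ON b.id = a.ci_build_id
           WHERE b.project_id = ?""",
        (project_id,),
    ).fetchone()
    return row[0]


print(project_artifacts_size(conn, 1))  # 3500
print(project_artifacts_size(conn, 2))  # 0
```

The `UNIQUE` constraint on `ci_build_id` encodes one artifacts archive per build; dropping it is what would allow multiple artifacts per build, the case raised earlier in the thread.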
You should get the artifacts size from the size of artifacts_file.
I generally agree with adding ci_artifacts, but I don't see the benefit of adding only the size to it.
If we want to go this route, we should move everything and migrate to the new table.
For now, I would say it is good enough to store artifacts_size as a column of ci_builds.
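A minimal sketch of the column approach, taking the size from the archive file itself as suggested above. It assumes hypothetical `artifacts_file` (path) and `artifacts_size` (bytes) columns on `ci_builds`; `store_artifacts_size` is an illustrative name, not existing code.

```python
import os
import sqlite3
import tempfile

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ci_builds ("
    "id INTEGER PRIMARY KEY, project_id INTEGER NOT NULL, "
    "artifacts_file TEXT, artifacts_size INTEGER)"
)


def store_artifacts_size(conn, build_id, artifacts_path):
    """Read the archive size in bytes from the filesystem once and persist it."""
    size = os.path.getsize(artifacts_path)
    conn.execute("UPDATE ci_builds SET artifacts_size = ? WHERE id = ?", (size, build_id))
    return size


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for an uploaded artifacts archive.
    path = os.path.join(tmp, "artifacts.zip")
    with open(path, "wb") as f:
        f.write(b"\0" * 1234)
    conn.execute(
        "INSERT INTO ci_builds (id, project_id, artifacts_file) VALUES (1, 1, ?)", (path,)
    )
    store_artifacts_size(conn, 1, path)

print(conn.execute("SELECT artifacts_size FROM ci_builds WHERE id = 1").fetchone()[0])  # 1234
```

After that, later statistics read the stored value and no longer depend on the file being present.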
@ayufan I still feel it would be beneficial to start with the right approach, as it would leave a skeleton in place so that future us would know where we're heading, and I don't think it would take that much time, since we're just moving code around (I love moving code around, to be honest). However, since I'm not familiar with the current code, I'll simply follow your advice for now :)
I'll also let you know if I discover something during the process. Thanks!
I feel the same. I'm OK with improving that in the next iteration, once we know that migrating to a separate table will allow us to cover more use cases, but first we need to define those use cases :)