Elasticsearch: split "repository" documents into separate top-level "commit" and "blob" types
The following discussion from !2709 (merged) should be addressed:
- @smcgivern started a discussion: I see we have talked about splitting the types in the issue, which makes sense to me: https://gitlab.com/gitlab-org/gitlab-ee/issues/3011#note_37888885
@vsizov @nick.thomas do we already have an issue for that?
Per https://gitlab.com/gitlab-org/gitlab-ee/issues/3011#note_37888885, we currently store 'commits' and 'blobs' in Elasticsearch with a _type of repository. This means commits have all the fields of blobs, and vice versa. It also complicates querying these document types and causes bugs.
Can we do this with a data migration? To me, asking our users to reindex all their repositories for this is unreasonable.
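To make the data-migration idea concrete, here is a rough sketch of what it could look like using Elasticsearch's _reindex API, which copies documents from one index to another on the server side without going back to the Git repositories. Everything specific in it is an assumption for illustration: the index names, the marker fields used to tell commits from blobs (commit.sha and blob.path), and the helper name build_split_reindex_body are hypothetical, and whether `type` is accepted in `source` and `dest` depends on the Elasticsearch version in use. The sketch only builds the request bodies; it does not talk to a cluster.

```python
import json

# Hypothetical marker fields: a leaf field assumed to be present only on
# documents of that kind. Adjust to whatever the real mapping guarantees.
MARKER_FIELDS = {
    "commit": "commit.sha",
    "blob": "blob.path",
}


def build_split_reindex_body(source_index, dest_index, kind):
    """Build a _reindex request body that copies only `kind` documents
    out of the shared 'repository' type into a dedicated type."""
    if kind not in MARKER_FIELDS:
        raise ValueError("kind must be 'commit' or 'blob', got %r" % (kind,))
    return {
        "source": {
            "index": source_index,
            "type": "repository",
            # Select only documents that carry the marker field for this kind.
            "query": {"exists": {"field": MARKER_FIELDS[kind]}},
        },
        "dest": {
            "index": dest_index,
            "type": kind,
        },
    }


if __name__ == "__main__":
    # Index names are placeholders, not the real GitLab index names.
    for kind in ("commit", "blob"):
        body = build_split_reindex_body("gitlab-old", "gitlab-new", kind)
        print("POST _reindex")
        print(json.dumps(body, indent=2))
```

Each body would be sent as one POST to _reindex, once per kind. This copies the stored documents as-is, so the fields belonging to the other kind would presumably still need to be dropped, for example by restricting the source fields or with a reindex script.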
A thought: how much extra space do these fields take up per document, even though they're empty?