Move from short-lived indexer process to long-lived indexing daemon
In the current (9.0, 9.1) architecture, Sidekiq spawns multiple instances of bin/elastic_repo_indexer / es-git-go in parallel, one per git push. As a result, we have very little control over priority, concurrency, or IO load. If we convert es-git-go into a long-lived daemon that runs on N hosts, we can notify it whenever a git push happens. The daemon can then make intelligent scheduling decisions that cap the total CPU, RAM, and IOPS load of Elasticsearch indexing.
I propose doing this for v9.2. An initial implementation can introduce an es-git-go serve mode, and have es-git-go <project_id> <path> simply enqueue the work for the daemon instead of indexing directly. Responsibility for updating the index_statuses table would move from gitlab-ee to es-git-go.
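A sketch of how the two proposed invocations might be told apart. Command and ParseArgs are hypothetical names; the proposal does not say how the enqueuing client reaches the daemon (a socket, HTTP, or a shared queue), so that transport is deliberately left out.

```go
package main

import (
	"errors"
	"strconv"
)

// Command is the hypothetical result of parsing es-git-go's arguments.
type Command struct {
	Serve     bool   // `es-git-go serve`: run the long-lived daemon
	ProjectID int64  // `es-git-go <project_id> <path>`: enqueue one job
	RepoPath  string // repository path passed alongside the project ID
}

// ParseArgs separates the proposed daemon mode from the existing
// invocation, which would now only enqueue work for a running daemon.
func ParseArgs(args []string) (Command, error) {
	switch {
	case len(args) == 1 && args[0] == "serve":
		return Command{Serve: true}, nil
	case len(args) == 2:
		id, err := strconv.ParseInt(args[0], 10, 64)
		if err != nil {
			return Command{}, errors.New("project_id must be an integer")
		}
		return Command{ProjectID: id, RepoPath: args[1]}, nil
	default:
		return Command{}, errors.New("usage: es-git-go serve | es-git-go <project_id> <path>")
	}
}
```

Keeping the existing two-argument form means the Sidekiq side would not need to change how it invokes es-git-go.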