Geo: `Gitlab::Git::Repository::NoRepository: no repository for such path` error
Summary
After an unknown sequence of actions, a repository can get into an inconsistent state that prevents the backfill worker from healing itself and leaves replication stuck from that point on.
Steps to reproduce
While the original conditions are unknown, the problem can be reproduced with a project that has already started replicating (it must have the cached non-emptiness state, i.e. project.repo_exists? => true) but has no repository on disk.
To get into this state, all you have to do is remove the repository folder on a secondary node after it has cached project.repo_exists? => true.
What is the current bug behavior?
To simulate a backfill execution against that repository, get its project id and pass it to the following gitlab-rails console line:
Geo::RepositoryBackfillService.new(p.id).execute
(See relevant logs below)
What is the expected correct behavior?
We should log the path as well, so we have the opportunity to fix it manually (or know that we have to clear the cache). When this condition is detected, we should clear the emptiness cache to help the backfill heal itself on the next execution. We should also move on to the next project instead of getting stuck on this one, so we can at least backfill the projects that are in a good state.
Relevant logs
Geo::RepositoryBackfillService: Trying to obtain lease to sync repository for project gitlab/gitlabhq (4)
Geo::RepositoryBackfillService: Started repository sync for project gitlab/gitlabhq (4)
Geo::RepositoryBackfillService: Fetching project repository for project gitlab/gitlabhq (4)
Gitlab::Git::Repository::NoRepository: no repository for such path
from /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/git/repository.rb:67:in `rescue in rugged'
from /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/git/repository.rb:65:in `rugged'
from /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/git/repository.rb:860:in `remote_add'
from /opt/gitlab/embedded/service/gitlab-rails/app/models/repository.rb:1083:in `add_remote'
from /opt/gitlab/embedded/service/gitlab-rails/app/models/repository.rb:979:in `fetch_geo_mirror'
from /opt/gitlab/embedded/service/gitlab-rails/app/services/geo/repository_backfill_service.rb:49:in `fetch_project_repository'
from /opt/gitlab/embedded/service/gitlab-rails/app/services/geo/repository_backfill_service.rb:34:in `fetch_repositories'
from /opt/gitlab/embedded/service/gitlab-rails/app/services/geo/repository_backfill_service.rb:15:in `block in execute'
from /opt/gitlab/embedded/service/gitlab-rails/app/services/geo/repository_backfill_service.rb:74:in `try_obtain_lease'
from /opt/gitlab/embedded/service/gitlab-rails/app/services/geo/repository_backfill_service.rb:13:in `execute'
Customer being affected: https://gitlab.zendesk.com/agent/tickets/77773
Possible fixes
Handle this exception correctly, add additional information (such as the repository path) to the log file, and make sure we continue to process the remaining projects in the batch.
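A minimal, self-contained sketch of that idea follows. It is not the actual GitLab code: `NoRepositoryError` stands in for `Gitlab::Git::Repository::NoRepository`, and `FakeProject`, `expire_exists_cache`, `backfill_batch` and the example paths are hypothetical names made up for illustration. It only shows the shape of the fix: rescue the error per project, log the path, clear the cached state, and carry on with the rest of the batch.

```ruby
require 'logger'
require 'stringio'

# Hypothetical stand-in for Gitlab::Git::Repository::NoRepository.
class NoRepositoryError < StandardError; end

# Hypothetical, simplified project: a disk path, whether the repository is
# really on disk, and a cached "repository exists" flag.
FakeProject = Struct.new(:id, :path, :on_disk, :cached_exists) do
  # Mimics the fetch step failing when the repository folder is gone.
  def fetch_geo_mirror
    raise NoRepositoryError, 'no repository for such path' unless on_disk

    :fetched
  end

  # Assumed helper: drops the cached existence state so the next run re-checks.
  def expire_exists_cache
    self.cached_exists = nil
  end
end

# Sync every project in the batch. A missing repository is logged together
# with its path, its cached state is cleared, and the loop moves on.
def backfill_batch(projects, logger)
  projects.each_with_object({ synced: [], failed: [] }) do |project, result|
    begin
      project.fetch_geo_mirror
      result[:synced] << project.id
    rescue NoRepositoryError => e
      logger.error("Project #{project.id}: #{e.message} (path: #{project.path})")
      project.expire_exists_cache
      result[:failed] << project.id
    end
  end
end

log = StringIO.new
projects = [
  FakeProject.new(4, '/repos/gitlab/gitlabhq.git', false, true), # folder removed
  FakeProject.new(5, '/repos/gitlab/other.git', true, true)      # healthy
]
result = backfill_batch(projects, Logger.new(log))
```

In this sketch the broken project ends up in `result[:failed]` with its cache cleared, while the healthy project after it is still synced, which is the behavior asked for above.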