Handle missing repository storage

Closed Bob Van Landuyt :neckbeard: requested to merge bvl-handle-missing-repository-storage into master

What does this MR do?

In this MR we make sure a different error is raised when a storage directory is missing.

Currently GitLab won't start when one of the configured storages isn't available: Pathname#realpath raises an error in 6_validations. I think that is behavior we want to keep.

However, when GitLab is already running, a storage disappearing would be masked by a NoRepository exception, which is handled in several places. The new InvalidStorage exception is not handled.
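
A minimal sketch of the behavior described above (illustrative only; the class and method names are simplified, not the actual diff): when the configured storage path itself no longer resolves, we raise InvalidStorage instead of letting the failure surface as NoRepository.

```ruby
# Sketch only -- mirrors the description above, not the real GitLab code.
module Gitlab
  module Git
    class Repository
      NoRepository   = Class.new(StandardError)
      InvalidStorage = Class.new(StandardError) # new, intentionally unhandled

      def initialize(storage_path, relative_path)
        # A missing storage directory (e.g. an unmounted NFS share) is a
        # different failure than a missing repository, so raise a different error.
        unless File.directory?(storage_path)
          raise InvalidStorage, "no repository storage at #{storage_path}"
        end

        @path = File.join(storage_path, relative_path)
        raise NoRepository, "no repository at #{@path}" unless File.directory?(@path)
      end
    end
  end
end
```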

Why was this MR needed?

Currently, when a storage directory goes missing (e.g. an NFS store goes offline), we raise Gitlab::Git::Repository::NoRepository, which gets rescued in the ProjectsController and removes the exists cache for the repository. The exists cache is then invalid when the NFS comes back online.
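
Roughly the flow that caused the problem, as a simplified sketch (the rescue and cache-expiry calls are paraphrased from the description above, not copied from the codebase):

```ruby
# Simplified sketch of the old behavior: rescuing NoRepository in the
# controller also drops the cached `exists?` flag, so an unmounted NFS
# storage leaves a stale "repository does not exist" cache entry behind.
class ProjectsController < ApplicationController
  rescue_from Gitlab::Git::Repository::NoRepository do
    # Wrong conclusion when only the storage is unreachable: the repository
    # still exists, but the cache now says it doesn't.
    project.repository.expire_exists_cache
    render_404
  end
end
```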

Does this MR meet the acceptance criteria?

What are the relevant issue numbers?

Closes #31237 (closed)

Activity

  • @reprazent code looks good, just some typos in the RSpec example cases.

  • added 1 commit

    • 650500bd - Raise InvalidStorage if the storage_path doesn't exist

  • Bob Van Landuyt :neckbeard: resolved all discussions

  • @brodock Thanks! Do you agree that the InvalidStorage error shouldn't be handled for now, so it causes a crash and someone needs to take a look?

  • @reprazent I think we need @pcarranza's opinion on that matter. Ideally, aiming for a fault-tolerant system, we shouldn't take the whole instance down because one of the shared resources is unavailable. I'm not aware of how it is being handled today; I'm assuming we have the same behavior. If we do, we should create a new issue to discuss how to handle this unavailable state.

    A suggestion: when we get into that state, flag the repository as unavailable (perhaps save something in Redis) and retry after X amount of time. Any repository on the unavailable shard should either display data from cache or show a temporary failure message of some sort.
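
    A hedged sketch of what that flag-and-retry idea could look like (the class name, key format, and the 30 second cool-off are made up for illustration):

    ```ruby
    require 'redis'

    # Hypothetical helper: remember that a storage shard is unavailable and only
    # probe it again after a cool-off period, instead of hitting it on every request.
    class StorageHealth
      RETRY_AFTER = 30 # seconds; made-up value

      def initialize(redis = Redis.new)
        @redis = redis
      end

      def mark_unavailable(storage_name)
        # SETEX gives us the "retry in X amount of time" behavior for free:
        # the flag simply expires and the next request probes the storage again.
        @redis.setex("storage:unavailable:#{storage_name}", RETRY_AFTER, Time.now.to_i)
      end

      def available?(storage_name)
        @redis.get("storage:unavailable:#{storage_name}").nil?
      end
    end
    ```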

  • Ideally, aiming for a fault-tolerant system, we shouldn't take the whole instance down because one of the shared resources is unavailable.

    If this happens while the system is up, trying to access a repository on storage that is not mounted would cause a 500 error. The other repositories will keep working.

    It will, however, not be possible to bring up a new instance because of the validation that currently happens in an initializer.

    Any repository on the unavailable shard should either display data from cache or show a temporary failure message of some sort.

    We could do this by handling the exception and rendering a new error page, maybe a 503?
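
    For example (hypothetical sketch; only relevant if we decide a 503 is the right response), the new error could be handled near the top of the controller stack:

    ```ruby
    class ApplicationController < ActionController::Base
      # Hypothetical: turn a missing storage into a 503 instead of a 500,
      # signalling a temporary, retryable failure to the client.
      rescue_from Gitlab::Git::Repository::InvalidStorage do
        render plain: 'Repository storage is temporarily unavailable.',
               status: :service_unavailable # 503
      end
    end
    ```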
