Prepare automation to replace an NFS mountpoint with another drive for when one storage goes down

As a continuation of: https://gitlab.com/gitlab-com/infrastructure/issues/1961

Instead of waiting for the failed storage to recover while the service is down, we aggressively unmount the partition and mount a different NFS share in its place to accept writes.

On top of that, there is @andrewn's idea of using a tmpfs partition to prevent reads from blocking there and taking GitLab.com down, from https://gitlab.com/gitlab-org/gitlab-ce/issues/33220#note_31853500

Once we know how the application behaves when we replace the drives on the fly like this, and particularly how we can recover from the replacement without data loss (my main concern with using tmpfs), we should have an automated process that can replace any NFS mountpoint with a single script execution and bounce all the front-end unicorn and workhorse workers so they recover.
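
As a rough starting point, the single-script replacement could look something like the sketch below. It only builds and prints the command plan unless told otherwise. The mountpoint path, the spare export name, and the restart commands are all placeholders, not our real values, and it assumes a forced lazy unmount (`umount -f -l`) is acceptable for a hung NFS mount. How to recover the data afterwards is deliberately not covered.

```python
import shlex
import subprocess
from typing import List, Sequence


def build_plan(
    mountpoint: str,
    fallback_source: str,
    fstype: str = "nfs",
    restart_cmds: Sequence[Sequence[str]] = (),
) -> List[List[str]]:
    """Return the ordered commands to swap the share behind `mountpoint`.

    `fallback_source` is the replacement share (for example "host:/export"
    for NFS, or "tmpfs" when fstype is "tmpfs"). Every value here is a
    hypothetical placeholder supplied by the caller.
    """
    plan = [
        # Force a lazy unmount so a hung NFS server does not block us.
        ["umount", "-f", "-l", mountpoint],
        # Mount the replacement share on the same path.
        ["mount", "-t", fstype, fallback_source, mountpoint],
    ]
    # Bounce the workers so they reopen files on the new mount.
    plan.extend(list(cmd) for cmd in restart_cmds)
    return plan


def run_plan(plan: Sequence[Sequence[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in plan:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(list(cmd), check=True)


if __name__ == "__main__":
    # Placeholder values for illustration only.
    example = build_plan(
        mountpoint="/var/opt/gitlab/git-data-example",
        fallback_source="nfs-spare.example:/export/git-data",
        restart_cmds=[
            ["gitlab-ctl", "restart", "unicorn"],           # assumed restart command
            ["gitlab-ctl", "restart", "gitlab-workhorse"],  # assumed restart command
        ],
    )
    run_plan(example, dry_run=True)
```

Keeping the plan as plain data means we can review or log exactly what would run before flipping `dry_run` off, and the tmpfs variant is just `fstype="tmpfs"` with `fallback_source="tmpfs"`.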

This is pending https://gitlab.com/gitlab-org/gitlab-ce/issues/33117, as we will also need a way to avoid poisoning the cache when we recover.
