Skip to content
Snippets Groups Projects

Backups fail occasionally with "tar: ./objects: file changed as we read it" error

Merged username-removed-117638 requested to merge 5613-backups-fail into master
All threads resolved!

Closes #5613 (moved)

Merge request reports

Loading
Loading

Activity

Filter activity
  • Approvals
  • Assignees & reviewers
  • Comments (from bots)
  • Comments (from users)
  • Commits & branches
  • Edits
  • Labels
  • Lock status
  • Mentions
  • Merge request status
  • Tracking
  • @tiagonbotelho that organization seems fine. Does the user1/fork1 directory get created automatically? We could also do something with fork1.custom_hooks.tar I suppose. But either is OK I think.

  • We should have tests in spec/tasks/gitlab/backup_rake_spec.rb that assert which extra tar files get created.

  • To answer your question, yes I think this code is doing the right things.

  • Added 1 commit:

    • e3f885ac - implement create backup specs to check whether or not .tars were created properly
  • username-removed-117638 Marked the task CHANGELOG entry added as completed

    Marked the task CHANGELOG entry added as completed

  • Marked the task Documentation created/updated as completed

  • username-removed-117638 Marked the task API support added as completed

    Marked the task API support added as completed

  • username-removed-117638 Marked the task Added for this feature/bug as completed

    Marked the task Added for this feature/bug as completed

  • username-removed-117638 Marked the task All builds are passing as completed

    Marked the task All builds are passing as completed

  • username-removed-117638 Marked the task All builds are passing as incomplete

    Marked the task All builds are passing as incomplete

  • username-removed-117638 Marked the task Branch has no merge conflicts with master (if you do - rebase it please) as completed

    Marked the task Branch has no merge conflicts with master (if you do - rebase it please) as completed

  • Added ~149423 label

  • Added 500 commits:

    • e3f885ac...31570c12 - 497 commits from branch master
    • 825a823a - implements tar for annex and custom_hooks and bundle for git data
    • 48fbe593 - implements tar for annex and custom_hooks and bundle for git data
    • 371bd985 - implement create backup specs to check whether or not .tars were created properly
  • username-removed-117638 Unmarked this merge request as a Work In Progress

    Unmarked this merge request as a Work In Progress

  • Added 1 commit:

    • c8461e66 - implement create backup specs to check whether or not .tars were created properly
  • Added 1 commit:

    • 9e6c37e9 - implement create backup specs to check whether or not .tars were created properly
  • Added 217 commits:

    • 9e6c37e9...12fe6a6f - 214 commits from branch master
    • 8bf61c89 - implements tar for annex and custom_hooks and bundle for git data
    • 4d650625 - implements tar for annex and custom_hooks and bundle for git data
    • 4c1b8962 - implement create backup specs to check whether or not .tars were created properly
  • Milestone changed to %8.12

  • username-removed-117638 Removed ~149423 label

    Removed ~149423 label

  • Added 201 commits:

    • 4c1b8962...fb84439a - 198 commits from branch master
    • 54ea876c - implements tar for annex and custom_hooks and bundle for git data
    • 9147b465 - implements tar for annex and custom_hooks and bundle for git data
    • 86865de8 - implement create backup specs to check whether or not .tars were created properly
  • Added 35 commits:

    • 86865de8...f41098ad - 32 commits from branch master
    • 2a6f2235 - implements tar for annex and custom_hooks and bundle for git data
    • 37a228d1 - implements tar for annex and custom_hooks and bundle for git data
    • 8c184f0e - implement create backup specs to check whether or not .tars were created properly
  • Added 81 commits:

    • 8c184f0e...b2bf01f4 - 78 commits from branch master
    • 4aa0c6a8 - implements tar for annex and custom_hooks and bundle for git data
    • 74038a05 - implements tar for annex and custom_hooks and bundle for git data
    • a13c3c61 - implement create backup specs to check whether or not .tars were created properly
  • @ayufan any news on the failing build? really wanted to close this one :)

  • Added 176 commits:

    • a13c3c61...0357df0c - 173 commits from branch master
    • 36443ad6 - implements tar for annex and custom_hooks and bundle for git data
    • 2ca41eec - implements tar for annex and custom_hooks and bundle for git data
    • 5505b7b1 - implement create backup specs to check whether or not .tars were created properly
  • Added 1 commit:

    • 8e376b5e - Remove debugger command from backup code
  • Added 1 commit:

    • 8e376b5e - Remove debugger command from backup code
  • username-removed-117638 Resolved all discussions

    Resolved all discussions

  • Added 1010 commits:

    • 8e376b5e...1e7ea64e - 1006 commits from branch master
    • ee01028d - implements tar for annex and custom_hooks and bundle for git data
    • c8b94801 - implements tar for annex and custom_hooks and bundle for git data
    • 75f84ae1 - implement create backup specs to check whether or not .tars were created properly
    • dadb61d7 - Remove debugger command from backup code
  • Added 1 commit:

    • 0413da1e - Remove debugger command from backup code
  • Added 1 commit:

    • 3fb4a2a7 - Remove debugger command from backup code
  • Added 1 commit:

    • 5dc6369e - Remove debugger command from backup code
  • Added 1 commit:

  • Added 1 commit:

    • de05cedc - tests whether it blocks or not
  • Mentioned in issue #5613 (moved)

  • @jacobvosmaer-gitlab @tiagonbotelho Can you please explain how this addresses the issue with files changing as we read them? This looks like it's still reading in from the actual data location which wouldn't prevent that scenario, would it?

  • @dblessing what we are basically doing is resolving it by using git bundle on Git data and after that using tar for custom_hooks and annex folders. which was suggested by @jacobvosmaer-gitlab in !5613 (merged)

    right now the implementation is working (by that I mean that it creates the backup and the restore without failing as it did previously) but I am having serious issues writing the test for it since something is blocking the database from running the test and I cannot seem to find a solution for this.

    I already tried the following:

    . creating two traits that create the respective folders (annex or custom_hooks) along with the project creation . removing these traits and creating the folders in a before do call

    none of them worked and I am running out of ideas :/

    Edited by username-removed-117638
  • Mentioned in issue #22673 (moved)

  • What happens if someone pushes an Annex file update while your tarring? I think it will lead to the same error. This is also an error with things like artifacts currently

  • @dblessing is that a problem? If somebody pushes a file in the middle of a backup, perhaps it is not included in the backup. But then it will be in the next backup. Or will tar complain and abort?

  • as @stanhu stated in the issue "Most people won't be using the annex/ dir, and the amount of data in custom_hooks is small."

  • Maybe can use git-annex itself to move data git-annex data.

  • @jacobvosmaer-gitlab so we would have git bundle for git data, git-annex for git-annex and custom_hooks with tar? isn't it better the way it is? just wondering

  • @dblessing do you know if we have received reports about the backup script choking because of changes in the annex directory?

  • @dblessing to answer your question how this improves things when files change during the backup: we are hoping that git bundle is not confused as easily as tar because it understands the structure of the Git repository.

    LFS files never change, they only get added.

    Custom hooks: these can only change when an administrator edits them on the GitLab server. I think we can think of that as 'almost never changes'.

    Git-annex: this is where I suggested maybe we can use git-annex because it 'understands' the organization of the directory. I don't know enough about git-annex internals to know what can go wrong / what can change on disk during the backup.

  • @jacobvosmaer-gitlab Tar will complain and abort by default when a file changes as it reads it. It doesn't matter if it's existing or new. Let's say it's a new annex file being pushed as tar reads it in - it's going to abort the whole backup. We haven't specifically seen this with Annex, but since it deals with large files it can happen and since we support Annex we have to assume some are using it. Also, the solution to the problem is relevant to other parts of the backup, like Artifacts as I mentioned. We should be consistent in the way we backup files that could change.

    I agree that custom_hooks probably isn't a concern.

  • I'll look into using git-annex to make the backup for it then! thank you for the help everyone 😃

  • @dblessing files should not change while we read them, preferably.

    For instance with LFS we create a tempfile first and then we atomically rename it once the upload has finished. That means tar either sees it or it doesn't. Git does similar things, with refs and the packed-refs file being an exception. There it uses locking. git bundle should be aware of the locking.

    If CI artifact files ever change we should see if we can change that. I don't know what git-annex does on the inside.

    I don't think we can afford to stop the world, or even freeze a project, while backups are in progress. It would be complex (although we have the machinery now that we can move repositories between 'storages') and annoying for users.

  • @grzesiek do you know if artifact zips are ever changed after creation? Do we use tempfile+rename when accepting them?

  • @jacobvosmaer-gitlab The temp file then rename seems like a good solution. I know that is not how it is currently with artifacts because I've been troubleshooting backups where artifacts caused the backup to fail. Do we know that annex, user uploads, and other data locations do the tempfile rename? I agree we can't block writes while backups are completing. We either need to make a copy of a given dataset to a temp location, then backup, or we need the tempfile rename behavior you describe.

    If we can answer the question about how annex does things it would be great to know. But it probably doesn't need to hold up this process as it seems like an improvement either way.

  • @jacobvosmaer-gitlab ZIP archives with artifacts do not change after creation (at least should not). On the rails side we do use CarrierWave to accept artifacts.

  • Thanks @grzesiek

    know that is not how it is currently with artifacts because I've been troubleshooting backups where artifacts caused the backup to fail.

    I wonder if that is because the tarball includes the temp directory for the artifact uploads. Maybe we can exclude the /tmp and /cache directories.

    We either need to make a copy of a given dataset to a temp location, then backup

    How does that protect us against changes while copying?

  • @tiagonbotelho I think going from 'tar' to 'git bundle' in itself is a worthwhile improvement. @dblessing do you agree that maybe we should just start with that and make further improvements in a later MR? This one has been open for quite a long time already.

  • Yes, please go ahead. Thanks @tiagonbotelho and @jacobvosmaer-gitlab

  • Added 1245 commits:

    • de05cedc...516d869e - 1239 commits from branch master
    • 2df22e2c - implements tar for annex and custom_hooks and bundle for git data
    • 43b5fd40 - implements tar for annex and custom_hooks and bundle for git data
    • a59303e9 - implement create backup specs to check whether or not .tars were created properly
    • 5db3c4cf - Remove debugger command from backup code
    • 5a8cae51 - tests whether it blocks or not
    • 18b7a266 - Fix tests to get the passed on mysql database
  • Added 1 commit:

    • ce176130 - Fix tests to get the passed on mysql database
  • username-removed-117638 Marked the task All builds are passing as completed

    Marked the task All builds are passing as completed

  • @tiagonbotelho LGTM 👍 Feel free to pass to an endboss!

  • username-removed-117638 Marked the task Conform by the style guides as completed

    Marked the task Conform by the style guides as completed

  • Looks good. I am not an endboss. :)

  • Glad to see the mysql problem is solved @tiagonbotelho

  • Reassigned to @DouweM

  • @tiagonbotelho Please prefer single quotes to double quotes when interpolation isn't required. https://gitlab.com/gitlab-org/gitlab-ce/blob/master/CONTRIBUTING.md#style-guides

    Edited by Drew Blessing
  • Milestone changed to %8.13

  • @tiagonbotelho Any update on this? I'd like this to ship in 8.13. Issues with backup are important.

  • Added 1 commit:

    • 70cc71e1 - fixes codestyle issues for MR acceptance

    Compare with previous version

  • Added 142 commits:

    • 2eec8773...774548be - 130 commits from branch master
    • f5b14c59 - implements tar for annex and custom_hooks and bundle for git data
    • 2c339f03 - implements tar for annex and custom_hooks and bundle for git data
    • 818ec1a6 - implement create backup specs to check whether or not .tars were created properly
    • 6b85bfe2 - Remove debugger command from backup code
    • 99f28193 - tests whether it blocks or not
    • 7a318ba3 - implements tar for annex and custom_hooks and bundle for git data
    • 616d7a2e - implements tar for annex and custom_hooks and bundle for git data
    • 1d43192c - implement create backup specs to check whether or not .tars were created properly
    • 35249390 - Remove debugger command from backup code
    • bb930eda - tests whether it blocks or not
    • b8c3192f - fixes codestyle issues for MR acceptance
    • e2bf227a - improves test suite to be more generic and tests the restoration of the backups

    Compare with previous version

  • Added 1090 commits:

    Compare with previous version

  • Added 1 commit:

    • 17e49656 - improves test suite to be more generic and tests the restoration of the backups

    Compare with previous version

  • Added 1 commit:

    • 75f1269a - Backups do not fail anymore when using tar on annex and custom_hooks

    Compare with previous version

  • username-removed-117638 Resolved all discussions

    Resolved all discussions

  • Added 20 commits:

    Compare with previous version

  • Milestone changed to %8.14

  • Reassigned to @DouweM

  • Douwe Maan
  • Added 1 commit:

    • 206a0232 - Backups do not fail anymore when using tar on annex and custom_hooks

    Compare with previous version

  • username-removed-117638 Resolved all discussions

    Resolved all discussions

  • Reassigned to @DouweM

  • Added 1 commit:

    • 96f050fa - Backups do not fail anymore when using tar on annex and custom_hooks

    Compare with previous version

  • Douwe Maan Enabled an automatic merge when the build for 96f050fa succeeds

    Enabled an automatic merge when the build for 96f050fa succeeds

  • Douwe Maan Status changed to merged

    Status changed to merged

  • Douwe Maan Mentioned in commit 4d6de50c

    Mentioned in commit 4d6de50c

  • mentioned in issue gitlab#6396

  • Please register or sign in to reply
    Loading