Backups fail occasionally with "tar: ./objects: file changed as we read it" error
-
CHANGELOG entry added -
Documentation created/updated -
API support added - Tests
-
Added for this feature/bug -
All builds are passing
-
-
Conform by the style guides -
Branch has no merge conflicts with master
(if you do - rebase it please) -
Squashed related commits together
Closes #5613 (moved)
Merge request reports
Activity
Added 168 commits:
-
1a94e856...5a4ecb98 - 166 commits from branch
master
- 84366e40 - implements tar for annex and custom_hooks and bundle for git data
- 3ee3d0f0 - implements basic resoting functionality to the backups
-
1a94e856...5a4ecb98 - 166 commits from branch
Added 1 commit:
- ba498460 - implements basic resoting functionality to the backups
Added 1 commit:
- 50c09dc8 - implements basic resoting functionality to the backups
\cc @jacobvosmaer-gitlab can you check if this is the right implementation for this problem? I am storing inside the user's backup a folder with the project_name to store the
annex.tar
andcustom_hooks.tar
and outside the tars folder I keep project_name.bundleExample: when running
rake gitlab:backup:create
on user1 we get:tmp/backups/repositories/user1/fork1/annex.tar
tmp/backups/repositories/user1/fork1/custom_hooks.tar
tmp/backups/repositories/user1/fork1.bundle
Edited by username-removed-117638Added 1 commit:
- f451d623 - implements basic resoting functionality to the backups
- Resolved by username-removed-117638
@tiagonbotelho that organization seems fine. Does the user1/fork1 directory get created automatically? We could also do something with
fork1.custom_hooks.tar
I suppose. But either is OK I think.Added 1 commit:
- e3f885ac - implement create backup specs to check whether or not .tars were created properly
Marked the task CHANGELOG entry added as completed
Marked the task Documentation created/updated as completed
Added 500 commits:
-
e3f885ac...31570c12 - 497 commits from branch
master
- 825a823a - implements tar for annex and custom_hooks and bundle for git data
- 48fbe593 - implements tar for annex and custom_hooks and bundle for git data
- 371bd985 - implement create backup specs to check whether or not .tars were created properly
Toggle commit list-
e3f885ac...31570c12 - 497 commits from branch
Added 1 commit:
- c8461e66 - implement create backup specs to check whether or not .tars were created properly
Added 1 commit:
- 9e6c37e9 - implement create backup specs to check whether or not .tars were created properly
Added 217 commits:
-
9e6c37e9...12fe6a6f - 214 commits from branch
master
- 8bf61c89 - implements tar for annex and custom_hooks and bundle for git data
- 4d650625 - implements tar for annex and custom_hooks and bundle for git data
- 4c1b8962 - implement create backup specs to check whether or not .tars were created properly
Toggle commit list-
9e6c37e9...12fe6a6f - 214 commits from branch
Milestone changed to %8.12
Added 201 commits:
-
4c1b8962...fb84439a - 198 commits from branch
master
- 54ea876c - implements tar for annex and custom_hooks and bundle for git data
- 9147b465 - implements tar for annex and custom_hooks and bundle for git data
- 86865de8 - implement create backup specs to check whether or not .tars were created properly
Toggle commit list-
4c1b8962...fb84439a - 198 commits from branch
mentioned in issue #21187 (closed)
Added 35 commits:
-
86865de8...f41098ad - 32 commits from branch
master
- 2a6f2235 - implements tar for annex and custom_hooks and bundle for git data
- 37a228d1 - implements tar for annex and custom_hooks and bundle for git data
- 8c184f0e - implement create backup specs to check whether or not .tars were created properly
Toggle commit list-
86865de8...f41098ad - 32 commits from branch
Added 81 commits:
-
8c184f0e...b2bf01f4 - 78 commits from branch
master
- 4aa0c6a8 - implements tar for annex and custom_hooks and bundle for git data
- 74038a05 - implements tar for annex and custom_hooks and bundle for git data
- a13c3c61 - implement create backup specs to check whether or not .tars were created properly
Toggle commit list-
8c184f0e...b2bf01f4 - 78 commits from branch
@ayufan any news on the failing build? really wanted to close this one :)
Added 176 commits:
-
a13c3c61...0357df0c - 173 commits from branch
master
- 36443ad6 - implements tar for annex and custom_hooks and bundle for git data
- 2ca41eec - implements tar for annex and custom_hooks and bundle for git data
- 5505b7b1 - implement create backup specs to check whether or not .tars were created properly
Toggle commit list-
a13c3c61...0357df0c - 173 commits from branch
Added 1 commit:
- 8e376b5e - Remove debugger command from backup code
Added 1 commit:
- 8e376b5e - Remove debugger command from backup code
Added 1010 commits:
-
8e376b5e...1e7ea64e - 1006 commits from branch
master
- ee01028d - implements tar for annex and custom_hooks and bundle for git data
- c8b94801 - implements tar for annex and custom_hooks and bundle for git data
- 75f84ae1 - implement create backup specs to check whether or not .tars were created properly
- dadb61d7 - Remove debugger command from backup code
Toggle commit list-
8e376b5e...1e7ea64e - 1006 commits from branch
Added 1 commit:
- 0413da1e - Remove debugger command from backup code
Added 1 commit:
- 3fb4a2a7 - Remove debugger command from backup code
Added 1 commit:
- 5dc6369e - Remove debugger command from backup code
Added 1 commit:
- ef7798b3 - checks if tests pass now
Added 1 commit:
- de05cedc - tests whether it blocks or not
Mentioned in issue #5613 (moved)
@jacobvosmaer-gitlab @tiagonbotelho Can you please explain how this addresses the issue with files changing as we read them? This looks like it's still reading in from the actual data location which wouldn't prevent that scenario, would it?
@dblessing what we are basically doing is resolving it by using
git bundle
on Git data and after that usingtar
forcustom_hooks
andannex
folders. which was suggested by @jacobvosmaer-gitlab in !5613 (merged)right now the implementation is working (by that I mean that it creates the backup and the restore without failing as it did previously) but I am having serious issues writing the test for it since something is blocking the database from running the test and I cannot seem to find a solution for this.
I already tried the following:
. creating two traits that create the respective folders (annex or custom_hooks) along with the project creation . removing these traits and creating the folders in a
before do
callnone of them worked and I am running out of ideas :/
Edited by username-removed-117638Mentioned in issue #22673 (moved)
@dblessing is that a problem? If somebody pushes a file in the middle of a backup, perhaps it is not included in the backup. But then it will be in the next backup. Or will tar complain and abort?
as @stanhu stated in the issue "Most people won't be using the annex/ dir, and the amount of data in custom_hooks is small."
@jacobvosmaer-gitlab so we would have git bundle for git data, git-annex for git-annex and custom_hooks with tar? isn't it better the way it is? just wondering
@dblessing do you know if we have received reports about the backup script choking because of changes in the annex directory?
@dblessing to answer your question how this improves things when files change during the backup: we are hoping that
git bundle
is not confused as easily astar
because it understands the structure of the Git repository.LFS files never change, they only get added.
Custom hooks: these can only change when an administrator edits them on the GitLab server. I think we can think of that as 'almost never changes'.
Git-annex: this is where I suggested maybe we can use git-annex because it 'understands' the organization of the directory. I don't know enough about git-annex internals to know what can go wrong / what can change on disk during the backup.
@jacobvosmaer-gitlab Tar will complain and abort by default when a file changes as it reads it. It doesn't matter if it's existing or new. Let's say it's a new annex file being pushed as tar reads it in - it's going to abort the whole backup. We haven't specifically seen this with Annex, but since it deals with large files it can happen and since we support Annex we have to assume some are using it. Also, the solution to the problem is relevant to other parts of the backup, like Artifacts as I mentioned. We should be consistent in the way we backup files that could change.
I agree that custom_hooks probably isn't a concern.
@dblessing files should not change while we read them, preferably.
For instance with LFS we create a tempfile first and then we atomically rename it once the upload has finished. That means tar either sees it or it doesn't. Git does similar things, with refs and the packed-refs file being an exception. There it uses locking.
git bundle
should be aware of the locking.If CI artifact files ever change we should see if we can change that. I don't know what git-annex does on the inside.
I don't think we can afford to stop the world, or even freeze a project, while backups are in progress. It would be complex (although we have the machinery now that we can move repositories between 'storages') and annoying for users.
@grzesiek do you know if artifact zips are ever changed after creation? Do we use tempfile+rename when accepting them?
@jacobvosmaer-gitlab The temp file then rename seems like a good solution. I know that is not how it is currently with artifacts because I've been troubleshooting backups where artifacts caused the backup to fail. Do we know that annex, user uploads, and other data locations do the tempfile rename? I agree we can't block writes while backups are completing. We either need to make a copy of a given dataset to a temp location, then backup, or we need the tempfile rename behavior you describe.
If we can answer the question about how annex does things it would be great to know. But it probably doesn't need to hold up this process as it seems like an improvement either way.
@jacobvosmaer-gitlab ZIP archives with artifacts do not change after creation (at least should not). On the rails side we do use CarrierWave to accept artifacts.
Thanks @grzesiek
know that is not how it is currently with artifacts because I've been troubleshooting backups where artifacts caused the backup to fail.
I wonder if that is because the tarball includes the temp directory for the artifact uploads. Maybe we can exclude the /tmp and /cache directories.
We either need to make a copy of a given dataset to a temp location, then backup
How does that protect us against changes while copying?
@tiagonbotelho I think going from 'tar' to 'git bundle' in itself is a worthwhile improvement. @dblessing do you agree that maybe we should just start with that and make further improvements in a later MR? This one has been open for quite a long time already.
Yes, please go ahead. Thanks @tiagonbotelho and @jacobvosmaer-gitlab
Added 1245 commits:
-
de05cedc...516d869e - 1239 commits from branch
master
- 2df22e2c - implements tar for annex and custom_hooks and bundle for git data
- 43b5fd40 - implements tar for annex and custom_hooks and bundle for git data
- a59303e9 - implement create backup specs to check whether or not .tars were created properly
- 5db3c4cf - Remove debugger command from backup code
- 5a8cae51 - tests whether it blocks or not
- 18b7a266 - Fix tests to get the passed on mysql database
Toggle commit list-
de05cedc...516d869e - 1239 commits from branch
Added 1 commit:
- ce176130 - Fix tests to get the passed on mysql database
Reassigned to @dbalexandre
@tiagonbotelho LGTM
👍 Feel free to pass to an endboss!Reassigned to @tiagonbotelho
Reassigned to @jacobvosmaer-gitlab
Marked the task Conform by the style guides as completed
Reassigned to @tiagonbotelho
Glad to see the mysql problem is solved @tiagonbotelho
Reassigned to @DouweM
- Resolved by username-removed-117638
- Resolved by username-removed-117638
- Resolved by username-removed-117638
- Resolved by username-removed-117638
- Resolved by username-removed-117638
- Resolved by username-removed-117638
- Resolved by username-removed-117638
- Resolved by username-removed-117638
- Resolved by username-removed-117638
Reassigned to @tiagonbotelho
@tiagonbotelho Please prefer single quotes to double quotes when interpolation isn't required. https://gitlab.com/gitlab-org/gitlab-ce/blob/master/CONTRIBUTING.md#style-guides
Edited by Drew BlessingMilestone changed to %8.13
@tiagonbotelho Any update on this? I'd like this to ship in 8.13. Issues with backup are important.
Added 142 commits:
-
2eec8773...774548be - 130 commits from branch
master
- f5b14c59 - implements tar for annex and custom_hooks and bundle for git data
- 2c339f03 - implements tar for annex and custom_hooks and bundle for git data
- 818ec1a6 - implement create backup specs to check whether or not .tars were created properly
- 6b85bfe2 - Remove debugger command from backup code
- 99f28193 - tests whether it blocks or not
- 7a318ba3 - implements tar for annex and custom_hooks and bundle for git data
- 616d7a2e - implements tar for annex and custom_hooks and bundle for git data
- 1d43192c - implement create backup specs to check whether or not .tars were created properly
- 35249390 - Remove debugger command from backup code
- bb930eda - tests whether it blocks or not
- b8c3192f - fixes codestyle issues for MR acceptance
- e2bf227a - improves test suite to be more generic and tests the restoration of the backups
Toggle commit list-
2eec8773...774548be - 130 commits from branch
Added 1090 commits:
-
70cc71e1...2eec8773 - 1090 commits from branch
master
-
70cc71e1...2eec8773 - 1090 commits from branch
Added 1 commit:
- 17e49656 - improves test suite to be more generic and tests the restoration of the backups
Added 1 commit:
- 75f1269a - Backups do not fail anymore when using tar on annex and custom_hooks
Marked the task Squashed related commits together as completed
Added 20 commits:
-
75f1269a...198ae21e - 19 commits from branch
master
- 9f3b790e - Backups do not fail anymore when using tar on annex and custom_hooks
-
75f1269a...198ae21e - 19 commits from branch
Milestone changed to %8.14
Reassigned to @DouweM
- Resolved by username-removed-117638
- Resolved by username-removed-117638
- Resolved by username-removed-117638
- Resolved by username-removed-117638
- Resolved by username-removed-117638
Reassigned to @tiagonbotelho
Added 1 commit:
- 206a0232 - Backups do not fail anymore when using tar on annex and custom_hooks
Reassigned to @DouweM
Added 1 commit:
- 96f050fa - Backups do not fail anymore when using tar on annex and custom_hooks
Enabled an automatic merge when the build for 96f050fa succeeds
Mentioned in commit 4d6de50c
mentioned in issue gitlab#6396