Skip to content
Snippets Groups Projects

Enable gzip in backup task

Closed Stan Hu requested to merge stanhu/gitlab-ce:add-backup-options into master

This enables gzip in the backup task. Note: this will break any scripts that depend on *.tar instead of *.tar.gz.

Closes #1730 (closed)

Merge request reports

Loading
Loading

Activity

Filter activity
  • Approvals
  • Assignees & reviewers
  • Comments (from bots)
  • Comments (from users)
  • Commits & branches
  • Edits
  • Labels
  • Lock status
  • Mentions
  • Merge request status
  • Tracking
  • Author Maintainer

    Good point. I missed the restore case. I think I'd better add tests too.

  • Looks good (if there is a backup/restore testcase that works). Thanks for implementing that feature.

    Maybe the default should be switched to gzip=true. This would save a lot of diskspace=money for many people and make off-site backup faster, without doing any work. It might break if people assume ".tar", but any sysadmin actually writing a cp *.tar will most likely run gz before :)

  • The restore should be just be

    file_list = Dir.glob("*gitlab_backup.tar{,.gz}").each.map { |f| f.split(//).first.to_i }

    sadly that shows up in a couple of places, make sure to search for .tar again.

  • So if I understand correctly, the current behavior is:

    • create 'db', 'repositories' and 'uploads' directories (or a subset if parts of the backup are skipped)
    • chmod 700 on those three directories

    And this chmod fails on some Windows file shares (CIFS/SMB I would guess?).

    @stanhu have you considered solutions that don't need new gitlab.yml options?

    For example, even if chmod fails on CIFS with certain mount settings, maybe FileUtils.mkdir_p('db', mode: 0700) still succeeds. Or wrapping the mkdir in a umask change. (ugh)

  • Stan Hu Added 1 commit:

    Added 1 commit:

    • 510bbba7 - Allow configuration of chmod settings and support gzip in backup task
  • Stan Hu Added 1 commit:

    Added 1 commit:

    • 7d01f22c - Allow configuration of chmod settings and support gzip in backup task
  • Author Maintainer

    @jacobvosmaer I suppose we could make gzip mandatory and get rid of the option. The chmod mask and umask still seems like it should be left to the system administrator.

    I've updated the MR to work with restore and the tests.

  • @stanhu about gzip compression, I made a PR about it that was refused and closed by @jacobvosmaer: https://github.com/gitlabhq/gitlabhq/pull/7962

    Not sure if it will be accepted where. If yes, can someone explain me the difference? :-)

    Thanks.

  • @stanhu OK let's do mandatory gzip. This is something that we will need to put loud and clear in the blog post and the changelog to prevent the pain for admins mentioned in https://github.com/gitlabhq/gitlabhq/issues/9235 . (The file extension would change so scripts that look for '*.tar' will break.)

    I think the umask option in gitlab.yml makes sense.

    I am still not liking the chmod options. There has to be a simpler way.

  • :( I just noticed that this chmod thing the backup script is not doing its job properly.

    As I remember, the reason to restrict access to the db, repositories and uploads directories created during the backup is to prevent exposing the contents of the backup to every user on the system. But right now there is a race condition! The db directory, for example, first gets created with 'normal' permissions. Then we do a SQL dump, to a file that is world-readable. The chmod is coming way to late, only at the moment we are about to create a tar file. It needs to happen before we do a SQL dump.

  • @stanhu I recommend removing the chmod change from this MR so we can look at it in isolation.

  • Stan Hu Added 1 commit:

    Added 1 commit:

    • f64cc89e - Enable gzip compression in backup task
  • Author Maintainer

    @jacobvosmaer I've reverted the MR so that it only enables gzip.

    Good call about the permissions. It sounds like a combination of umask/chmod calls need to be made in the specific tasks.

  • @stanhu @jacobvosmaer I'd like to suggest to leave an option to switch gzip off. Some Deployments use block-deduplicating storage for incremental backups. There having a .tar might be nicer than a .tar.gz

  • @robinro does such deduplication actually work? What sort of space savings does it give?

    I think turning gzip on gives a modest benefit to a lot of people. It comes at an unknown increase in storage cost to an unknown number of people who have deduplicated storage for tar files.

    Adding an if/else into the backup script increases the chance of bugs for everybody. :(

    I find this a hard question.

  • @dzaporozhets what do you think about this choice, in light of the discussion above?

    • tar for everybody
    • gzip for everybody
    • a gzip true/false option
  • @jacobvosmaer I don't think a simple if is much of a bug-risk. It might be ok to ignore the (very small) number of people who have a proper reason not to use gzip.

    But there are many who built scripts to move backups to a different storage (i.e., everybody who runs gitlab in production!). They might have used *.tar in their scripts and thus this change will break their backup, which is not to be taken lightly.

    Please consider to leave a switch to turn of gzip and have a clear announcement in the changelog so all sysadmins are warned.

  • @jacobvosmaer I think gzip for everybody is a best option because:

    • its storage benefit for almost everyone. Even if deduplication works - people can add repack to their backup process.
    • no complexity of switching between backup types -> more reliable backups.
  • They might have used *.tar in their scripts and thus this change will break their backup, which is not to be taken lightly.

    @robinro I agree this is not to be taken lightly. However, we are going to turn gzip on by default, so we need to warn users anyway about the changing filenames.

    If we want to protect existing backup tooling we need to ship GitLab with gzip turned off by default, or just forget about adding gzip into the mix.

    From a developer point of view I want to say: turn on gzip, no option, and also stop rotating '.tar' backups. People should read the release blog post / changelog when they upgrade.

    From a system administration point of view, I am pretty confident that there are installations out there with poor backup monitoring, and that some of those installations have tooling in place that fails silently on *_gitlab_backup.tar.gz. Those installations would be hurt by this change. Administrators of installations with breaking tooling but good monitoring would not be hurt, only annoyed at the work they have to do.

    • mandatory gzip, forget about rotating '.tar' files: scripts break for some installations, disk space wasted without admin intervention. BENEFIT: high, COMPLEXITY: none, BREAKAGE: high
    • mandatory gzip: scripts break for some installations. BENEFIT: high, COMPLEXITY: medium, BREAKAGE: medium
    • gzip option, default on: scripts break for some, they have a quick fix by turning gzip off. BENEFIT: high, COMPLEXITY: high, BREAKAGE: medium
    • gzip option, default off: nothing breaks, only those users who turn it on (now or in the future) benefit from this change. BENEFIT: low, COMPLEXITY: high, BREAKAGE: none
    • forget about gzip, stick with tar: nothing breaks. BENEFIT: none, COMPLEXITY: none, BREAKAGE: none
  • I hope what I wrote above gives some insight on why @Soullivaneuh's PR got stuck in limbo. :(

  • Author Maintainer

    @jacobvosmaer Thanks, that makes a lot of sense. I'll leave it up to you guys to figure out what to do with this MR; it has been updated to enable gzip for all. The restore task still works with the original *.tar files.

  • @jacobvosmaer Thanks for the detailed discussion. I can now understand why you would go with mandatory gzip.

  • mentioned in issue #1790 (closed)

  • Thanks for all the work @stanhu and thanks for the discussion everybody. We came up with a better way to get smaller backup files in #1790 (closed).

    For reference, there is an alternative solution for the CIFS chmod issue emerging in #1704 (closed).

    Closing in favour of #1790 (closed).

  • username-removed-5302 Status changed to closed

    Status changed to closed

  • mentioned in issue #1704 (closed)

  • Please register or sign in to reply
    Loading