@stanhu I'm totally on this, as it works for me locally, but there are so many differences between the environments that it's hard to say.
The basic steps squashing does are:
```shell
$ cd /path/to/repo/storage
$ git worktree add $squash_path $target_branch --detach
$ cd $squash_path # this should now be a worktree
$ git diff --binary $start_sha...$head_sha | git apply --index
$ git commit -C $head_sha
```
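To try the steps above outside GitLab, here is a minimal, self-contained sketch that builds a throwaway bare repo and runs the same worktree, diff, apply and commit sequence against it. The scratch directory, the master and feature branch names, the file contents and the dummy author identity are all assumptions for illustration, not what GitLab itself uses.

```shell
#!/bin/sh
# Sketch of the squash steps against a throwaway bare repo.
# Hypothetical stand-ins: $work replaces the repo storage path, "master" is
# the target branch, and "feature" holds the two commits being squashed.
set -eu
export GIT_AUTHOR_NAME=Dev GIT_AUTHOR_EMAIL=dev@example.com
export GIT_COMMITTER_NAME=Dev GIT_COMMITTER_EMAIL=dev@example.com

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Seed history: one base commit on master, two commits on feature.
git init -q "$work/seed"
cd "$work/seed"
echo base > file.txt
git add file.txt
git commit -q -m "Base"
git branch -M master
start_sha=$(git rev-parse HEAD)
git checkout -q -b feature
echo one >> file.txt && git commit -q -am "Feature part 1"
echo two >> file.txt && git commit -q -am "Feature part 2"
head_sha=$(git rev-parse HEAD)

# Server-side repos are bare, so squash against a bare copy.
bare="$work/repo.git"
git clone -q --bare "$work/seed" "$bare"

# The squash itself: a detached worktree on the target branch, apply the
# combined diff to it, then commit reusing the head commit's message/author.
target_branch=master
squash_path="$work/squash"
cd "$bare"
git worktree add --detach "$squash_path" "$target_branch"
cd "$squash_path"   # this should now be a worktree
git diff --binary "$start_sha...$head_sha" | git apply --index
git commit -q -C "$head_sha"
echo "squashed commit: $(git rev-parse HEAD)"
```

Running it prints the SHA of the single squashed commit. Because the worktree is detached, the bare repo's master branch is left where it was.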
Do you have any ideas? If not, tomorrow I'm going to try with an Omnibus package and see if anything's weird, or check whether staging is up so I can poke around there.
Squashing uses a worktree. Instead of cloning the bare repo and then pushing back to it as a remote, it creates a working copy attached to the bare repo, so when we commit, the commit is written directly into the bare repo.
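A minimal sketch of that difference, assuming a scratch directory and made-up file contents: it builds a bare repo with plumbing commands (so there is never a working copy of its own), attaches a detached worktree, commits there, and then asks the bare repo about the new commit. No clone or push is involved.

```shell
#!/bin/sh
# Sketch: a worktree attached to a bare repo writes commits straight into the
# bare repo's object store, so there is nothing to push back.
# Hypothetical names: repo.git, the "wt" worktree and the README contents.
set -eu
export GIT_AUTHOR_NAME=Dev GIT_AUTHOR_EMAIL=dev@example.com
export GIT_COMMITTER_NAME=Dev GIT_COMMITTER_EMAIL=dev@example.com

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
bare="$work/repo.git"

# A bare repo with one commit on master, built with plumbing commands.
git init -q --bare "$bare"
blob=$(echo hello | git -C "$bare" hash-object -w --stdin)
tree=$(printf '100644 blob %s\tREADME\n' "$blob" | git -C "$bare" mktree)
base=$(git -C "$bare" commit-tree -m "Initial commit" "$tree")
git -C "$bare" update-ref refs/heads/master "$base"

# Attach a working copy to the bare repo instead of cloning it.
git -C "$bare" worktree add --detach "$work/wt" master
cd "$work/wt"
echo world >> README
git commit -q -am "Commit made in the worktree"
new_sha=$(git rev-parse HEAD)

# No push: the new commit object is already in the bare repo.
git -C "$bare" cat-file -t "$new_sha"   # prints "commit"
```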
When we ran it manually on production, we noticed that the logs always showed an error immediately afterwards: git-annex: First run: git-annex init. I might actually have seen this in the logs above but dismissed it as irrelevant, which was a mistake.
This still didn't make much sense, though, because git-annex is enabled on staging too, and it should only be involved in pushes, not commits. We eventually realised that the error was coming from a pre-commit hook, and then it took us a little longer to notice that ... there shouldn't be a pre-commit hook!
Production has this hook on all workers, last modified somewhere between May and August 2016:
```shell
#!/bin/sh
# automatically configured by git-annex
git annex pre-commit .
```
Neither staging nor dev has it. It's also useless: until we started using worktrees, we never committed on the server, we only pushed (as described above)!
We should move /opt/gitlab/embedded/service/gitlab-shell/hooks/pre-commit to /opt/gitlab/embedded/service/gitlab-shell/hooks/pre-commit.bak on all production workers for now. That should get squash working, or at least stop it from being completely broken!
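A small sketch of that change for a single worker, assuming it runs as a user with write access to the gitlab-shell hooks directory. disable_pre_commit_hook is a hypothetical helper name, and how the command gets fanned out to every production worker is left open.

```shell
#!/bin/sh
# Sketch: move the stray git-annex pre-commit hook aside on one worker.
# disable_pre_commit_hook is a hypothetical helper; run it on each worker.
set -eu

disable_pre_commit_hook() {
  hook="$1/pre-commit"
  if [ -e "$hook" ]; then
    # Keep a .bak copy so the hook can be restored if needed.
    mv "$hook" "$hook.bak"
    echo "moved $hook to $hook.bak"
  else
    echo "no pre-commit hook at $hook; nothing to do"
  fi
}

disable_pre_commit_hook /opt/gitlab/embedded/service/gitlab-shell/hooks
```

Running it twice is safe: the second run finds no hook and does nothing.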
@rymai thanks, that's interesting! It seems you tried at around 09:19 and 09:26 UTC. For the second attempt, I found this in the logs: ERROR -> Squash task canceled: Another squash is already in progress. For the first, I found nothing, not even a MergeWorker.
@smcgivern Correct. I tried it first, then saw that it was "doing nothing" (the merge button was greyed out and the spinner kept spinning, but nothing else happened), so I reloaded the page and tried again...
OK, so there appear to have been two problems here.
The pre-commit hook is added by git-annex, which is annoying. We need to skip it with --no-verify. (To test this, I ran ssh git@gitlab.com git-annex-shell configlist smcgivern/gitlab-ee.git locally, and this created the hook on worker13.)
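A minimal sketch of both halves of this, using a deliberately failing pre-commit hook as a stand-in for the git-annex one (the hook body, file names and identity are made up): the hook lives in the bare repo's hooks directory, still fires for a commit made in one of its worktrees, and --no-verify skips it.

```shell
#!/bin/sh
# Sketch: hooks in a bare repo's hooks/ directory also fire for commits made
# in its worktrees, and `git commit --no-verify` skips the pre-commit hook.
# Hypothetical setup: a failing hook stands in for git-annex's pre-commit.
set -eu
export GIT_AUTHOR_NAME=Dev GIT_AUTHOR_EMAIL=dev@example.com
export GIT_COMMITTER_NAME=Dev GIT_COMMITTER_EMAIL=dev@example.com

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
bare="$work/repo.git"

# A bare repo with one commit on master, built with plumbing commands.
git init -q --bare "$bare"
blob=$(echo hello | git -C "$bare" hash-object -w --stdin)
tree=$(printf '100644 blob %s\tREADME\n' "$blob" | git -C "$bare" mktree)
base=$(git -C "$bare" commit-tree -m "Initial commit" "$tree")
git -C "$bare" update-ref refs/heads/master "$base"

# Install a pre-commit hook in the bare repo that always fails.
mkdir -p "$bare/hooks"
cat > "$bare/hooks/pre-commit" <<'EOF'
#!/bin/sh
echo "pre-commit: simulated git-annex failure" >&2
exit 1
EOF
chmod +x "$bare/hooks/pre-commit"

git -C "$bare" worktree add --detach "$work/wt" master
cd "$work/wt"
echo change >> README
git add README

# Without --no-verify, the bare repo's hook blocks the worktree commit.
if git commit -q -m "Squashed commit"; then
  echo "unexpected: commit succeeded"
else
  echo "commit blocked by pre-commit hook"
fi

# With --no-verify, the pre-commit hook is skipped and the commit goes through.
git commit -q --no-verify -m "Squashed commit"
```

Note that --no-verify bypasses the commit-msg hook as well as pre-commit.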
Creating worktrees for larger repos (like GitLab EE) is really slow. In a terminal, it took over two minutes to check out all the files. In a Ruby process, our git process got stuck! @ahmadsherif and I don't know why, but when we attached strace, it seemed to be doing nothing except printing the percentage of files checked out.