The only way to optimize code like this is to either use a faster storage mechanism, or run it directly on the physical disk(s) storing the Git repository. The latter in turn requires some kind of API/service to run on said servers that takes a Git action to run (e.g. via HTTP) and spits out the result.
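To make the second option a bit more concrete, here is a minimal sketch of the "takes a Git action and spits out the result" part, meant to run on the host that holds the repository on local disk. Everything in it is hypothetical: `GitActionRunner`, the allow-list, and the injectable `executor` are made up for illustration and are not how Gitaly or GitLab actually do it. The HTTP layer is left out on purpose; any small web framework could call `runner.call` from a request handler.

```ruby
require "open3"

# Hypothetical sketch: none of these names come from GitLab or Gitaly.
class GitActionRunner
  # Assumption: only read-only actions are exposed, so callers cannot
  # run arbitrary commands on the storage host.
  ALLOWED_ACTIONS = %w[diff log show rev-parse].freeze

  # `executor` is injectable so the class can be exercised without a
  # real repository; by default it shells out via Open3.capture3.
  def initialize(repo_path, executor: Open3.method(:capture3))
    @repo_path = repo_path
    @executor = executor
  end

  # Runs `git -C <repo_path> <action> <args...>` locally and returns a
  # result hash that a transport layer could serialize.
  def call(action, *args)
    unless ALLOWED_ACTIONS.include?(action)
      return { ok: false, error: "action not allowed: #{action}" }
    end

    stdout, stderr, status = @executor.call("git", "-C", @repo_path, action, *args)
    if status.success?
      { ok: true, output: stdout }
    else
      { ok: false, error: stderr }
    end
  end
end
```

The point of the design is that the expensive pack file access happens on local disk, and only the (much smaller) result crosses the network. A real service would also need to validate the arguments, not just the action name.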
Rugged::Diff#each and Rugged::Diff#each_patch can be quite slow (up to 30 seconds).
I think this is a case where Gitaly will come into play, unless we can somehow optimize libgit2 or NFS. In looking at the strace logs, I believe it just takes a long time to seek/mmap big files over a networked file system, just as we saw in https://gitlab.com/gitlab-org/gitlab-ee/issues/1811#note_26109097. If you look at the strace logs, you see large blocks of time taken like this:
libgit2 is trying to memory-map large pack files over NFS, which causes a lot of round-trips and network transfer. The more files there are in the diff, the more times we have to do this dance.
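For anyone who wants to check this against their own capture, here is a rough sketch that sums the time per syscall from `strace -T` output (with `-T`, each line ends with the time spent in the call in angle brackets). The helper name `slowest_syscalls` is made up, and the sample lines are invented for illustration; they are not taken from the logs discussed here.

```ruby
# Sums the time spent per syscall in `strace -T` output and returns
# [name, seconds] pairs, slowest first. Lines without a trailing
# <seconds> marker (signals, unfinished calls) are skipped.
def slowest_syscalls(lines)
  pattern = /(\w+)\(.*<(\d+\.\d+)>\s*\z/
  totals = Hash.new(0.0)

  lines.each do |line|
    match = pattern.match(line)
    next unless match

    totals[match[1]] += match[2].to_f
  end

  totals.sort_by { |_name, seconds| -seconds }
end

# Invented sample input, only to show the expected line shape.
sample = [
  'open("/path/to/repo.git/objects/pack/pack-example.pack", O_RDONLY) = 5 <0.002100>',
  'mmap(NULL, 33554432, PROT_READ, MAP_PRIVATE, 5, 0) = 0x7f0000000000 <1.250000>',
  'mmap(NULL, 33554432, PROT_READ, MAP_PRIVATE, 5, 0x2000000) = 0x7f0002000000 <0.750000>',
  '--- SIGCHLD {si_signo=SIGCHLD} ---'
]

slowest_syscalls(sample).each do |name, seconds|
  puts format("%-8s %.3fs", name, seconds)
end
```

If `mmap` (or the reads that follow it) dominates the totals on an NFS-backed repository but not on local disk, that would support the explanation above.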
@stanhu thanks. I can just about tell from the screenshot that the problem appears to be in getting the changes count (?), as we load the changes tab async on the new MR page. I agree with everything in your previous comment, but is there anything here we think was caused by 9.0 specifically?
Error 502 when creating a merge request through the UI or API. Some of these are on the gitlab-ce project.
For the UI I've had 502s for #new, when changing source branch and might also have had it for create. For the API it would have been the equivalent of #create.
This may or may not correlate with high load. A few times there have been questions about 502s for merge requests a couple of mins before a more general outage.
@eReGeBe we tend to refer to migrations by their Gitlab::Git migration site. Do you know where Rugged::Diff#find_similar! is being called from in our codebase?
@andrewn Gitlab::Git::Repository#diff_patches. I don't see it on the migration board, so I'm guessing we haven't included it. It looks easy to do with gitaly-ruby.
@eReGeBe it needs to be migrated and with gitaly-ruby it is now possible. But the way the code is structured makes it hard as it is. Too many lazy interactions.