The current caching setup is a bit of a mess. There are too many different expire_XXX_cache methods, too many cases where certain caches have to be flushed for either all branches or just an individual one, etc, etc. For now I'll leave this as-is but I'll put some thought in cleaning this up as I probably will have to add more caching in the coming days any way.
This should be taken care of before we add any extra caching as otherwise this rabbit hole will only get bigger and bigger.
Designs
Child items
...
Show closed items
Linked items
0
Link issues together to show that they're related or that one is blocking others.
Learn more.
We're also blowing memory on the cache keys. Right now cache keys are the full namespace of a project/wiki so we end up with keys like "gitlab-org/gitlab-ce" and "gitlab-org/gitlab-ce.wiki". Instead we can just use project IDs so we'd end up with something like "13083" and "13083.wiki". Given enough cache keys this could make quite the difference in terms of memory used by Redis.
However, the first step should be to clean up the code using the existing cache key setup.
Instead I think it should be telling Repository "Hey, [this thing] happened, do something if you need to", and all of the knowledge of which cache(s) need to be cleared or updated or whatever, is contained entirely in Repository.
@rspeicher I agree, I was thinking of adding methods to Repository called something like branch_pushed or commit_pushed which then take care of Repository specific logic for those actions (e.g. clearing a cache). This should also remove some duplication from the various Sidekiq workers/services.
@sytses While we probably could do this in time for 8.5 I think it's potentially dangerous to not give it enough real-world testing time. I would recommend 8.6. cc @DouweM
@yorickpeterse I like the "hook" approach. If you don't like the names we could do something Rails-y like after_create, after_delete, after_commit, after_tag_push (it kind of starts to fall apart here).
I currently have the following list of hook/method names for the Repository class:
after_import
after_push_commit
after_create_branch
after_remove_branch
after_create_tag
before_delete
I also started looking into the various ways we currently flush caches and what kind of caches we're flushing, starting with Repository#expire_cache. This particular method ends up flushing the following caches:
size
branch_names
tag_names
commit_count
readme
version
contribution_guide
changelog
license
branch ahead/behind statistics cache
emptiness cache (if needed)
Some of these caches (e.g. the changelog or contributing guide caches) contain the full Git blobs, and they're flushed upon every push (even when not needed).
To allow for conditional flushing we'll have to pass the list of modified files (if any) to the appropriate hooks/methods. Having said that, for some cases this might be tricky. For example, if a project has a file called LICENSE and a commit replaces it with a file called COPYING, which path(s) do you pass to a method? Also just passing the paths would mean the method can't figure out which file was removed and which one was added. Another idea might be to pass the entire Git commit (whenever available/applicable) to a method, allowing the method itself to figure out what to do.