Clean up Repository caching madness

mentioned in merge request !2838 (merged)

We're also blowing memory on the cache keys. Right now cache keys use the full namespaced path of a project/wiki, so we end up with keys like "gitlab-org/gitlab-ce" and "gitlab-org/gitlab-ce.wiki". Instead we could just use project IDs, so we'd end up with something like "13083" and "13083.wiki". Given enough cache keys this could make quite a difference in the memory used by Redis.
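To make the size difference concrete, here is a rough sketch comparing the two key styles. The `cache_key` helper and the `type:namespace` layout are assumptions for illustration only, not the actual key format used by Repository.

```ruby
# Hypothetical helper: builds a cache key from a cache type and a namespace.
# The "type:namespace" layout is an assumption, not GitLab's real format.
def cache_key(type, namespace)
  "#{type}:#{namespace}"
end

# Key based on the full project path (current behaviour described above).
path_key = cache_key(:branch_names, "gitlab-org/gitlab-ce.wiki")

# Key based on the project ID (proposed behaviour).
id_key = cache_key(:branch_names, "13083.wiki")

# Bytes saved per key; multiply by the number of keys stored in Redis.
saved_per_key = path_key.bytesize - id_key.bytesize
puts "#{path_key.bytesize} bytes vs #{id_key.bytesize} bytes (#{saved_per_key} saved per key)"
```

The saving per key is small, so this only matters once the number of keys gets large.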

However, the first step should be to clean up the code using the existing cache key setup.

I think we're doing things a bit backwards by, for example, having RepositoryForkWorker telling Repository to clear a specific cache.

Instead I think it should be telling Repository "Hey, [this thing] happened, do something if you need to", with all of the knowledge of which cache(s) need to be cleared or updated contained entirely in Repository.

Milestone changed to 8.5

@rspeicher I agree, I was thinking of adding methods to Repository called something like branch_pushed or commit_pushed which then take care of the Repository-specific logic for those actions (e.g. clearing a cache). This should also remove some duplication from the various Sidekiq workers/services.

Some of the "hooks" that I can think of off the top of my head:

Repository#created
Repository#deleted
Repository#commit_pushed
Repository#tag_pushed
Repository#tag_deleted
Repository#branch_pushed
Repository#branch_deleted

The names are all just examples; I can't think of any better ones at this point.

@sytses While we probably could do this in time for 8.5 I think it's potentially dangerous to not give it enough real-world testing time. I would recommend 8.6. cc @DouweM

@yorickpeterse I like the "hook" approach. If you don't like the names we could do something Rails-y like after_create, after_delete, after_commit, after_tag_push (it kind of starts to fall apart here).

I agree 8.5 is a bit too close, I can't get this resolved in 2-3 days. I'll re-assign to 8.6.

Naming-wise I think the after_XXX approach is a good start.

Milestone changed to 8.6

Also see https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/2838#note_3772314 "we have flush_caches and expire_cache and expire_all_caches!."

Added repository label

I currently have the following list of hook/method names for the Repository class:

after_import
after_push_commit
after_create_branch
after_remove_branch
after_create_tag
before_delete
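A minimal sketch of how a few of these hooks could look, assuming the cache is a plain Hash and that each hook expires the caches shown. Which caches each hook really needs to expire is an assumption here, and `BranchCreatedWorker` is a hypothetical caller, not an existing worker.

```ruby
# Stand-in for the real Repository class; the cache store is a plain Hash.
class Repository
  attr_reader :cache

  def initialize(cache)
    @cache = cache
  end

  # Each hook decides for itself which caches to expire.
  # The cache names per hook are illustrative assumptions.
  def after_create_branch
    expire(:branch_names)
  end

  def after_remove_branch
    expire(:branch_names)
  end

  def after_create_tag
    expire(:tag_names)
  end

  def after_push_commit
    expire(:commit_count, :size)
  end

  private

  def expire(*names)
    names.each { |name| cache.delete(name) }
  end
end

# Hypothetical worker: it only reports that something happened and
# knows nothing about which caches exist.
class BranchCreatedWorker
  def perform(repository)
    repository.after_create_branch
  end
end

repo = Repository.new({ branch_names: ["master"], tag_names: ["v1.0"] })
BranchCreatedWorker.new.perform(repo)
```

With this shape the workers and services no longer need to know any cache names, which is where the duplication would go away.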

I also started looking into the various ways we currently flush caches and what kind of caches we're flushing, starting with Repository#expire_cache. This particular method ends up flushing the following caches:

size
branch_names
tag_names
commit_count
readme
version
contribution_guide
changelog
license
branch ahead/behind statistics cache
emptiness cache (if needed)

Some of these caches (e.g. the changelog or contributing guide caches) contain the full Git blobs, and they're flushed upon every push (even when not needed).

To allow for conditional flushing we'll have to pass the list of modified files (if any) to the appropriate hooks/methods. Having said that, for some cases this might be tricky. For example, if a project has a file called LICENSE and a commit replaces it with a file called COPYING, which path(s) do you pass to a method? Also, just passing the paths would mean the method can't figure out which file was removed and which one was added. Another idea might be to pass the entire Git commit (whenever available/applicable) to a method, allowing the method itself to figure out what to do.
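One possible shape for this, sketched under assumptions: instead of bare paths, the hook receives a list of changes that each carry an old and a new path, so it can tell additions from removals. The `Change` struct, the `FILE_CACHES` patterns and `caches_to_expire` are all hypothetical names made up for this example.

```ruby
# Hypothetical change record: old_path is nil for an added file,
# new_path is nil for a removed file.
Change = Struct.new(:old_path, :new_path)

# Assumed filename patterns per blob cache; the real detection rules may differ.
FILE_CACHES = {
  readme:             /\Areadme(\.[^\/]*)?\z/i,
  changelog:          /\Achangelog(\.[^\/]*)?\z/i,
  contribution_guide: /\Acontributing(\.[^\/]*)?\z/i,
  license:            /\A(licen[sc]e|copying)(\.[^\/]*)?\z/i
}.freeze

# Returns the names of the caches touched by the given changes.
# Both the old and the new path are checked, so a removed LICENSE
# and an added COPYING each expire the license cache.
def caches_to_expire(changes)
  paths = changes.flat_map { |change| [change.old_path, change.new_path] }.compact

  FILE_CACHES.select { |_name, pattern| paths.any? { |path| path.match?(pattern) } }.keys
end

license_swap = [Change.new("LICENSE", nil), Change.new(nil, "COPYING")]
code_only    = [Change.new("app/models/repository.rb", "app/models/repository.rb")]

p caches_to_expire(license_swap)
p caches_to_expire(code_only)
```

A push that only touches regular source files would then leave the blob caches alone.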

mentioned in merge request !2936 (merged)

Work in progress merge request (still in the early stages): https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/2936

mentioned in commit 2c6e34bc

Closing this as !2936 has been merged.

Status changed to closed

Mentioned in commit pfjason/gitlab-ce@2c6e34bc
