Start versioning cached markdown fields
What does this MR do?
Adds a number of database columns to track the version of data in cached markdown columns (*_html).
From time to time, we have to invalidate the data we've cached. The most common cause of this is a change to how the renderer works. The existing ClearDatabaseCacheWorker is expensive to run on GitLab.com: updating tens of millions of rows is a fundamentally hard problem.
By introducing a hardcoded version (CacheMarkdownField::CACHE_VERSION) and storing it with each row, we can check at read time whether the HTML needs to be re-rendered, rather than re-rendering everything at once on code deploy. Whenever the renderer changes its behaviour, just update the version number.
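The read-time check described above can be sketched in plain Ruby. This is a minimal illustration under stated assumptions, not the code in this MR: CachedNote, FakeRenderer, rendered_note and the version value 2 are hypothetical names chosen for the example, and the real renderer is replaced by a trivial stand-in.

```ruby
# Hypothetical stand-in for the real renderer; it only wraps the text in a
# paragraph so the sketch stays self-contained.
module FakeRenderer
  def self.render(markdown)
    "<p>#{markdown}</p>"
  end
end

# Illustrative model, not the MR's actual CacheMarkdownField implementation.
class CachedNote
  # Assumed value; bump it whenever the renderer's output changes.
  CACHE_VERSION = 2

  attr_reader :note, :note_html, :cached_markdown_version

  def initialize(note:, note_html: nil, cached_markdown_version: nil)
    @note = note
    @note_html = note_html
    @cached_markdown_version = cached_markdown_version
  end

  # The cache is usable only if it was written under the current version.
  def cache_stale?
    note_html.nil? || cached_markdown_version != CACHE_VERSION
  end

  # Read path: re-render lazily instead of invalidating every row on deploy.
  def rendered_note
    refresh_cache if cache_stale?
    note_html
  end

  private

  def refresh_cache
    @note_html = FakeRenderer.render(note)
    @cached_markdown_version = CACHE_VERSION
  end
end

# A row cached under version 1 is re-rendered the first time it is read.
stale = CachedNote.new(note: "hello", note_html: "<b>old</b>", cached_markdown_version: 1)
puts stale.rendered_note
```

With this shape, bumping the constant makes every stored row look stale without touching the database; each row pays the re-render cost only when it is next read.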
To avoid needing two cache columns per markdown field, there is just a single cached_markdown_version column per table. This means that changing any markdown field should regenerate every HTML field for that row.
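One way to picture the consequence of a single version column per table is the sketch below. It assumes a hypothetical CachedIssue row with two markdown fields, a placeholder renderer, and an illustrative version value of 3. Clearing the shared version on write is just one possible way to mark the row stale, not necessarily what this MR does.

```ruby
# Hypothetical row with two markdown fields sharing one version column.
class CachedIssue
  CACHE_VERSION = 3 # assumed value for illustration
  MARKDOWN_FIELDS = %i[title description].freeze

  attr_reader :attributes

  def initialize(attributes)
    @attributes = attributes
  end

  # Changing any markdown field clears the shared version, which marks every
  # *_html value on the row as stale at once.
  def write_markdown(field, value)
    raise ArgumentError, "unknown field #{field}" unless MARKDOWN_FIELDS.include?(field)

    attributes[field] = value
    attributes[:cached_markdown_version] = nil
  end

  # Regenerates all HTML fields together, then stamps the current version.
  def refresh_markdown_cache
    return if attributes[:cached_markdown_version] == CACHE_VERSION

    MARKDOWN_FIELDS.each do |field|
      attributes[:"#{field}_html"] = render(attributes[field])
    end
    attributes[:cached_markdown_version] = CACHE_VERSION
  end

  private

  # Placeholder renderer, not the real pipeline.
  def render(markdown)
    "<p>#{markdown}</p>"
  end
end

issue = CachedIssue.new({ title: "A", description: "B", cached_markdown_version: nil })
issue.write_markdown(:title, "C")
issue.refresh_markdown_cache
puts issue.attributes[:description_html] # description is re-rendered too
```

The trade-off is some redundant rendering: an edit to one field re-renders its siblings, in exchange for one extra column per table instead of one per field.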
Are there points in the code the reviewer needs to double check?
Is a simple incrementing integer OK? How do we signpost users to remember to change it when they alter the renderer? Should the version be stored in Banzai?
Why was this MR needed?
See https://gitlab.com/gitlab-com/infrastructure/issues/1576 and https://gitlab.com/gitlab-org/gitlab-ce/issues/30672
Screenshots (if relevant)
Does this MR meet the acceptance criteria?
- Changelog entry added, if necessary
- Documentation created/updated
- Tests
  - Added for this feature/bug
  - All builds are passing
- Conform by the merge request performance guides
- Conform by the style guides
- Branch has no merge conflicts with master (if it does - rebase it please)
- Squashed related commits together
What are the relevant issue numbers?
Closes #30672