Start versioning cached markdown fields

Nick Thomas requested to merge (removed):30672-versioned-markdown-cache into master

What does this MR do?

Adds a number of database columns to track the version of the data held in cached markdown columns (*_html).

From time to time, we have to invalidate the data we've cached. The most common cause of this is a change to how the renderer works. The existing ClearDatabaseCacheWorker is expensive to run on GitLab.com - updating tens of millions of rows is a fundamentally hard problem.

By introducing a hardcoded version (CacheMarkdownField::CACHE_VERSION) and storing this with each row, we can check at read time whether the HTML needs to be re-rendered, rather than doing it all at once on code deploy. Whenever the renderer changes its behaviour, just update the version number.

To avoid needing two cache columns per markdown field, there is just a single cached_markdown_version column per table. This means that changing any markdown field regenerates every HTML field for that row.
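The read-time check described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in this MR: only CACHE_VERSION and cached_markdown_version come from the description, while VersionedMarkdownCache, CachedRow, render, cache_stale?, refresh_cache! and render_count are hypothetical names, and the renderer is a trivial stand-in for the real pipeline.

```ruby
# Hypothetical stand-in for CacheMarkdownField; only CACHE_VERSION is a name
# taken from the description above.
module VersionedMarkdownCache
  # Bump this whenever the renderer changes its behaviour.
  CACHE_VERSION = 1

  # Placeholder renderer (assumption): the real one is far more involved.
  def self.render(markdown)
    "<p>#{markdown}</p>"
  end
end

# Hypothetical in-memory model of one database row with several markdown
# fields that share a single cached_markdown_version value.
class CachedRow
  attr_reader :cached_markdown_version, :render_count

  def initialize(fields, html: {}, cached_markdown_version: nil)
    @fields = fields                  # e.g. { title: "Hello" }
    @html = html                      # cached *_html values
    @cached_markdown_version = cached_markdown_version
    @render_count = 0                 # counts full re-renders, for illustration
  end

  # The cache is stale when the stored version differs from the current one.
  def cache_stale?
    cached_markdown_version != VersionedMarkdownCache::CACHE_VERSION
  end

  # Check the version at read time and re-render only when needed.
  def html_for(name)
    refresh_cache! if cache_stale?
    @html[name]
  end

  # One version per row, so every HTML field is regenerated together.
  def refresh_cache!
    @html = @fields.transform_values { |md| VersionedMarkdownCache.render(md) }
    @cached_markdown_version = VersionedMarkdownCache::CACHE_VERSION
    @render_count += 1
  end
end

row = CachedRow.new(
  { title: "Hello", description: "World" },
  html: { title: "<b>old</b>", description: "<i>old</i>" },
  cached_markdown_version: 0
)
puts row.html_for(:title)
```

In this sketch the first read of a stale row re-renders both fields at once, and later reads are served from the cache until CACHE_VERSION changes again.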

Are there points in the code the reviewer needs to double check?

Is a simple incrementing integer OK? How do we signpost users to remember to change it when they alter the renderer? Should the version be stored in Banzai?

Why was this MR needed?

See https://gitlab.com/gitlab-com/infrastructure/issues/1576 and https://gitlab.com/gitlab-org/gitlab-ce/issues/30672

Screenshots (if relevant)

Does this MR meet the acceptance criteria?

What are the relevant issue numbers?

Closes #30672 (closed)