The source project of this merge request has been removed.
Optimise ClearDatabaseCacheWorker
What does this MR do?
Reduces the amount of work performed by ClearDatabaseCacheWorker
We move from loops like this:
SELECT id... LIMIT 1000
UPDATE ... SET x=nil, y=nil WHERE id IN (...)
To this:
UPDATE ... SET x = NULL, y = NULL WHERE id IN ( SELECT id... WHERE x IS NOT NULL AND y IS NOT NULL LIMIT 1000)
This has two main effects:
- Using the subselect means we don't have to pass a thousand IDs out of SQL and back in again
- Including the IS NOT NULL clauses means that the loop continues from where it left off in the event of a retry
Are there points in the code the reviewer needs to double check?
Why was this MR needed?
The existing logic causes problems on GitLab.com
Screenshots (if relevant)
Here's the output of running the worker with a batch size of 100 and ~900 notes in the database ready to be cleared:
[29] pry(main)> ClearDatabaseCacheWorker.new.perform
Clearing Markdown cache for AbuseReport: ["message_html"]
(0.2ms) SELECT "abuse_reports"."id" FROM "abuse_reports" ORDER BY id DESC LIMIT 1
ClearDatabaseCacheWorker: AbuseReport (0.4ms) UPDATE "abuse_reports" SET "message_html" = NULL WHERE "abuse_reports"."id" IN (SELECT id FROM "abuse_reports" WHERE "abuse_reports"."id" <= 0 AND "abuse_reports"."message_html" IS NOT NULL LIMIT 100)
Clearing Markdown cache for Appearance: ["description_html"]
(0.2ms) SELECT "appearances"."id" FROM "appearances" ORDER BY id DESC LIMIT 1
ClearDatabaseCacheWorker: Appearance (0.4ms) UPDATE "appearances" SET "description_html" = NULL WHERE "appearances"."id" IN (SELECT id FROM "appearances" WHERE "appearances"."id" <= 0 AND "appearances"."description_html" IS NOT NULL LIMIT 100)
Clearing Markdown cache for ApplicationSetting: ["sign_in_text_html", "help_page_text_html", "shared_runners_text_html", "after_sign_up_text_html"]
(0.4ms) SELECT "application_settings"."id" FROM "application_settings" ORDER BY id DESC LIMIT 1
ClearDatabaseCacheWorker: ApplicationSetting (0.6ms) UPDATE "application_settings" SET "sign_in_text_html" = NULL, "help_page_text_html" = NULL, "shared_runners_text_html" = NULL, "after_sign_up_text_html" = NULL WHERE "application_settings"."id" IN (SELECT id FROM "application_settings" WHERE "application_settings"."id" <= 1 AND "application_settings"."sign_in_text_html" IS NOT NULL AND "application_settings"."help_page_text_html" IS NOT NULL AND "application_settings"."shared_runners_text_html" IS NOT NULL AND "application_settings"."after_sign_up_text_html" IS NOT NULL LIMIT 100)
Clearing Markdown cache for BroadcastMessage: ["message_html"]
(0.3ms) SELECT "broadcast_messages"."id" FROM "broadcast_messages" ORDER BY id DESC LIMIT 1
ClearDatabaseCacheWorker: BroadcastMessage (0.3ms) UPDATE "broadcast_messages" SET "message_html" = NULL WHERE "broadcast_messages"."id" IN (SELECT id FROM "broadcast_messages" WHERE "broadcast_messages"."id" <= 0 AND "broadcast_messages"."message_html" IS NOT NULL LIMIT 100)
Clearing Markdown cache for Issue: ["title_html", "description_html"]
(0.4ms) SELECT "issues"."id" FROM "issues" ORDER BY id DESC LIMIT 1
ClearDatabaseCacheWorker: Issue (0.8ms) UPDATE "issues" SET "title_html" = NULL, "description_html" = NULL WHERE "issues"."id" IN (SELECT id FROM "issues" WHERE "issues"."id" <= 80 AND "issues"."title_html" IS NOT NULL AND "issues"."description_html" IS NOT NULL LIMIT 100)
Clearing Markdown cache for Label: ["description_html"]
(0.3ms) SELECT "labels"."id" FROM "labels" ORDER BY id DESC LIMIT 1
ClearDatabaseCacheWorker: Label (0.4ms) UPDATE "labels" SET "description_html" = NULL WHERE "labels"."id" IN (SELECT id FROM "labels" WHERE "labels"."id" <= 0 AND "labels"."description_html" IS NOT NULL LIMIT 100)
Clearing Markdown cache for MergeRequest: ["title_html", "description_html"]
(0.3ms) SELECT "merge_requests"."id" FROM "merge_requests" ORDER BY id DESC LIMIT 1
ClearDatabaseCacheWorker: MergeRequest (0.7ms) UPDATE "merge_requests" SET "title_html" = NULL, "description_html" = NULL WHERE "merge_requests"."id" IN (SELECT id FROM "merge_requests" WHERE "merge_requests"."id" <= 44 AND "merge_requests"."title_html" IS NOT NULL AND "merge_requests"."description_html" IS NOT NULL LIMIT 100)
Clearing Markdown cache for Milestone: ["title_html", "description_html"]
(0.3ms) SELECT "milestones"."id" FROM "milestones" ORDER BY id DESC LIMIT 1
ClearDatabaseCacheWorker: Milestone (0.6ms) UPDATE "milestones" SET "title_html" = NULL, "description_html" = NULL WHERE "milestones"."id" IN (SELECT id FROM "milestones" WHERE "milestones"."id" <= 40 AND "milestones"."title_html" IS NOT NULL AND "milestones"."description_html" IS NOT NULL LIMIT 100)
Clearing Markdown cache for Namespace: ["description_html"]
(0.4ms) SELECT "namespaces"."id" FROM "namespaces" ORDER BY id DESC LIMIT 1
ClearDatabaseCacheWorker: Namespace (0.6ms) UPDATE "namespaces" SET "description_html" = NULL WHERE "namespaces"."id" IN (SELECT id FROM "namespaces" WHERE "namespaces"."id" <= 30 AND "namespaces"."description_html" IS NOT NULL LIMIT 100)
Clearing Markdown cache for Note: ["note_html"]
(0.3ms) SELECT "notes"."id" FROM "notes" ORDER BY id DESC LIMIT 1
ClearDatabaseCacheWorker: Note (4.4ms) UPDATE "notes" SET "note_html" = NULL WHERE "notes"."id" IN (SELECT id FROM "notes" WHERE "notes"."id" <= 997 AND "notes"."note_html" IS NOT NULL LIMIT 100)
ClearDatabaseCacheWorker: Note (4.2ms) UPDATE "notes" SET "note_html" = NULL WHERE "notes"."id" IN (SELECT id FROM "notes" WHERE "notes"."id" <= 997 AND "notes"."note_html" IS NOT NULL LIMIT 100)
ClearDatabaseCacheWorker: Note (4.2ms) UPDATE "notes" SET "note_html" = NULL WHERE "notes"."id" IN (SELECT id FROM "notes" WHERE "notes"."id" <= 997 AND "notes"."note_html" IS NOT NULL LIMIT 100)
ClearDatabaseCacheWorker: Note (4.4ms) UPDATE "notes" SET "note_html" = NULL WHERE "notes"."id" IN (SELECT id FROM "notes" WHERE "notes"."id" <= 997 AND "notes"."note_html" IS NOT NULL LIMIT 100)
ClearDatabaseCacheWorker: Note (4.1ms) UPDATE "notes" SET "note_html" = NULL WHERE "notes"."id" IN (SELECT id FROM "notes" WHERE "notes"."id" <= 997 AND "notes"."note_html" IS NOT NULL LIMIT 100)
ClearDatabaseCacheWorker: Note (4.7ms) UPDATE "notes" SET "note_html" = NULL WHERE "notes"."id" IN (SELECT id FROM "notes" WHERE "notes"."id" <= 997 AND "notes"."note_html" IS NOT NULL LIMIT 100)
ClearDatabaseCacheWorker: Note (6.4ms) UPDATE "notes" SET "note_html" = NULL WHERE "notes"."id" IN (SELECT id FROM "notes" WHERE "notes"."id" <= 997 AND "notes"."note_html" IS NOT NULL LIMIT 100)
ClearDatabaseCacheWorker: Note (3.9ms) UPDATE "notes" SET "note_html" = NULL WHERE "notes"."id" IN (SELECT id FROM "notes" WHERE "notes"."id" <= 997 AND "notes"."note_html" IS NOT NULL LIMIT 100)
ClearDatabaseCacheWorker: Note (3.7ms) UPDATE "notes" SET "note_html" = NULL WHERE "notes"."id" IN (SELECT id FROM "notes" WHERE "notes"."id" <= 997 AND "notes"."note_html" IS NOT NULL LIMIT 100)
ClearDatabaseCacheWorker: Note (6.0ms) UPDATE "notes" SET "note_html" = NULL WHERE "notes"."id" IN (SELECT id FROM "notes" WHERE "notes"."id" <= 997 AND "notes"."note_html" IS NOT NULL LIMIT 100)
ClearDatabaseCacheWorker: Note (0.7ms) UPDATE "notes" SET "note_html" = NULL WHERE "notes"."id" IN (SELECT id FROM "notes" WHERE "notes"."id" <= 997 AND "notes"."note_html" IS NOT NULL LIMIT 100)
Clearing Markdown cache for Project: ["description_html"]
(0.4ms) SELECT "projects"."id" FROM "projects" ORDER BY id DESC LIMIT 1
ClearDatabaseCacheWorker: Project (0.9ms) UPDATE "projects" SET "description_html" = NULL WHERE "projects"."id" IN (SELECT id FROM "projects" WHERE "projects"."id" <= 9 AND "projects"."description_html" IS NOT NULL LIMIT 100)
ClearDatabaseCacheWorker: Project (0.9ms) UPDATE "projects" SET "description_html" = NULL WHERE "projects"."id" IN (SELECT id FROM "projects" WHERE "projects"."id" <= 9 AND "projects"."description_html" IS NOT NULL LIMIT 100)
Clearing Markdown cache for Release: ["description_html"]
(0.2ms) SELECT "releases"."id" FROM "releases" ORDER BY id DESC LIMIT 1
ClearDatabaseCacheWorker: Release (0.4ms) UPDATE "releases" SET "description_html" = NULL WHERE "releases"."id" IN (SELECT id FROM "releases" WHERE "releases"."id" <= 0 AND "releases"."description_html" IS NOT NULL LIMIT 100)
Clearing Markdown cache for Snippet: ["title_html", "content_html"]
(0.2ms) SELECT "snippets"."id" FROM "snippets" ORDER BY id DESC LIMIT 1
ClearDatabaseCacheWorker: Snippet (0.4ms) UPDATE "snippets" SET "title_html" = NULL, "content_html" = NULL WHERE "snippets"."id" IN (SELECT id FROM "snippets" WHERE "snippets"."id" <= 49 AND "snippets"."title_html" IS NOT NULL AND "snippets"."content_html" IS NOT NULL LIMIT 100)
Does this MR meet the acceptance criteria?
-
Changelog entry added, if necessary - Tests
-
All builds are passing
-
-
Conform by the merge request performance guides -
Conform by the style guides -
Branch has no merge conflicts with master
(if it does - rebase it please) -
Squashed related commits together
What are the relevant issue numbers?
Related to #30672 (closed) https://gitlab.com/gitlab-com/infrastructure/issues/1576#note_27127622