Skip to content

Optimise ClearDatabaseCacheWorker

What does this MR do?

Reduces the amount of work performed by ClearDatabaseCacheWorker

We move from loops like this:

  SELECT id... LIMIT 1000
  UPDATE ... SET x=nil, y=nil WHERE id IN (...)

To this:

  UPDATE ... SET x = NULL, y = NULL WHERE id IN ( SELECT id...  WHERE x IS NOT NULL AND y IS NOT NULL LIMIT 1000)

This has two main effects:

  • Using the subselect means we don't have to pass a thousand IDs out of SQL and back in again
  • Including the IS NOT NULL clauses means that the loop continues from where it left off in the event of a retry

Are there points in the code the reviewer needs to double check?

Why was this MR needed?

The existing logic causes problems on GitLab.com

Screenshots (if relevant)

Here's the output of running the worker with a batch size of 100 and ~900 notes in the database ready to be cleared:

[29] pry(main)> ClearDatabaseCacheWorker.new.perform
Clearing Markdown cache for AbuseReport: ["message_html"]
   (0.2ms)  SELECT  "abuse_reports"."id" FROM "abuse_reports"  ORDER BY id DESC LIMIT 1
  ClearDatabaseCacheWorker: AbuseReport (0.4ms)  UPDATE "abuse_reports" SET "message_html" = NULL WHERE "abuse_reports"."id" IN (SELECT  id FROM "abuse_reports" WHERE "abuse_reports"."id" <= 0 AND "abuse_reports"."message_html" IS NOT NULL LIMIT 100)
Clearing Markdown cache for Appearance: ["description_html"]
   (0.2ms)  SELECT  "appearances"."id" FROM "appearances"  ORDER BY id DESC LIMIT 1
  ClearDatabaseCacheWorker: Appearance (0.4ms)  UPDATE "appearances" SET "description_html" = NULL WHERE "appearances"."id" IN (SELECT  id FROM "appearances" WHERE "appearances"."id" <= 0 AND "appearances"."description_html" IS NOT NULL LIMIT 100)
Clearing Markdown cache for ApplicationSetting: ["sign_in_text_html", "help_page_text_html", "shared_runners_text_html", "after_sign_up_text_html"]
   (0.4ms)  SELECT  "application_settings"."id" FROM "application_settings"  ORDER BY id DESC LIMIT 1
  ClearDatabaseCacheWorker: ApplicationSetting (0.6ms)  UPDATE "application_settings" SET "sign_in_text_html" = NULL, "help_page_text_html" = NULL, "shared_runners_text_html" = NULL, "after_sign_up_text_html" = NULL WHERE "application_settings"."id" IN (SELECT  id FROM "application_settings" WHERE "application_settings"."id" <= 1 AND "application_settings"."sign_in_text_html" IS NOT NULL AND "application_settings"."help_page_text_html" IS NOT NULL AND "application_settings"."shared_runners_text_html" IS NOT NULL AND "application_settings"."after_sign_up_text_html" IS NOT NULL LIMIT 100)
Clearing Markdown cache for BroadcastMessage: ["message_html"]
   (0.3ms)  SELECT  "broadcast_messages"."id" FROM "broadcast_messages"  ORDER BY id DESC LIMIT 1
  ClearDatabaseCacheWorker: BroadcastMessage (0.3ms)  UPDATE "broadcast_messages" SET "message_html" = NULL WHERE "broadcast_messages"."id" IN (SELECT  id FROM "broadcast_messages" WHERE "broadcast_messages"."id" <= 0 AND "broadcast_messages"."message_html" IS NOT NULL LIMIT 100)
Clearing Markdown cache for Issue: ["title_html", "description_html"]
   (0.4ms)  SELECT  "issues"."id" FROM "issues"  ORDER BY id DESC LIMIT 1
  ClearDatabaseCacheWorker: Issue (0.8ms)  UPDATE "issues" SET "title_html" = NULL, "description_html" = NULL WHERE "issues"."id" IN (SELECT  id FROM "issues" WHERE "issues"."id" <= 80 AND "issues"."title_html" IS NOT NULL AND "issues"."description_html" IS NOT NULL LIMIT 100)
Clearing Markdown cache for Label: ["description_html"]
   (0.3ms)  SELECT  "labels"."id" FROM "labels"  ORDER BY id DESC LIMIT 1
  ClearDatabaseCacheWorker: Label (0.4ms)  UPDATE "labels" SET "description_html" = NULL WHERE "labels"."id" IN (SELECT  id FROM "labels" WHERE "labels"."id" <= 0 AND "labels"."description_html" IS NOT NULL LIMIT 100)
Clearing Markdown cache for MergeRequest: ["title_html", "description_html"]
   (0.3ms)  SELECT  "merge_requests"."id" FROM "merge_requests"  ORDER BY id DESC LIMIT 1
  ClearDatabaseCacheWorker: MergeRequest (0.7ms)  UPDATE "merge_requests" SET "title_html" = NULL, "description_html" = NULL WHERE "merge_requests"."id" IN (SELECT  id FROM "merge_requests" WHERE "merge_requests"."id" <= 44 AND "merge_requests"."title_html" IS NOT NULL AND "merge_requests"."description_html" IS NOT NULL LIMIT 100)
Clearing Markdown cache for Milestone: ["title_html", "description_html"]
   (0.3ms)  SELECT  "milestones"."id" FROM "milestones"  ORDER BY id DESC LIMIT 1
  ClearDatabaseCacheWorker: Milestone (0.6ms)  UPDATE "milestones" SET "title_html" = NULL, "description_html" = NULL WHERE "milestones"."id" IN (SELECT  id FROM "milestones" WHERE "milestones"."id" <= 40 AND "milestones"."title_html" IS NOT NULL AND "milestones"."description_html" IS NOT NULL LIMIT 100)
Clearing Markdown cache for Namespace: ["description_html"]
   (0.4ms)  SELECT  "namespaces"."id" FROM "namespaces"  ORDER BY id DESC LIMIT 1
  ClearDatabaseCacheWorker: Namespace (0.6ms)  UPDATE "namespaces" SET "description_html" = NULL WHERE "namespaces"."id" IN (SELECT  id FROM "namespaces" WHERE "namespaces"."id" <= 30 AND "namespaces"."description_html" IS NOT NULL LIMIT 100)
Clearing Markdown cache for Note: ["note_html"]
   (0.3ms)  SELECT  "notes"."id" FROM "notes"  ORDER BY id DESC LIMIT 1
  ClearDatabaseCacheWorker: Note (4.4ms)  UPDATE "notes" SET "note_html" = NULL WHERE "notes"."id" IN (SELECT  id FROM "notes" WHERE "notes"."id" <= 997 AND "notes"."note_html" IS NOT NULL LIMIT 100)
  ClearDatabaseCacheWorker: Note (4.2ms)  UPDATE "notes" SET "note_html" = NULL WHERE "notes"."id" IN (SELECT  id FROM "notes" WHERE "notes"."id" <= 997 AND "notes"."note_html" IS NOT NULL LIMIT 100)
  ClearDatabaseCacheWorker: Note (4.2ms)  UPDATE "notes" SET "note_html" = NULL WHERE "notes"."id" IN (SELECT  id FROM "notes" WHERE "notes"."id" <= 997 AND "notes"."note_html" IS NOT NULL LIMIT 100)
  ClearDatabaseCacheWorker: Note (4.4ms)  UPDATE "notes" SET "note_html" = NULL WHERE "notes"."id" IN (SELECT  id FROM "notes" WHERE "notes"."id" <= 997 AND "notes"."note_html" IS NOT NULL LIMIT 100)
  ClearDatabaseCacheWorker: Note (4.1ms)  UPDATE "notes" SET "note_html" = NULL WHERE "notes"."id" IN (SELECT  id FROM "notes" WHERE "notes"."id" <= 997 AND "notes"."note_html" IS NOT NULL LIMIT 100)
  ClearDatabaseCacheWorker: Note (4.7ms)  UPDATE "notes" SET "note_html" = NULL WHERE "notes"."id" IN (SELECT  id FROM "notes" WHERE "notes"."id" <= 997 AND "notes"."note_html" IS NOT NULL LIMIT 100)
  ClearDatabaseCacheWorker: Note (6.4ms)  UPDATE "notes" SET "note_html" = NULL WHERE "notes"."id" IN (SELECT  id FROM "notes" WHERE "notes"."id" <= 997 AND "notes"."note_html" IS NOT NULL LIMIT 100)
  ClearDatabaseCacheWorker: Note (3.9ms)  UPDATE "notes" SET "note_html" = NULL WHERE "notes"."id" IN (SELECT  id FROM "notes" WHERE "notes"."id" <= 997 AND "notes"."note_html" IS NOT NULL LIMIT 100)
  ClearDatabaseCacheWorker: Note (3.7ms)  UPDATE "notes" SET "note_html" = NULL WHERE "notes"."id" IN (SELECT  id FROM "notes" WHERE "notes"."id" <= 997 AND "notes"."note_html" IS NOT NULL LIMIT 100)
  ClearDatabaseCacheWorker: Note (6.0ms)  UPDATE "notes" SET "note_html" = NULL WHERE "notes"."id" IN (SELECT  id FROM "notes" WHERE "notes"."id" <= 997 AND "notes"."note_html" IS NOT NULL LIMIT 100)
  ClearDatabaseCacheWorker: Note (0.7ms)  UPDATE "notes" SET "note_html" = NULL WHERE "notes"."id" IN (SELECT  id FROM "notes" WHERE "notes"."id" <= 997 AND "notes"."note_html" IS NOT NULL LIMIT 100)
Clearing Markdown cache for Project: ["description_html"]
   (0.4ms)  SELECT  "projects"."id" FROM "projects"  ORDER BY id DESC LIMIT 1
  ClearDatabaseCacheWorker: Project (0.9ms)  UPDATE "projects" SET "description_html" = NULL WHERE "projects"."id" IN (SELECT  id FROM "projects" WHERE "projects"."id" <= 9 AND "projects"."description_html" IS NOT NULL LIMIT 100)
  ClearDatabaseCacheWorker: Project (0.9ms)  UPDATE "projects" SET "description_html" = NULL WHERE "projects"."id" IN (SELECT  id FROM "projects" WHERE "projects"."id" <= 9 AND "projects"."description_html" IS NOT NULL LIMIT 100)
Clearing Markdown cache for Release: ["description_html"]
   (0.2ms)  SELECT  "releases"."id" FROM "releases"  ORDER BY id DESC LIMIT 1
  ClearDatabaseCacheWorker: Release (0.4ms)  UPDATE "releases" SET "description_html" = NULL WHERE "releases"."id" IN (SELECT  id FROM "releases" WHERE "releases"."id" <= 0 AND "releases"."description_html" IS NOT NULL LIMIT 100)
Clearing Markdown cache for Snippet: ["title_html", "content_html"]
   (0.2ms)  SELECT  "snippets"."id" FROM "snippets"  ORDER BY id DESC LIMIT 1
  ClearDatabaseCacheWorker: Snippet (0.4ms)  UPDATE "snippets" SET "title_html" = NULL, "content_html" = NULL WHERE "snippets"."id" IN (SELECT  id FROM "snippets" WHERE "snippets"."id" <= 49 AND "snippets"."title_html" IS NOT NULL AND "snippets"."content_html" IS NOT NULL LIMIT 100)

Does this MR meet the acceptance criteria?

What are the relevant issue numbers?

Related to #30672 (closed) https://gitlab.com/gitlab-com/infrastructure/issues/1576#note_27127622

Merge request reports