ClearDatabaseCacheWorker keeps retrying and causes a large amount of table bloat
We ran ClearDatabaseCacheWorker manually via a Rake task. It failed and retried repeatedly, which caused a large amount of table bloat and replication lag. For more details, see: https://gitlab.com/gitlab-com/infrastructure/issues/1576#note_27127622.
The Sidekiq job appears to have retried multiple times over the course of 24 hours.
There are several problems here:
- It should not retry so many times
- We should limit the amount of table bloat it causes in production
- It should resume from where it left off rather than re-clear the same database rows
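A minimal sketch of the last two points, in plain Ruby with no Sidekiq or ActiveRecord dependency. The class name BatchedCacheClearer, the cached_html column, and the batch size are all hypothetical, and rows are modeled as in-memory hashes rather than database records. Batching does not reduce the total number of dead tuples that the UPDATEs create, but spreading the work out (possibly with a pause between batches) gives autovacuum and replicas a chance to catch up. Persisting last_id means a retried job skips rows it already cleared instead of updating them again and adding more bloat. Capping the retry count itself could be done separately with Sidekiq's `sidekiq_options retry:` setting.

```ruby
# Hypothetical sketch: clear a cached column in small, id-ordered batches
# and remember the last processed id so a retried job resumes instead of
# re-updating rows it already cleared. Rows are plain hashes here; a real
# worker would issue one UPDATE per batch against the database.
class BatchedCacheClearer
  attr_reader :last_id

  def initialize(rows, batch_size: 2, last_id: 0)
    @rows = rows
    @batch_size = batch_size
    @last_id = last_id # persisted cursor; 0 means "start from the beginning"
  end

  # Clears at most max_batches batches and returns the number of rows updated.
  def run(max_batches: Float::INFINITY)
    updated = 0
    batches = 0
    while batches < max_batches
      # Next batch: rows strictly after the cursor, in id order.
      batch = @rows.select { |r| r[:id] > @last_id }
                   .sort_by { |r| r[:id] }
                   .first(@batch_size)
      break if batch.empty?

      batch.each { |r| r[:cached_html] = nil }
      updated += batch.size
      @last_id = batch.last[:id] # advance the cursor only after the batch succeeds
      batches += 1
      # A real worker might sleep here to let vacuum and replicas catch up.
    end
    updated
  end
end
```

For example, with five rows and a batch size of 2, a first run stopped after one batch (simulating a job that fails partway) clears rows 1 and 2 and leaves last_id at 2; a second instance created with `last_id: 2` then clears only rows 3 to 5.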
Any other ideas, @pcarranza?
/cc: @nick.thomas