Snippets Groups Projects

Do not update/delete: Banner broadcast message test data

Do not update/delete: Notification broadcast message test data

New issue look: Off

Geo: huge cross-database pluck and `NOT IN (...)` query for LFS objects and attachments

Summary

Sentry event: https://sentry.gitlap.com/gitlab/geo-testbed/issues/70345/

Similarly to https://gitlab.com/gitlab-org/gitlab-ee/issues/3269 , it looks like we're having problems backfilling many LFS attachments in Geo because we pluck a huge list of IDs from the tracking database then run a NOT IN (...) query on the read-only slave. This takes forever to execute.

Steps to reproduce

Add a fresh Geo secondary to a server with many LFS objects

Example Project

(If possible, please create an example project here on GitLab.com that exhibits the problematic behaviour, and link to it here in the bug report)

(If you are using an older version of GitLab, this will also determine whether the bug has been fixed in a more recent version)

What is the current bug behavior?

PG::TRSerializationFailure: ERROR:  canceling statement due to conflict with recovery
DETAIL:  User query might have needed to see row versions that must be removed.

  from peek/views/pg.rb:17:in `async_exec'
  from peek/views/pg.rb:17:in `async_exec'
  from active_record/connection_adapters/postgresql_adapter.rb:592:in `block in exec_no_cache'
  from active_record/connection_adapters/abstract_adapter.rb:484:in `block in log'
  from active_support/notifications/instrumenter.rb:20:in `instrument'
  from active_record/connection_adapters/abstract_adapter.rb:478:in `log'
  from active_record/connection_adapters/postgresql_adapter.rb:592:in `exec_no_cache'
  from active_record/connection_adapters/postgresql_adapter.rb:584:in `execute_and_clear'
  from active_record/connection_adapters/postgresql/database_statements.rb:160:in `exec_query'
  from active_record/connection_adapters/abstract/database_statements.rb:356:in `select'
  from active_record/connection_adapters/abstract/database_statements.rb:32:in `select_all'
  from active_record/connection_adapters/abstract/query_cache.rb:70:in `select_all'
  from active_record/relation/calculations.rb:186:in `pluck'
  from app/workers/geo/file_download_dispatch_worker.rb:36:in `find_lfs_object_ids'
  from app/workers/geo/file_download_dispatch_worker.rb:12:in `load_pending_resources'
  from app/workers/geo/base_scheduler_worker.rb:122:in `update_pending_resources'
  from app/workers/geo/base_scheduler_worker.rb:43:in `block (2 levels) in perform'
  from app/workers/geo/base_scheduler_worker.rb:39:in `loop'
  from app/workers/geo/base_scheduler_worker.rb:39:in `block in perform'
  from app/workers/geo/base_scheduler_worker.rb:152:in `try_obtain_lease'
  from app/workers/geo/base_scheduler_worker.rb:34:in `perform'
  from sidekiq/processor.rb:199:in `execute_job'

                  .where.not(id: downloaded_ids)
                  .order(created_at: :desc)
                  .limit(db_retrieve_batch_size)
                  .pluck(:id)
                  .map { |id| [id, :lfs] }
    end

What is the expected correct behavior?

We should use postgres_fdw to perform these queries. Until that's ready, we need to implement the approach we used in #3269 (closed) to avoid the huge NOT IN fragment.

Possible fixes

/cc @brodock @dbalexandre @stanhu @to1ne

Designs

Child items ...

Activity

Nick Thomas @nick.thomas · 7 years ago

Author Maintainer

There are 2,950 IDs in the NOT IN query at the moment.
Nick Thomas @nick.thomas · 7 years ago

Author Maintainer

And here's a view of the SQL timings: https://performance.gitlab.net/dashboard/db/sidekiq-workers?orgId=1&var-worker=Geo::FileDownloadDispatchWorker%23perform&var-database=GeoTestBed%20-%20Sync

It started to increase from the moment the mountpoints containing LFS objects and attachments were added to the testbed.
Nick Thomas @nick.thomas · 7 years ago

Author Maintainer

We're now getting genuine statement timeouts: https://sentry.gitlap.com/gitlab/geo-testbed/issues/70355/
Stan Hu mentioned in merge request !3075 (merged) 7 years ago

mentioned in merge request !3075 (merged)
Nick Thomas added In review label 7 years ago

added In review label
Toon Claes assigned to @stanhu 7 years ago

assigned to @stanhu
Stan Hu mentioned in commit 2aadbc09 7 years ago

mentioned in commit 2aadbc09
Nick Thomas closed via merge request !3075 (merged) 7 years ago

closed via merge request !3075 (merged)
Nick Thomas mentioned in commit 40fa4c96 7 years ago

mentioned in commit 40fa4c96

Please register or sign in to reply

Due date

None

Health status

None

Confidentiality

Confidentiality controls have moved to the issue actions menu () at the top of the page.

0 Participants