Load balancing of database queries

yorickpeterse-staging requested to merge load-balancing into master

This adds support for load balancing of database queries when PostgreSQL is used (MySQL is not supported). The commit message(s) contain the most details, so please refer to those for an in-depth description of this feature. Some key features worth mentioning:

  • Prepared statements are disabled automatically since these don't work well with load balancing
  • Failovers/database restarts are handled gracefully. For example, an offline secondary is ignored, while the primary uses a retry mechanism with exponential backoff
  • Load is balanced using a simple round-robin algorithm, without any external dependencies such as Redis
  • In the event of no hosts being available a dedicated error (Gitlab::Database::LoadBalancing::NoHostsAvailable) is raised to make monitoring easier
  • After a write a user's requests will use the primary until the secondaries are in sync OR until a timeout expires (30 seconds at the moment)
  • Load balancing is not enabled for Sidekiq as this would lead to consistency problems, and Sidekiq mostly performs writes anyway
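
To illustrate the round-robin behaviour described above, here is a minimal sketch and not the code from this merge request. The `Host` struct, the `RoundRobinList` class, and the top-level `NoHostsAvailable` constant are hypothetical stand-ins; the real error lives under Gitlab::Database::LoadBalancing. The sketch assumes each host can report whether it is online, skips offline hosts, and raises once a full pass finds none.

```ruby
# Hypothetical stand-ins for illustration; not the classes from this merge request.
NoHostsAvailable = Class.new(StandardError)

# In this sketch a host is just a name plus an online flag.
Host = Struct.new(:name, :online) do
  def online?
    online
  end
end

class RoundRobinList
  def initialize(hosts)
    @hosts = hosts
    @index = 0
  end

  # Returns the next online host, skipping offline ones.
  # Raises NoHostsAvailable when one full pass finds no online host.
  def next
    @hosts.length.times do
      host = @hosts[@index]
      @index = (@index + 1) % @hosts.length
      return host if host.online?
    end

    raise NoHostsAvailable, 'no database hosts are online'
  end
end

list = RoundRobinList.new([
  Host.new('secondary-1', true),
  Host.new('secondary-2', false), # offline, so it is skipped
  Host.new('secondary-3', true)
])

picked = Array.new(4) { list.next.name }
puts picked.inspect
```

No shared state outside the process is needed: the only bookkeeping is an integer index, which is why no external dependency such as Redis is required.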

A hard requirement for load balancing is that all the database hosts point to the right type of database. The host in config/database.yml must always point to a primary (even after a failover), and the additional hosts must always point to a secondary. This means you'll need to place a load balancer in front of every database, and connect to those load balancers. During a failover the user must take care of re-routing traffic to the right hosts using these load balancers.

Configuring the hosts is currently done using an environment variable (LOAD_BALANCE_DATABASE_HOSTS). This removes the need for making any Omnibus changes, or any extra tables/columns in the database.
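
As a rough sketch of how such a variable could be read: the description does not state the value's format, so the comma-separated list assumed here is a guess, and the helper names `load_balancing_hosts` and `load_balancing_enabled?` as well as the example hostnames are hypothetical.

```ruby
# Assumption: LOAD_BALANCE_DATABASE_HOSTS holds a comma-separated list of
# secondary hostnames. The actual format is not stated in this description.
def load_balancing_hosts(env = ENV)
  env.fetch('LOAD_BALANCE_DATABASE_HOSTS', '')
     .split(',')
     .map(&:strip)
     .reject(&:empty?)
end

# Load balancing is treated as enabled only when at least one host is listed.
def load_balancing_enabled?(env = ENV)
  load_balancing_hosts(env).any?
end

# A plain Hash stands in for ENV so the example does not depend on the shell.
example = { 'LOAD_BALANCE_DATABASE_HOSTS' => 'db-secondary-1.example, db-secondary-2.example' }
hosts = load_balancing_hosts(example)
puts hosts.inspect
```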

Related issue: https://gitlab.com/gitlab-com/infrastructure/issues/259

TODO

  • Make HostList#next thread-safe
  • Test load distribution on staging
  • Test failovers. Code-wise this is taken care of, but we lack the right infrastructure on GitLab.com to reliably fail over (even before load balancing). This is taken care of separately.
  • Disable load balancing when running Rake tasks (e.g. db:migrate)
  • Re-use ActiveRecord::Base.connection for the primary in HostList, instead of connecting separately. The latter would lead to a 2x increase in the number of connections on the primary. An alternative is to not send read-only queries to the primary, instead only using it for/after writes
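
For the thread-safety item, one possible approach is to guard the round-robin index with a Mutex. This is only a sketch under that assumption: `ThreadSafeHostList` is a hypothetical class, not the `HostList` from this merge request, and it assumes a non-empty list of hosts.

```ruby
class ThreadSafeHostList
  # Assumes `hosts` is non-empty; an empty list would divide by zero in #next.
  def initialize(hosts)
    @hosts = hosts
    @index = 0
    @mutex = Mutex.new
  end

  # Reading and advancing the index happen under one lock, so two threads
  # can never observe the same index value for consecutive calls.
  def next
    @mutex.synchronize do
      host = @hosts[@index]
      @index = (@index + 1) % @hosts.length
      host
    end
  end
end

list = ThreadSafeHostList.new(%w[secondary-1 secondary-2])
counts = Hash.new(0)
counts_lock = Mutex.new

# Four threads each request a host 100 times.
threads = Array.new(4) do
  Thread.new do
    100.times do
      host = list.next
      counts_lock.synchronize { counts[host] += 1 }
    end
  end
end
threads.each(&:join)

puts counts.inspect
```

Because every call advances the index exactly once, the 400 calls split evenly across the two hosts regardless of how the threads interleave.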