Blog post on scaling the GitLab database

Merged yorickpeterse-staging requested to merge scaling-gitlab-database into master

@sitschner and I discussed that one way we can perhaps attract more people to the DB team (or at least make them aware of what we do) is by writing more about it. I in turn needed a good excuse to write more, so here we go.

This MR adds a blog post about how GitLab uses both pgbouncer and database load balancing to scale our database cluster. It's a fairly technical blog post that aims to cover not only the bigger picture but also some implementation details.


  • Add more technical details on the load balancer (e.g. code snippets with an explanation) so this section reads less like I'm patting myself on the back
  • Provide details on our exact pgbouncer configuration (e.g. the exact pool sizes)
  • Perhaps add some diagrams to better illustrate a typical database setup vs a pgbouncer setup
  • Add a summary for the blog index page (displayed before the <!-- more --> tag)
  • Briefly explain why we're not sharding (as suggested by Sytse)
  • Include the transaction count per minute somewhere (also suggested by Sytse)
  • Briefly explain why we're not using (since it also has some load balancing mechanisms I believe)
  • Describe our co-operation with Crunchy, and perhaps scrub and make the canonical issue public

cc @sitschner

Edited by yorickpeterse-staging