WIP: Bundle pgpool-II 3.6.1
General
- Get the absolute most basic pgpool setup up and running: 1 pgpool instance, one host to forward queries to (in staging)
- Figure out how to add hosts to pgpool, and if this can be done dynamically - This seems to require updating the pgpool configuration file (http://www.pgpool.net/docs/latest/en/html/runtime-config-backend-settings.html#GUC-BACKEND-HOSTNAME for example), though some options can be set using SQL commands as well (but they're not persisted/flushed to disk) as per http://www.pgpool.net/docs/latest/en/html/sql-pgpool-set.html
  - PCP commands can also be used for this, but as far as I can tell they are not persisted: http://www.pgpool.net/docs/latest/en/html/pcp-attach-node.html
- Log to STDERR so runit can take care of it - Logging to STDERR is the default
- Figure out if relation caching is useful to us (this might speed up application bootups) - Used for caching system catalog information, doesn't seem to be flushed automatically upon an `ALTER TABLE`; requires a timeout instead. This could speed up queries used by Rails to get the columns for every model, though it's not clear how great that overhead is to begin with
- Figure out what pgpool uses in-memory caching for, and if it's useful - This appears to be used for caching the output of SELECT queries. This could be quite useful for caching frequently executed queries. The cache appears to be flushed automatically.
- Figure out if we use temporary tables, disable pgpool checking this if we don't use them (reduces master load) - As far as I can tell we only use this in `lib/gitlab/database/median.rb`, and only for MySQL
PostgreSQL
- Move PostgreSQL to a different port (not accessible from the outside), and have pgpool listen on 5432
- Automatically enable the pgpool_recovery extension as mentioned on http://www.pgpool.net/docs/latest/en/html/install-pgpool-recovery.html
- Adjust PostgreSQL settings as mentioned on http://www.pgpool.net/docs/latest/en/html/install-pgpool-recovery.html
- Set up a PostgreSQL user for pgpool, used for health checks, etc. This user will either need a password, or we somehow allow this user to connect without one (including on remote hosts)
- Ensure a replication user is in place, with a password (or another way to connect to other DB hosts). Currently we use `gitlab_replicator`, but this is managed by https://gitlab.com/gitlab-cookbooks/gitlab-pgsql-ha/ if I'm not mistaken
pgpool
- pgpool configuration generation/management (`pgpool.conf` and `pgpool_hba.conf`) - see the sketch after this list
- Set up the pgpool watchdog (required for managing multiple nodes, reaching consensus, failovers, etc: http://www.pgpool.net/docs/latest/en/html/tutorial-watchdog-intro.html)
- Create/configure the PID directory (if necessary when using runit)
- Support & enable pgpool health checks
- Support & enable pgpool replication checks
- Create/configure the socket directory
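As a rough sketch of the configuration generation item above (the directory, attribute names and the service notification are assumptions, using a template vendored from pgpool's `pgpool.conf.sample`):

```ruby
# Hypothetical sketch only - not the final recipe. Renders pgpool.conf from a
# template vendored from the pgpool-II sample config; paths and attribute
# names are assumptions for illustration.
template '/var/opt/gitlab/pgpool/pgpool.conf' do
  source 'pgpool.conf.erb'
  owner 'gitlab-psql'
  mode '0644'
  variables(node['gitlab']['pgpool'].to_hash)
  notifies :restart, 'runit_service[pgpool]'
end
```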
Replication
The following is required to set up replication, which is necessary to ensure all load balanced nodes have the same data:
- Synchronous replication so changes don't vanish between requests: https://www.postgresql.org/docs/9.6/static/warm-standby.html#SYNCHRONOUS-REPLICATION
- Generate a `recovery.conf` file in the PostgreSQL data directory. This file should contain the right recovery settings for a standby node - see the sketch after this list
- Being able to persist node addresses, primaries, etc. in Chef
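A minimal sketch of the `recovery.conf` generation mentioned above, assuming attribute names like `data_dir` and `primary_host` and a hypothetical trigger file path (the actual recipe would more likely render a template):

```ruby
# Hypothetical sketch only. Writes a recovery.conf for a standby node; the
# replication user, trigger file path and attribute names are assumptions.
pg_attrs = node['gitlab']['postgresql']

file File.join(pg_attrs['data_dir'], 'recovery.conf') do
  content <<~CONF
    standby_mode = 'on'
    primary_conninfo = 'host=#{pg_attrs['primary_host']} port=5432 user=gitlab_replicator'
    trigger_file = '/var/opt/gitlab/postgresql/trigger'
  CONF
  owner 'gitlab-psql'
  mode '0640'
end
```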
Activity
@ibaum The documentation at http://www.pgpool.net/docs/latest/en/html/install-pgpool-recovery.html mentions running a few extra steps after compiling pgpool_recovery (which is apparently required). What would be the best approach to run these steps? Do we have anything for this at all?
@marin @ibaum One of the problems I'm currently running into with Omnibus is the split between Pg 9.2 and 9.6. Pgpool has an extension that should be built/installed using whatever the active version is (9.6 in our case), not the default one. Unfortunately it seems that even when using `pg_config` the extension gets installed in 9.2. I messed around a bit with getting the right parameters using `pg_config`, but now it fails with the command not being found (https://dev.gitlab.org/gitlab/omnibus-gitlab/builds/695666).

So in short, how can we ensure that pgpool is installed using the active version of PostgreSQL, instead of the default one?
Edited by yorickpeterse-staging

So for the actual error you're seeing though, the backticks for setting the include and lib dir are evaluated at load time, not when the software is being built. You might be able to set them in a `block` call:

```ruby
block 'find vars' do
  include_dir = `#{pg_config} --includedir`
  lib_dir = `#{pg_config} --libdir`
end
```

But their scope might only be local to that block.

http://www.rubydoc.info/github/chef/omnibus/Omnibus/Builder#block-instance_method
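For illustration, a rough sketch of how the software definition could defer those lookups into a `block` so they run at build time against the embedded 9.6 install; the `pg_config` path and configure flags here are assumptions, not the actual definition:

```ruby
# Hypothetical sketch only: defer pg_config evaluation until build time by
# doing the work inside `block`. The path and configure flags are assumptions.
name 'pgpool'
default_version '3.6.1'

build do
  block 'configure against the active PostgreSQL' do
    pg_config   = "#{install_dir}/embedded/postgresql/9.6/bin/pg_config"
    include_dir = `#{pg_config} --includedir`.strip
    lib_dir     = `#{pg_config} --libdir`.strip

    # Plain Ruby shell-outs so the values above are used at build time rather
    # than being evaluated when the definition file is loaded.
    Dir.chdir(project_dir) do
      system("./configure --prefix=#{install_dir}/embedded " \
             "--with-pgsql-includedir=#{include_dir} " \
             "--with-pgsql-libdir=#{lib_dir}") or raise 'configure failed'
      system('make install') or raise 'make install failed'
    end
  end
end
```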
added 2 commits
- a88d1a78 - Bundle pgpool-II 3.6.1
- a0ccb77f - Build from a custom branch
added 2 commits
- f49ae2a4 - Build from a custom branch
- 69ce5803 - Bundle pgpool-II 3.6.1
db1.staging is currently copying the initial database set to db2.staging, db3.staging, and db4.staging. Based on the current progress (about 3% in half an hour) this may take a day or two to complete. Replicating the data is necessary to ensure load balanced queries all operate on the same data set. For the sake of simplicity I have set this up manually, based on our existing Pacemaker/corosync replication setup.
Edited by yorickpeterse-staging

mentioned in issue #1807 (closed)
@yorickpeterse I've added some initial code and specs for postgresql.conf, pg_hba.conf and recovery.conf creation. I'm spending some time today on some authentication for the postgresql replication user.
As far as automating the initial sync of the standby servers, I've got two different approaches in mind.
- Chef recipe: My thoughts would be that chef would touch a certain file once sync was complete, and use the presence of that file as an indicator to not try and resync on later runs.
- omnibus-ctl command: Run `gitlab-ctl sync-standby` to perform all the needed steps.
I'm leaning towards the omnibus-ctl command, but I'm definitely open to other viewpoints.
@ibaum I think a manual sync would be preferable for now. I don't want to run into cases where the magic file is removed by accident and suddenly Chef starts re-replicating data, potentially resulting in users being impacted. Having the command means we can always automate more on top of it.
mentioned in issue gitlab-com/infrastructure#1096 (closed)
So in this state, reconfigure drops down a pgpass file that pg_basebackup will then use to authenticate with the primary server. This assumes that `/etc/gitlab/gitlab.rb` and the `.pgpass` files are secured.

I do need to make the `.pgpass` logic more generic as it's likely we may need it elsewhere.
I'll focus tomorrow on getting a command added to gitlab-ctl to handle the initial sync of data for standby servers.
mentioned in issue gitlab-com/www-gitlab-com#1108 (closed)
added 1 commit
- d745ef2f - Initial version of sync script for standing up a standby postgresql
So, this is a very initial version of the script for syncing standby servers.
- Turns out that due to the way omnibus-ctl loads command files, the optparse settings carry over from file to file. So I'll need to get those to merge; right now the options in the script don't actually do anything.
- Right now if you set `postgresql['listen_address']` in `gitlab.rb` (so standby servers can talk to it), it looks like gitlab then tries to contact postgresql over 127.0.0.1 instead of the socket. We could make this optional behavior, or just recommend setting listen_address to 0.0.0.0 and assume people will use other measures if they don't want people talking to the db on particular interfaces. If we go the latter route we may just want a boolean attribute.
- I've got a separate MR for a different issue that breaks out some of the pg-upgrade logic into lib files. This script will benefit somewhat from some of that code.
@ibaum Worth mentioning, we may end up using repmgr (http://www.repmgr.org/) as it supposedly offers easier replication management. I haven't really looked into it yet, but it's an option.
In the current setup, how do we trigger failovers in a multi-host setup? That is, how do we ensure the trigger file is created on the host to fail over to? ^1
^1 This is why I mentioned repmgr; it claims to make this much easier.
So my understanding of the current procedure is that failover is handled by pacemaker, but failback is manual. It looks like https://gitlab.com/gitlab-cookbooks/gitlab-pgsql-ha/blob/master/templates/default/gitlab_pgsql.erb creates the trigger file. I'm not sure what the failback process is.
Pgpool seems to provide reasonable failover management. What advantages would repmgr have over that?
From a packaging perspective, repmgr isn't huge, and doesn't have any crazy external dependencies. It does need to be installed similarly to a postgresql extension, so like pgpool we would need to be concerned about which version it is built against (timing wise, it's probably safe to just say this is going to work with 9.6 only). All the components share a configuration file, which isn't too complex. Outside of that there are a few commands which need to be run only during initial setup, but shouldn't be too complex to automate. Docs do recommend a separate user account, but I'm not sure we couldn't just use `gitlab-psql`.

So if the decision was made to include repmgr, it wouldn't be crazy to include it in omnibus-gitlab.
With today's uploads, you can spin up a primary and standby server, and with 3 commands you should have streaming replication working. You should set `/etc/gitlab/gitlab.rb` to have the following on the primary server:

```ruby
postgresql['listen_address'] = '0.0.0.0'
postgresql['md5_auth_cidr_addresses'] = ['192.168.50.5/32']
postgresql['sql_replication_user'] = "gitlab_replicator"
postgresql['sql_replication_user_password'] = "gitlab_replicator_password"
postgresql['wal_level'] = "hot_standby"
postgresql['max_wal_senders'] = 5
postgresql['wal_keep_segments'] = 10
```
The standby server should have:
```ruby
gitlab_workhorse['enable'] = false
postgresql['listen_address'] = '0.0.0.0'
postgresql['ha_standby'] = true
postgresql['standby_mode'] = 'on'
postgresql['primary_host'] = '192.168.50.4'
postgresql['primary_port'] = '5432'
postgresql['md5_auth_cidr_addresses'] = node['gitlab']['postgresql']['listen_address']
postgresql['sql_replication_user'] = "gitlab_replicator"
postgresql['sql_replication_user_password'] = "gitlab_replicator_password"
postgresql['wal_level'] = "hot_standby"
postgresql['max_wal_senders'] = 5
postgresql['wal_keep_segments'] = 10
postgresql['hot_standby'] = "on"
redis['enable'] = false
nginx['enable'] = false
sidekiq['enable'] = false
unicorn['enable'] = false
```
Run `gitlab-ctl reconfigure` on the primary server, then `gitlab-ctl pg-initialize-standby` on the standby server.

added 138 commits
- 1d2965d5...487e5923 - 132 commits from branch master
- 8a3ed9f5 - Build from master branch of gitlab
- e83087a0 - Merge branch 'master' into pgpool
- bfd36b1d - Fixing bad merge
- 08154ed3 - Switching to 'ee' build. Adding postgresql_extension resource
- ddeee45a - Removing leading v from versions
- 78dc7bc3 - Updating cookbooks to use gitlab_postgresql_extension cookbook
I've started adding pgpool automation to the cookbook. First step is adding a custom resource (https://docs.chef.io/definitions.html definitions seem to be out of favor) to enable/disable postgresql extensions. The downside is the resource will be named `${cookbook}_${filename}`, so currently it shows up as `gitlab_postgresql_extension`, which doesn't really read nicely. I'm not sure it warrants a separate cookbook.

added 1 commit
- 430ee235 - Get postgresql_extension custom resource functional
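For illustration, a minimal sketch of what such a `postgresql_extension` custom resource could look like (property names, the psql path and the lack of guards are assumptions, not the actual implementation):

```ruby
# resources/postgresql_extension.rb - hypothetical sketch. Because the file
# lives in the gitlab cookbook, the resource is addressed as
# gitlab_postgresql_extension, as noted above.
property :extension, String, name_property: true
property :database, String, default: 'gitlabhq_production'

action :enable do
  # IF NOT EXISTS keeps repeated converges harmless for the purpose of this sketch
  execute "enable postgresql extension #{new_resource.extension}" do
    command "/opt/gitlab/bin/gitlab-psql -d #{new_resource.database} " \
            "-c 'CREATE EXTENSION IF NOT EXISTS #{new_resource.extension}'"
  end
end

action :disable do
  execute "disable postgresql extension #{new_resource.extension}" do
    command "/opt/gitlab/bin/gitlab-psql -d #{new_resource.database} " \
            "-c 'DROP EXTENSION IF EXISTS #{new_resource.extension}'"
  end
end
```

A recipe would then use it along the lines of:

```ruby
gitlab_postgresql_extension 'pgpool_recovery' do
  action :enable
end
```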
added 4 commits
- fd60752d - Adding pgpool attribute
- 7474a615 - Vendored from src/sample/pcp.conf.sample in pgpool-II 3.6.1
- 9f7d3087 - Vendored from src/sample/pgpool.conf.sample-master-slave in pgpool-II v
- 86f0c2fb - First steps for a setting up pgpool with chef
changed milestone to %9.0
Note the PostgreSQL docs for `wal_level` are clear that the values `archive` and `hot_standby` are deprecated and will be translated to `replica` for now:

> "In releases prior to 9.6, this parameter also allowed the values archive and hot_standby. These are still accepted but mapped to replica."
Viable options for `wal_level` are:

- minimal
- replica
- logical
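In `gitlab.rb` terms, that would mean preferring the non-deprecated value, e.g.:

```ruby
# /etc/gitlab/gitlab.rb - 9.6 maps the old aliases to this value anyway
postgresql['wal_level'] = 'replica'
```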
mentioned in issue #1968 (closed)
mentioned in merge request !1306 (closed)
added 58 commits
- 86f0c2fb...c351aba5 - 40 commits from branch master
- c3bb6b42 - Build from a custom branch
- eae612ec - Bundle pgpool-II 3.6.1
- d52aed9d - Adding sample recovery.conf from PostgreSQL 9.6.1 source
- c58602d5 - First pass at creating postgresql.conf and recovery.conf on HA nodes
- 3db2d606 - Updating to add .pgpass file
- 22414720 - Removing trailing ' ' from .pgpass template
- d0899d6a - Initial version of sync script for standing up a standby postgresql
- d8341d6a - Add loopback to pg_hba
- 0ab7fc01 - Build from master branch of gitlab
- a79522af - Switching to 'ee' build. Adding postgresql_extension resource
- 3ca1acbe - Removing leading v from versions
- 60b724fa - Updating cookbooks to use gitlab_postgresql_extension cookbook
- 18f95d8a - Get postgresql_extension custom resource functional
- 8d188379 - Adding pgpool attribute
- 2250f74f - Vendored from src/sample/pcp.conf.sample in pgpool-II 3.6.1
- f1e26fb9 - Vendored from src/sample/pgpool.conf.sample-master-slave in pgpool-II v
- ba1ad1eb - First steps for a setting up pgpool with chef
- 14d56535 - Remove pgpool from CE defaults
@ibaum Having done testing and having put some thought into this, we're not going to use pgpool. Instead we'll work on a solution combining pgbouncer and custom application logic to handle the load balancing. This gives us more control and makes it possible to do load balancing without having to rely on synchronous replication. See https://gitlab.com/gitlab-com/infrastructure/issues/259#note_23789767 for more information.
For the HA side of things this means we'll break up the process in the following steps:
- Get pgbouncer in omnibus (I'll take care of this as much as possible)
- Add load balancing to the application
- Use something like repmgr to manage replication; this however will be done last
I hope to be able to do this without having to also introduce haproxy and the likes, but I'll see how far I can get. If there's anything in this MR that's useful (e.g. anything replication related) we should use separate MRs for that.
Edited by yorickpeterse-staging

@ibaum I think that we might be better off helping @yorickpeterse with 1. so he can focus on other tasks, what do you think? We already have some investigative work done for haproxy so we might be able to leverage that if needed for the rest of the tasks.
@marin @yorickpeterse Absolutely, I can take a look at 1. I took a quick look at https://pgbouncer.github.io/install.html; it shouldn't be too complicated to incorporate.
From past experience, bundling haproxy won't be too bad. Building it is relatively straightforward. Its config file is a mess though, but ideally that's only something we would need to delve into on occasion if we do it correctly.
@ibaum I already did some work on that, see https://gitlab.com/gitlab-org/omnibus-gitlab/merge_requests/1092 . The config is the main show-stopper, together with the way haproxy logs.
@ibaum @marin Per https://gitlab.com/gitlab-com/infrastructure/issues/259#note_23885535 we won't need haproxy for load balancing, as load balancing will be done in the app. However, we might still need it for load balancing HTTP requests but that's a different story/problem.
Edited by yorickpeterse-staging

Let's close this as we are moving into pgbouncer testing: https://gitlab.com/gitlab-org/omnibus-gitlab/merge_requests/1345.
mentioned in issue gitlab-com/infrastructure#1684