Application communicates with pgbouncer for its database connection
pgbouncer communicates directly with master node
On master failure, repmgrd updates pgbouncer instances with new master
repmgrd/pgbouncer/{nginx,haproxy}:
Run pgbouncer on application nodes
Application communicates with pgbouncer for database connection
Pgbouncer communicates with load balancer
Load balancer communicates with master node
On master failure, repmgrd updates load balancer with new master
For both paths, the method repmgrd uses to push the new master information out to the other nodes is still undecided.
Using ssh keys to allow repmgrd to update other nodes is an option
Pros:
It should work with our existing package. No new software needs to be added.
We do not need to grant root access to repmgrd. Everything it needs can be done as the SQL user.
Cons:
We enter an unknown state if notification to a node fails.
Utilize a key value store that repmgrd will update on new master promotion. Application or load balancer nodes will periodically check for a change and update their config accordingly.
Pros:
Nodes are responsible for ensuring their own state is up to date
@twk3 Marin mentioned you might be considering Etcd (or some sort of key value store) for something you are working on. If we can compare notes on this front, then it might make it easier to justify its presence in the omnibus package.
@ibaum it was considered for the licensing work, and then dropped because it wasn't worth adding something just for licensing. (Then we dropped the licensing changes as well.)
And it will probably come up again when looking at more HA config. But I haven't looked into anything at the moment.
TL;DR: The script promotes the new master, then notifies the pgbouncer nodes by generating a new database stanza for each of them and sending a reload command to pgbouncer.
Update promote_command in repmgr.conf to use the script
The script still needs some debugging (mostly its use of psql), but it is executed by the node that was chosen as the new master. Manually executing the failed steps, slightly corrected, was successful.
Each db node needs to know about each pgbouncer node. So adding/removing pgbouncer nodes will need to propagate to the database nodes.
The new master database node needs to execute a command on each of the pgbouncer nodes when it's promoted. We'll need to account for the possibility that a pgbouncer node is unreachable during failover.
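To make the moving parts concrete, here is a rough sketch of what a promote script along those lines could look like. The actual script isn't included in this issue; the pgbouncer host list, database name, paths, and the reload-by-signal approach below are all assumptions.

#!/bin/bash
# Sketch of an ssh-based promote/notify script (hypothetical).
set -u

PGBOUNCER_NODES="pgbouncer-01 pgbouncer-02"
NEW_MASTER=$(hostname)

# Promote this standby to be the new master.
gitlab-ctl repmgr standby promote

# Regenerate the [databases] stanza on each pgbouncer node, then reload pgbouncer.
for node in $PGBOUNCER_NODES; do
  if ! ssh "$node" "cat > /var/opt/gitlab/pgbouncer/pgbouncer-database.ini" <<EOF
[databases]
gitlabhq_production = host=${NEW_MASTER} port=5432
EOF
  then
    echo "WARNING: could not update ${node}; it will need manual reconfiguration" >&2
    continue
  fi
  ssh "$node" "pkill -HUP -x pgbouncer" \
    || echo "WARNING: could not reload pgbouncer on ${node}" >&2
done

The warnings are exactly the "unknown state" problem called out above: a pgbouncer node that misses the notification keeps pointing at the old master until someone intervenes.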
I'll start investigating using service discovery. Top three candidates in this category seem to be etcd, consul, and zookeeper. I'm going to rule out zookeeper since it's written in Java.
Consul and etcd are both written in Go, and have compatible licenses. I'll put together a list of pros and cons of both to see which one we should evaluate more in depth.
This assumes etcdctl is installed on all nodes and available in the user's PATH. I did this by downloading the binary from the etcd releases page; I'm using version 3.2.2.
Etcd server
Spin up a node and run etcd. Currently one node only, with no security in place
Preseed etcd with current master: etcdctl set pg_master $CURRENT_MASTER
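For reference, the single-node test setup boils down to something like the following; the etcd host name and advertised URL are assumptions, and there is no clustering, TLS, or auth (test use only).

# Start a single-node etcd listening for client requests on 2379.
etcd --listen-client-urls http://0.0.0.0:2379 \
  --advertise-client-urls http://etcd:2379 &

# Preseed the key that tracks the current master.
etcdctl --endpoints http://etcd:2379 set pg_master "$CURRENT_MASTER"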
Pgbouncer server
Modify pgbouncer.ini
Move the [databases] section to a separate file called pgbouncer-database.ini
Add a line %include /var/opt/gitlab/pgbouncer/pgbouncer-database.ini to the end of the file
Leave running in a window: etcdctl --endpoints http://etcd:2379 exec-watch pg_master -- /var/opt/gitlab/pgbouncer/setdb
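The setdb helper itself isn't shown in this issue; a minimal sketch of what it could look like, assuming the database name, port, and reload-by-signal approach (all assumptions):

#!/bin/bash
# Hypothetical /var/opt/gitlab/pgbouncer/setdb helper. etcdctl exec-watch runs
# this whenever the pg_master key changes: re-read the key, rewrite the
# [databases] stanza, and ask pgbouncer to reload its configuration.
set -eu

MASTER=$(etcdctl --endpoints http://etcd:2379 get pg_master)

cat > /var/opt/gitlab/pgbouncer/pgbouncer-database.ini <<EOF
[databases]
gitlabhq_production = host=${MASTER} port=5432
EOF

# Signal pgbouncer to re-read its configuration.
pkill -HUP -x pgbouncer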
Database nodes
Create a new promote script:
#!/bin/bash
gitlab-ctl repmgr standby promote
etcdctl --endpoints http://etcd:2379 set pg_master $(hostname)
Update promote_command in /var/opt/gitlab/postgresql/repmgr.conf with the location of the new script (example after these steps)
gitlab-ctl restart repmgrd
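For reference, assuming the script above is saved as /var/opt/gitlab/postgresql/promote.sh (the path is an assumption), the repmgr.conf entry would look something like:

# In /var/opt/gitlab/postgresql/repmgr.conf
promote_command = '/var/opt/gitlab/postgresql/promote.sh'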
To test, I shut down postgresql on the master node, and everything worked as expected.
repmgrd on the standby nodes noticed the master was down, and waited for their timeout
After the timeout, one of the nodes was selected as the new master and promoted.
The promotion updated the etcd key
The pgbouncer server caught the changed key through its exec-watch command, updated the pgbouncer-database.ini file and sent a HUP to the pgbouncer process
The pgbouncer process reread its configuration and sent database traffic to the new master server.
Consul is installed on the consul server, and the database nodes.
Consul server
Run consul agent -server -bootstrap-expect=1 -data-dir=/tmp/consul -config-dir=/etc/consul.d -bind=172.21.0.6 -ui -client=172.21.0.6 -dns-port=53 -recursor=127.0.0.11
Note: Like the Etcd test, this completely ignores clustering, and security
Database nodes
Create a check script for postgresql. I was unable to find whether consul has a concept of passive/active services, so the script marks the service as failed on any node that isn't the master, which keeps consul from sending traffic to it.
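The check script isn't included here; a minimal sketch of what it could do, assuming psql is on the PATH and can authenticate over the local socket:

#!/bin/bash
# Hypothetical check script for the consul "postgresql" service. It exits 0
# only when the local PostgreSQL instance is the primary, so standbys are
# reported as failing and consul will not route traffic to them.
in_recovery=$(psql -t -A -c 'SELECT pg_is_in_recovery();' 2>/dev/null)

if [ "$in_recovery" = "f" ]; then
  exit 0   # primary: check passes
else
  exit 2   # standby or unreachable: consul treats this as critical
fi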
I really like consul's DNS interface. But in order to take advantage of it, GitLab would require a change to the system (resolv.conf) outside of Omnibus. So we should be really sure it's something we want to do before we offer that.
If we aren't going to do that, the approach would be similar to etcd. We'd run a consul agent on the pgbouncer node, and react to postgresql failure events from that.
So consul itself is a little more complicated to set up, but the workflow is a bit cleaner with the DNS interface. If that's something we would want to use, I would advocate for consul. Its other features may be handy for future use, but I'm not aware of anything in the pipeline that will immediately use it.
If we want to keep as much as possible within omnibus, etcd might be the way to go. It is a bit simpler than consul.
I'm definitely open to feedback, and a call is probably in order for Monday/Tuesday to discuss this.
But in order to take advantage of it, GitLab would require a change to the system (resolv.conf) outside of Omnibus.
This is a bit of a deal breaker for me; I wouldn't want a package to go in and change my resolv.conf. Currently leaning more towards etcd, but let's meet up and make a decision.
This is specific to our infrastructure, and not our package. Regardless, the point is to update a template and reload a service when a state changes. This is something that could be handled by either.
From the comment in https://gitlab.com/gitlab-org/omnibus-gitlab/issues/2571#note_35319301 the thing that sticks out to me is that all the issues are very much infrastructure specific. I am not really sure how us shipping consul would make a difference for the Infra team. There is something to be said about using the same tool though, so that could be the difference.
@stanhu we need a tie breaker here, do you have any opinion about this?
Requiring a tweak to resolv.conf: this opens up all sorts of security, network connectivity, and reliability issues for our customers
I tested a different approach yesterday that didn't require a change to the node's DNS resolver. It worked similarly to the Etcd approach: the pgbouncer node watches for a change in the consul service state for postgresql and executes a command when one occurs. This happens using the consul HTTP API.
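A minimal sketch of that approach, run on the pgbouncer node against the local agent's default HTTP endpoint; the handler path below reuses the hypothetical setdb-style helper from the etcd test.

# Run a handler whenever the health state of the "postgresql" service changes;
# the handler rewrites the pgbouncer config, much like the etcd exec-watch test.
consul watch -http-addr=127.0.0.1:8500 -type=service -service=postgresql \
  /var/opt/gitlab/pgbouncer/setdb

# The same data is also available directly from the HTTP API, e.g.:
# curl http://127.0.0.1:8500/v1/health/service/postgresql?passing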
I know this isn't going to change decisions already made but I just want to add my 2 cents.
We've been running consul-agent for a year at Gitter and we never had any issues with it. It helps us run a Nomad cluster and lets us create and configure a canary environment with a single git push.
I wouldn't think that a Hashicorp product is immature simply by looking at the version < 1.0. They set an extremely high bar to call a product stable and to date there is only one that met the requirements (see "A HashiCorp 1.0" in https://www.hashicorp.com/blog/packer-1-0/).
Incidentally, we're also relying on Terraform 0.9 for our infrastructure at GitLab.com and it helped us go from 2 days to provision a single server to a maximum of 30 minutes.
Anyone can release a 1.x version: it's just a number after all. It's how a product reaches the milestone that makes the difference.
Per a conversation with @marin and @joshlambert, I did some more digging around Etcd and Consul so we could have some more data points. We wanted to compare the upgrade process, the backup process, as well as any resource usage.
Both support rolling/zero downtime upgrades. The process is the same on both:
Stop the process
Replace the binary
Start the process
This is done one node at a time.
While neither has yet introduced a breaking change that required a different process, Consul has documented the process they expect such a change would entail.
Consul has a lower per-node requirement than Etcd, but does require more nodes. Still, per the consul docs, for a reliable "but relatively slow" consul cluster, they recommend the equivalent of an AWS t2.micro instance, or 1 CPU and 1 GB of RAM per node.
For a small cluster, etcd recommends 2 CPUs and 8 GB of RAM per node.
For both products, memory usage is based on the working set of data. That should be small for what we need to do. If users want to use the bundled software for other uses, it would be up to them to ensure the node is properly sized.
Yes, we took a look at that as well; however, pacemaker/corosync are very (very very) hard to package, making them a problematic solution for gitlab (the package).
Also, while I love corosync, it relies heavily on network stability, so it doesn't play well with our cloud-based infrastructure.
On failover, Repmgrd would update the required key in a key/value store. This change would be picked up by Pgbouncer, which would then propagate it to the rest of the Pgbouncer processes. That would then allow traffic to be sent to the new primary.
| Comparison | Etcd | Consul |
| --- | --- | --- |
| Complexity | Only a key/value store | Key/value store is one of its features; a product with more features |
| Performance | Highly performant as a key/value store | Less performant as a key/value store |
| Configuration | Requires 2 ports to be opened (client requests and peer comms) | Requires 4 ports (API, client, peer internal and peer external comms); additional features need more ports |
| Features | Only used as a key/value store | Health checks, service discovery |
| Resources | For our use case, memory requirements should be minimal | For our use case, memory requirements should be minimal |
| Size | ~9 MB | ~9 MB |
| License | Apache 2.0 | MPL 2.0 |
Pros vs Cons
For our current use case, PG HA failover key/value store: I would choose etcd. The fact that it requires 2 ports and that it is simpler to configure are the important factors. High performance does not matter for our use case, we don't expect a lot of traffic.
Future proofing: I would choose Consul. It has way more interesting features that could ultimately simplify required configuration. For example, it could unblock https://gitlab.com/gitlab-org/gitlab-ee/issues/2042 and allow us to build an EE feature which would allow the application to also read from read-only secondaries. We could also think about how to extend it further for secrets management between nodes.
Maintainability: I would choose Consul*. The production team already has this in the infrastructure, and there is or will be knowledge in the company whose expertise we could re-use. *Here comes a but: the fact that the production team is using Consul in the infrastructure does not bring value to the PG HA solution, because production is using it independently from the package and will continue to do so. However, there is something to be said for in-house expertise.
Maturity: I would choose Etcd. Not that I think Consul is not a stable piece of software, but for what it does Etcd has more kilometers behind it.
From all the data at our disposal at this point of the discussion, I think we should go with Consul. The maintainability and the additional options it opens up for the future give it more value, and for our use case it will do the same things etcd can do at the moment.
@stanhu @ibaum Let me know what you think about this.
In order to get consul up and running I need to start by automating the following:
Deploying consul cluster
I'll start by working similarly to repmgrd. After install, a user would need to run a join command to get the nodes to cluster together; they just need to know the name of one node in the cluster (see the sketch after this list). I think I prefer this to embedding node names in gitlab.rb. It should be safe to assume nodes will be added and removed over time, so forcing users to maintain the attribute list seems like the wrong approach.
Deploying consul agent
Initially it needs to deploy to PostgreSQL and Pgbouncer nodes
Add postgresql service, with status check to PostgreSQL nodes
Add service to watch and react to postgresql service updates on pgbouncer nodes
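As a sketch of the join step mentioned above, assuming the agent is already running on the new node and "consul-01.example.com" is any existing cluster member (the hostname is a placeholder):

# Join the local consul agent to the existing cluster; knowing one member is
# enough, gossip propagates the rest of the membership.
consul join consul-01.example.com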
Current known unknowns:
I don't think we need a gitlab-ctl consul subcommand. The command line application is smart enough to talk to the local agent, and get its configuration from there. So we don't need to have any GitLab specific command line options. For workflow consistency, we might consider adding the subcommand anyway, and just blindly passing the arguments on to the consul command.
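If we did add the subcommand, it could be a thin pass-through along these lines (the binary path is an assumption, and this is just a sketch of the behaviour, not of how gitlab-ctl subcommands are actually implemented):

#!/bin/bash
# Hypothetical "gitlab-ctl consul" pass-through: forward all arguments to the
# bundled consul binary, which talks to the local agent by default.
exec /opt/gitlab/embedded/bin/consul "$@"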
Initial release is out, and should be considered beta. I've added several issues as related issues to this ticket. These are all bug fixes or enhancements.