Geo: investigate alternative to geo_{primary|secondary}_role in gitlab.yml

Currently we use Gitlab.config.geo_primary_role['enabled'] and Gitlab.config.geo_secondary_role['enabled'] to work around problems we found previously in the following issues:

Related MRs and Issues:

But this new configuration prevents anyone from setting up HA, because we currently rely on the roles configured in Omnibus to activate the roles in gitlab.yml. This was not the intended behavior for Omnibus roles: they are supposed to be a shortcut that triggers existing configuration, as a way to simplify common setups.

So right now the decision to use the Omnibus roles is blocking https://gitlab.com/gitlab-org/gitlab-ee/issues/2825, and we have two alternatives:

Either we implement yet another configuration flag in Omnibus, use it to write the roles to gitlab.yml, and then add this new flag to the Omnibus roles; or we try to remove the roles from gitlab.yml and simplify the code again.

Proposal

The problematic parts of the code are the initializers and the ActiveRecord connection. For everything else we can rely on the database and query it to see what is configured.

On the initializer side, we have the Sidekiq configuration enabling or disabling cron jobs.

On the ActiveRecord side, we need to establish a connection to the DR database when on a secondary node.

What we can do here is use the existence of the database_geo.yml file to determine whether to override the connection for the DR models, so we don't need geo_secondary_role for that.
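A minimal sketch of that file-based check, assuming the file lives at config/database_geo.yml under the application root. The module and method names are hypothetical, not existing GitLab code:

```ruby
require 'pathname'

# Hypothetical helper: illustrates deciding on the DR connection override
# from the presence of a file instead of a geo_secondary_role setting.
module GeoDatabaseFile
  # Assumed location of the file, relative to the application root.
  RELATIVE_PATH = 'config/database_geo.yml'

  # Returns true when database_geo.yml exists as a regular file under +root+.
  def self.present?(root)
    Pathname.new(root).join(RELATIVE_PATH).file?
  end
end
```

The base class for the DR models could then guard its `establish_connection` call with something like `GeoDatabaseFile.present?(Rails.root)` rather than reading the role from gitlab.yml.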

We can use the absence of the database_geo.yml file to decide whether to try update_clone_url for the primary node. So on a machine that doesn't have this file, we can always try to update (it will not do anything if there are no configured nodes). We already use a rescue there to fail gracefully in rake tasks when the database is unavailable, etc., so we are covered.
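Roughly, the update path could look like the sketch below. The method name, the `updater` callable, and the return symbols are all made up for illustration; `updater` stands in for whatever actually performs update_clone_url:

```ruby
# Hypothetical sketch: attempt the clone URL update only on machines
# that do not have database_geo.yml, and never let a failure propagate.
def try_update_clone_url(geo_db_config_path, updater)
  # The file's presence marks a secondary node, which must not update.
  return :skipped if File.exist?(geo_db_config_path)

  updater.call # expected to do nothing when no nodes are configured
  :updated
rescue StandardError => e
  # Fail gracefully, e.g. when the database is unavailable in a rake task.
  warn "update_clone_url not applied: #{e.message}"
  :failed
end
```

Returning a symbol instead of raising keeps the initializer safe to run in any context, which matches the existing rescue behavior described above.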

The last and least understood part is the Sidekiq cron job configuration. Our current code requires a restart when you change from primary to secondary, or when a newly configured node is set up as a secondary after replication has started and the server is already running.

We can fix this by introducing a check in every non-Geo entry that bypasses the execution AND reconfigures the cron jobs. The same goes for the Geo cron jobs: in a non-Geo environment, the job is skipped and the cron is reconfigured.
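To make the idea concrete, here is a rough sketch of such a guard. Everything in it is hypothetical: the module name, the `kind` and `role` values, and the `reconfigure` callable (a stand-in for whatever re-registers the cron schedule). It also assumes the role has already been looked up from the database, and that non-Geo jobs should be bypassed only on a secondary:

```ruby
# Hypothetical guard that each cron worker would call before doing its work.
module CronJobGuard
  # kind: :geo or :non_geo, the category the cron job belongs to.
  # role: :primary, :secondary, or :none (Geo not configured).
  def self.bypass?(kind, role)
    case kind
    when :non_geo then role == :secondary # assumption: regular jobs stay off secondaries
    when :geo     then role == :none      # Geo jobs are skipped outside a Geo environment
    else raise ArgumentError, "unknown job kind: #{kind.inspect}"
    end
  end

  # Runs the block unless the job must be bypassed. On bypass it calls
  # +reconfigure+ so the schedule catches up with the current role.
  # Returns true when the job body ran, false when it was bypassed.
  def self.run(kind:, role:, reconfigure:)
    if bypass?(kind, role)
      reconfigure.call
      return false
    end

    yield
    true
  end
end
```

With this shape, a node that changes role corrects its own schedule the first time a stale job fires, so no restart is needed.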

What do you all think?

cc @stanhu @dbalexandre @rspeicher

Edited Jul 04, 2017 by Gabriel Mazetto
