Commit 6c038afe authored by Marin Jankovski

Merge branch 'build_handbook_update' into 'master'

Build handbook: add infrastructure maintenance to team responsibilities.

Closes gitlab-org/omnibus-gitlab#2532

See merge request !6784
parents d8c319a4 0c0c4705
@@ -22,6 +22,7 @@ This means:
- Keep the installation and download pages up to date and attractive
- Address community questions in the [omnibus-gitlab issue tracker](https://gitlab.com/gitlab-org/omnibus-gitlab/issues/)
and mentions in GitLab CE/EE repositories on issues with `Build` label
- Maintain infrastructure used by the team
 
### Projects
 
@@ -73,12 +74,6 @@ your feature/change might not be shipped within the release.
If you have any doubt whether your change will have an impact on the Build team,
don't hesitate to ping us in your issue and we'll gladly help.
 
TODO's
1. Installation page visuals https://gitlab.com/gitlab-com/www-gitlab-com/issues/1074
1. Lower the barrier of contributing to the Build team task
* First goal, allow simpler creation of packages: https://gitlab.com/gitlab-org/omnibus-gitlab/issues/2234
## Internal team training
 
Every Build team member is responsible for creating a training session for the
@@ -145,10 +140,11 @@ All work carried out by the Build team is public. Some exceptions apply:
it is expected for this work to become public.
* Work is done with a third party - Only when a third party requests that the work is not public.
* Work has financial implications - Unless financial details can be omitted from the work.
* Work has legal implications - Unless legal details can be omitted from the work.
 
If you are unsure whether something needs to remain private, check with the team lead.
 
### Working on dev.gitlab.org
## Working on dev.gitlab.org
 
Some of the team work is carried out on our development server at `dev.gitlab.org`.
[Infrastructure overview document](https://docs.gitlab.com/omnibus/release/README.html#infrastructure) lists the reasons.
@@ -159,3 +155,111 @@ Unless your work is related to the security, all other work is carried out in pr
 
The process documenting the steps necessary to update the GitLab images available on
the various cloud providers is detailed on our [Cloud Image Process page](https://about.gitlab.com/cloud-images/).
## Infrastructure
As part of its tasks, the team is responsible for the following nodes:
* dev.gitlab.org - GitLab CE running on this server needs to be operational
* omnibus-builder-runners-manager.gitlab.org - GitLab CI runner manager used for spawning build machines
* packages.gitlab.com - To be defined. At this point this is a joint effort between Production and Build, as this
server is an important part of the infrastructure.
### dev.gitlab.org
Every day at 1:30 UTC, a nightly build gets triggered on dev.gitlab.org. The cron trigger times are currently defined
at [the scheduled pipeline page on dev.gitlab.org](https://dev.gitlab.org/gitlab/omnibus-gitlab/pipeline_schedules).
Every day at 7:20 UTC, the nightly CE package gets automatically deployed on dev.gitlab.org.
The cron task is currently defined in [dev.gitlab.org role](https://dev.gitlab.org/cookbooks/chef-repo/blob/fa6131d9d06299940a72c51cf60ea62c54fe3461/roles/dev-gitlab-org.json#L147-155).
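
As a rough illustration of the schedule above, the following sketch computes how much time a nightly build has to finish before the deploy job runs. The helper name `minutes_between` is hypothetical, and it is an assumption (not confirmed by the linked configuration) that the 7:20 deploy picks up the package produced by the 1:30 build of the same day.

```shell
# Hypothetical helper: minutes from one HH:MM time to a later HH:MM time
# on the same UTC day. awk converts "01" to the number 1, so leading
# zeros in the input are safe.
minutes_between() {
  printf '%s %s\n' "$1" "$2" |
    awk -F'[: ]' '{ print ($3 * 60 + $4) - ($1 * 60 + $2) }'
}

# Nightly build trigger (1:30 UTC) to nightly deploy (7:20 UTC).
minutes_between 1:30 7:20   # prints 350
```

Under that assumption, a nightly build that takes longer than 5 hours 50 minutes would miss the deploy of the same day.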
#### Maintenance tasks
Requirements:
* Access to the node
* Access to the [Chef repo](https://dev.gitlab.org/cookbooks/chef-repo), if the task requires permanent changes to `/etc/gitlab/gitlab.rb`. If you do not have access to this repository,
create [an issue in the Infrastructure issue tracker](https://gitlab.com/gitlab-com/infrastructure/issues/new?issue%5Bassignee_id%5D=&issue%5Bmilestone_id%5D=)
and label it `access request`.
The team's responsibility is to make sure that the GitLab instance on this server is operational.
The omnibus-gitlab package on this server is a stock package with the configuration required to keep it operational.
Regular omnibus-gitlab commands can be used on this node.
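
A minimal sketch of what checking the instance with regular omnibus-gitlab commands could look like. `gitlab-ctl status` and `gitlab-rake gitlab:check` are standard omnibus-gitlab commands; the `run` and `check_instance` wrappers and the `DRY_RUN` variable are hypothetical names added here so the commands are only printed unless `DRY_RUN=0` is set.

```shell
# Hypothetical wrapper: print the command instead of executing it,
# unless DRY_RUN=0 is set in the environment.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: basic checks that the GitLab instance is operational.
check_instance() {
  run sudo gitlab-ctl status          # state of every supervised service
  run sudo gitlab-rake gitlab:check   # application-level self check
}
```

Calling `check_instance` prints the two commands; `DRY_RUN=0 check_instance` on the node would actually run them.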
If, for some reason, you need to apply a change to `/etc/gitlab/gitlab.rb`, this change
needs to be introduced in the [dev-gitlab-org role](https://dev.gitlab.org/cookbooks/chef-repo/blob/fa6131d9d06299940a72c51cf60ea62c54fe3461/roles/dev-gitlab-org.json).
If you do not have access to this repository, but you need to do a hot-patch or configuration
testing, the following steps can be performed:
* Stop chef-client on this node: `sudo service chef-client stop`.
* Make the necessary change to get the instance running again. If that requires a change to the `gitlab.rb` file,
edit it manually and run `sudo gitlab-ctl reconfigure`.
* Reach out to the Production team for help with getting your `gitlab.rb` configuration
change committed to the Chef server.
* After this has been applied, start the chef-client on the node: `sudo service chef-client start`.
* Make sure that any change you made is noted in an issue! It is your responsibility to revert the
change on this node once the fix is in place in the package!
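
The hot-patch steps above can be sketched as a small set of shell helpers. This is an illustration under stated assumptions, not a script that exists on the node: the function names, the `DRY_RUN` variable and the backup file path are hypothetical, and by default the commands are only printed.

```shell
# Hypothetical wrapper: print the command instead of executing it,
# unless DRY_RUN=0 is set in the environment.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Step 1: stop chef-client so it does not overwrite the manual change.
# The backup copy is an extra precaution and its path is an assumption.
hot_patch_begin() {
  run sudo service chef-client stop
  run sudo cp -p /etc/gitlab/gitlab.rb /etc/gitlab/gitlab.rb.pre-hotpatch
}

# Step 2: after editing /etc/gitlab/gitlab.rb by hand, apply the change.
hot_patch_apply() {
  run sudo gitlab-ctl reconfigure
}

# Step 3: once the change is committed to the Chef server, resume chef-client.
hot_patch_end() {
  run sudo service chef-client start
}
```

Keeping the three phases separate mirrors the process: the time between `hot_patch_apply` and `hot_patch_end` depends on when the Production team commits the configuration change.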
#### Improvements
* [Deploy notifications](https://gitlab.com/gitlab-org/omnibus-gitlab/issues/2543)
### omnibus-builder-runners-manager.gitlab.org
The GitLab CI runner manager is responsible for creating build machines for package builds.
This node's configuration is managed by [cookbook-gitlab-runner](https://gitlab.com/gitlab-cookbooks/cookbook-wrapper-gitlab-runner).
Configuration values are stored in [the corresponding vault](https://dev.gitlab.org/cookbooks/chef-repo/blob/bb367e272662da9e9efdfd9adec13769e44a9bc3/roles/omnibus-builder-runners-manager.json#L18).
Currently, the version of GitLab CI runner is locked. We aim to stay close to the current version of the runner in order
to get the fixes that we need without running into issues that could cause a failure. Such failures could prevent
the release from going out, so avoid unnecessary changes on this node.
The runner manager is configured to use two docker machine drivers, "digitalocean" and "scaleway".
The former boots up machines in a DigitalOcean account for package and Docker image builds.
The latter boots up machines in a Scaleway account for Raspberry Pi (ARM platform) builds.
#### Maintenance tasks
Requirements:
* Access to the node
* Access to a Chef Vault admin. At minimum, contact the Build Lead for help.
When the version of GitLab CI runner needs to be changed:
* To be performed by a Chef Vault admin:
  * In a local clone of the Chef repo, run `bundle exec rake 'edit_role_secrets[omnibus-builder-runners-manager]'`. This command
  fetches the secrets from the Chef vault and opens your text editor.
  * Change the version of the package listed in the `gitlab-ci-multi-runner` section.
  * After saving the change, there will be a lot of output, including the deletion of some existing content in the Chef vault. This is expected behaviour.
  * Commit.
* To be performed by any team member:
  * Log in to the node and run `sudo /root/runner_upgrade.sh` to perform the upgrade. This stops the chef-client service, stops the runner and cleans up the machines,
  runs the chef-client to fetch the new version and, finally, starts GitLab runner again.
When you notice that the builds are pending on our dev.gitlab.org project, it is possible that
the number of failed machines is high. Failed machines prevent the runner manager from
starting up new machines and this can slow down or even block the release.
To resolve this:
* Log in to the omnibus-builder-runners-manager.gitlab.org node.
* Enter a root session: `sudo su`. This is required because the `docker-machine` command only lists machines
that belong to the currently active user.
* Run `docker-machine ls`. This prints the list of machines, each of which is in the `Running` or `Error` state or has an empty state.
* To list only machines in the `Error` state, you can use `/root/machines_operations.sh list-failing`.
* To safely clean up the machines in the `Error` state, run `/root/machines_operations.sh remove-failing`.
* If a machine has an empty state, you can remove it manually or use `docker-machine ls | grep -v 'Running' | awk '{print $1}' | xargs docker-machine rm --force`.
This line removes all machines that do not have the `Running` state.
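
As an alternative to filtering the full `docker-machine ls` table with `grep`, the state column can be matched explicitly. The sketch below is not part of `/root/machines_operations.sh`; the function names are hypothetical. It assumes `docker-machine ls --format '{{.Name}} {{.State}}'` prints one name and state pair per line without a header row, and that a machine with an empty state yields an empty second field. With the default table output, the `grep -v 'Running'` one-liner may also pass the header row on to `docker-machine rm`.

```shell
# Names of machines whose state is anything other than "Running".
# Reads "NAME STATE" pairs on stdin; STATE may be empty.
non_running_machines() {
  awk 'NF > 0 && $2 != "Running" { print $1 }'
}

# Number of machines in the "Error" state, read from the same input format.
count_failed_machines() {
  awk '$2 == "Error" { n++ } END { print n + 0 }'
}

# Intended use on the runner manager, as root (not executed here).
# "xargs -r" is a GNU extension that skips the command on empty input.
#   docker-machine ls --format '{{.Name}} {{.State}}' | count_failed_machines
#   docker-machine ls --format '{{.Name}} {{.State}}' | non_running_machines |
#     xargs -r docker-machine rm --force
```

Matching on the state field rather than on the whole line avoids removing or keeping a machine just because the word `Running` appears elsewhere in its row, for example in its name.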
### packages.gitlab.com
At the moment, the Build team is only a user of packages.gitlab.com. Release packages
are served to our users and customers from our CI on `dev.gitlab.org`.
The duties for this server are yet to be defined with the Production team.
Given that the package server is currently deployed on our own infrastructure
from an omnibus-type package, the team should make a best effort to help
Production through any issues if they require help.