Improve infrastructure documentation
We are currently missing a lot of documentation, which slows down infrastructure work in general. We need to fix this by adding the right documentation to the chef repo, so that things are trivial to maintain at first and can then be automated.
Missing documentation so far (that I can think of):
- Setup documentation for checkmk - including how to add plugins with a sample.
- Documentation for Elasticsearch
  - Setup
  - Enabling in production
  - Indexing
  - Troubleshooting
- Setup documentation for Logstash, Kibana and the ELK stack, including general information on how the logs are processed.
- General infrastructure sample with installation locations and where to get the logs when Logstash is not working.
- Set up a general-purpose VM in Azure
- Bootstrap a machine with Chef (wherever it is running)
- Security setup (in general) including OAuth setup for all the services.
- Email documentation covering how Gmail actions work and how they are set up.
- How to troubleshoot Redis and PostgreSQL cluster problems
  - Where to get Redis replication auth information
- RKHunter
- Document Grafana - https://gitlab.com/gitlab-com/operations/issues/167 @yorickpeterse
- Update HA documentation https://gitlab.com/gitlab-com/operations/issues/123
- Document how to create a custom package https://gitlab.com/gitlab-com/operations/issues/104
- Backup and backup restore documentation
Please add anything else you consider tribal knowledge so we can fix it ASAP.
/cc @balameb