Expired on Aug 22, 2016
- General infrastructure
  - Fix the on-call schedule (@pcarranza).
  - Mount point with CephFS in production.
  - Some projects stored in CephFS in production.
  - Initial Prometheus setup, with at least blackbox testing covered in Chef.
  - Get backups under control: at least a path forward and a full understanding.
  - Integration testing.
- Performance
  - MR diff latency under 7 s.
  - MR commits latency under 5 s.
  - Improve performance of the post-receive task.
  - Stable SSH access timing (on the new NFS mount point).
- We don't have a clean metric for the post-receive task, so this goal is incomplete.
- We don't have a sane metric for SSH access timing. I'm fine with not setting a target range, but instead of swinging between 2 and 26 seconds, it should stay consistently stable somewhere in the middle. We added blackbox monitoring to get some insight into it.
- Integration testing is poorly defined and not actionable; we need to define this goal better.
Issues
- Degraded Performance on 2016-08-12
- Postmortem for high DB load on August 6
- Expand LVM drive in file-storage1
- Create gitlab-monitor ruby library to gather all our monitoring scripts
- CephFS - build automation to add capacity to the cluster
- Set up continuous queries for the events collection in InfluxDB
- Move all gitlab-com and gitlab-org repositories to Ceph
- Make Ceph the default shard for new repositories
- Investigate upgrading PostgreSQL using pg-logical
- Disable db migration on non-blessed workers
- Enable autoscaled runners for omnibus-gitlab
- Plan for upgrading to Ruby 2.3 on
- Investigate the time it takes to perform a regular PostgreSQL upgrade in staging
- Address Space Issues on LB Servers
- Where to put new GitHost servers?
- Migrate registry to S3
- During outage, establish an incident leader to avoid multiple people executing the same operations
- Missing high_availability['mountpoint'] entries in `/etc/gitlab/gitlab.rb`
- Slow SSH access
- Do (semi-)automatic profiling after each release
- Update our backup and restore documentation and test it