Updated Gitaly OKRs
  * Edge: Make GitLab QA test backup/restore, LDAP, Container Registry, and Mattermost
  * CI/CD: Make runners work reliably and in a cost-effective way
  * VP Scaling: [Lower latency](https://gitlab.com/gitlab-com/infrastructure/issues/947). [99% of user requests < 1 second](https://performance.gitlab.net/dashboard/db/transaction-overview?panelId=2&fullscreen&orgId=1)
- * Gitaly: Gitaly controllers active on file-servers
- * Gitaly: Roll out Gitaly features. 25 [controllers (total) in acceptance testing](https://gitlab.com/gitlab-org/gitaly/blob/master/README.md#current-features)
- * Gitaly: Reduce “idea to production” time of migrations. 80% of all migrations started in Q3 reach "Acceptance Testing" in less than 45 days.
+ * Gitaly: Gitaly service active on file-servers
+ * Gitaly: Roll out Gitaly migrations. 24 additional endpoints migrated to Gitaly and [in acceptance testing](https://gitlab.com/gitlab-org/gitaly/blob/master/README.md#current-features)
features ➞ migrations. We will work on other features (for example logging) but our goal is to deliver a large number of migrations. Also, the Gitaly team's focus is moving from controllers to endpoints.
Why?
- Controllers can share endpoints, but the effort is spent building endpoints, so effort maps one-to-one to endpoints, not to controllers.
- Our feature toggles, dashboards and alerting are all built around endpoints, not controllers.
So far we have built 15 endpoints, so building another 25 is ambitious. However:
- Our process is now working well
- Our tech stack is maturing, meaning we can focus on migrations instead of setup features (e.g. logging)
- Our understanding of new tech, like gRPC, has improved
Also, 25 endpoints is only slightly more than one endpoint per iteration per developer (although the actual effort will be parallelised), so I think we can do it.
Oops! Yes. I've fixed this. I used a back-of-the-envelope calculation of 4 developers x 6 iterations x 1 endpoint per developer per iteration = 24 endpoints (that's not how development works, but good enough).
Updated to 25 again.
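For anyone who wants to check the arithmetic, the back-of-the-envelope estimate can be written out as a small sketch. The function names here are made up for illustration; the only inputs taken from this thread are 4 developers, 6 iterations, 1 endpoint per developer per iteration, and the target of 25. It deliberately ignores parallelisation and rework, as the comment above already admits.

```python
def estimated_endpoints(developers: int, iterations: int, rate: float = 1.0) -> float:
    """Rough capacity: developers x iterations x endpoints per developer per iteration."""
    return developers * iterations * rate


def required_rate(target: int, developers: int, iterations: int) -> float:
    """Endpoints each developer would need to deliver per iteration to hit the target."""
    return target / (developers * iterations)


# Figures quoted in the discussion (hypothetical helper names, real numbers).
capacity = estimated_endpoints(developers=4, iterations=6)            # 4 x 6 x 1
rate_for_25 = required_rate(target=25, developers=4, iterations=6)    # 25 / 24

print(capacity, round(rate_for_25, 2))  # prints: 24.0 1.04
```

So a target of 25 works out to roughly 1.04 endpoints per developer per iteration, which is the "just slightly more than one" mentioned earlier.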
  * Edge: Make GitLab QA test backup/restore, LDAP, Container Registry, and Mattermost
  * CI/CD: Make runners work reliably and in a cost-effective way
  * VP Scaling: [Lower latency](https://gitlab.com/gitlab-com/infrastructure/issues/947). [99% of user requests < 1 second](https://performance.gitlab.net/dashboard/db/transaction-overview?panelId=2&fullscreen&orgId=1)
- * Gitaly: Gitaly controllers active on file-servers
- * Gitaly: Roll out Gitaly features. 25 [controllers (total) in acceptance testing](https://gitlab.com/gitlab-org/gitaly/blob/master/README.md#current-features)
- * Gitaly: Reduce “idea to production” time of migrations. 80% of all migrations started in Q3 reach "Acceptance Testing" in less than 45 days.
+ * Gitaly: Gitaly service active on file-servers
+ * Gitaly: Roll out Gitaly migrations. 24 additional endpoints migrated to Gitaly and [in acceptance testing](https://gitlab.com/gitlab-org/gitaly/blob/master/README.md#current-features)
+ * Gitaly: Reduce “idea to production” time of migrations. 80% of all migrations started in Q3 are enabled on GitLab.com within two GitLab releases.
Why? Unfortunately, I predict that a large number of migrations will not be completed in their first GitLab release.
What we're seeing is that there is a huge variety of edge-cases in production (for example, odd repos, corrupt repos, unusual requests). The first time we deliver a migration into production, we discover these edge-cases and need to perform rework in order to handle them.
However, for many migrations this means that we will need to wait for a second GitLab release before we can deliver our fixes, hence updating the goal to two releases.
(Obviously, while our throughput is somewhat limited by monthly release cycles, we can make up for this by increasing our bandwidth, maximising the number of migrations we're performing in parallel.)
Aside: Once we have the ability to do true canary deploys, we'll be able to speed this up a great deal and change the goal to a number of days.
assigned to @ernstvn
mentioned in commit 90c0494a