Proposal for performance priority labels
Current description
Helps clarify the priority of issues from the perspective of their impact on the availability and performance of GitLab.com.
Closes https://gitlab.com/gitlab-com/infrastructure/issues/1943
Original description
I do not think this is a priority right now, but it stems from https://gitlab.com/gitlab-com/www-gitlab-com/merge_requests/5373/diffs which I broke into smaller merge requests.
Activity
mentioned in merge request !5373 (closed)
mentioned in merge request !6129 (merged)
I don't really like this approach. I can imagine people assigning U1 to their issues and ending up with many high-urgency issues, which then makes them all low urgency.
On top of that, this is a connascence of meaning, which results in an indirection: to understand what the labels mean I have to go read something else, instead of using a word that defines the meaning in itself. As Jeff Atwood said in "sometimes the best icon is a word", and the same goes for single-letter labels.
Also, we already have a way of doing this with ~availability and ~critical: if something is likely to cause an outage, then it affects availability. If something is really important, then it is critical.
Regarding toil and noise, sometimes it is not about how much time we save with an automation, but how much safer we are by doing it. So some things can't really be defined that far down.
There is a part of craft in what we do that requires understanding the impact of things. I would rather have labels that tell me a story without my having to go somewhere else to decipher them than have a lot of typifications that I can't really do anything with.
assigned to @ernstvn
As long as we do !6129 (merged), I agree that this approach does not help.
I do take issue with the statement
"If something is really important, then it is critical."
because these are both highly subjective terms. The approach here attempts to reduce the subjectivity. Indeed, I would be happier if we delete the critical label altogether and replace it only with descriptive labels, and then use the prioritization and count of priority labels as a measure of criticality. But for now, we can close this MR.
mentioned in issue infrastructure#1943 (closed)
added 1425 commits
- 1d174cb4...b047fd1d - 1422 commits from branch master
- 09b7d252 - Merge branch 'master' into evn-prio-labels
- 166199d6 - Add priority labels for performance
- 7753c83c - Slight edits after preview
assigned to @ernstvn
@pcarranza @mydigitalself please review this updated merge request.
/cc @stanhu
assigned to @pcarranza
@ernstvn I lack the previous context in the conflicts.
Looks good to me, can you resolve conflicts and just merge in?
assigned to @ernstvn
I was wanting to refer to this today in a real-life instance... :) https://gitlab.com/gitlab-org/gitlab-ce/issues/33872#note_32991109
I realised what we're still missing here is a sense of timescale on the issue.
AP1 = drop everything and do it now?
AP2 = schedule for next release
AP3 = schedule in next 3-6 months
Would that make sense or is AP3 not aggressive enough?
I think it's fair, but on that scale, I would rarely use AP3 ever, I would use AP1 for little things, and most things would be in AP2.
Would that make sense for you? @mydigitalself
"I realised what we're still missing here is a sense of timescale on the issue."
Ah no.... that's where your judgment comes in ;-) after all, that goes back to the business decision of how important it is to prevent the outage, or how important it is to improve the performance.
added 83 commits
- 7753c83c...3f96dfb1 - 82 commits from branch master
- 2a14b518 - Fix merge conflict
enabled an automatic merge when the pipeline for 2a14b518 succeeds
- Resolved by Ernst van Nierop
mentioned in commit 5ef422a9
re the following "thread" @ernstvn @pcarranza
"AP1 = drop everything and do it now?"
"AP2 = schedule for next release"
"AP3 = schedule in next 3-6 months"
"Ah no.... that's where your judgment comes in ;-) after all, that goes back to the business decision of how important it is to prevent the outage, or how important it is to improve the performance."
"I think it's fair, but on that scale, I would rarely use AP3 ever, I would use AP1 for little things, and most things would be in AP2."
I'm more than happy to use my judgement, but the levels should give me some good guidance and, to be fair, there's a lot of guidance in the U portion regarding the time horizon on when an outage may occur.
Let's see how it goes over the next couple of releases. I'm somewhat concerned about everything getting AP1 and AP2; for instance, there are a lot of performance improvements that may deliver a > 100 ms improvement to a particular call (i.e. I1) that is seldom used, say an .atom resource or a less frequently used UI resource that gets only a few hundred requests a day but that still makes it U2. By our matrix, that would get an AP1, and I really don't think that's a FIX IT NOW situation; for me it is probably a fix in 3-6 months type of horizon.
The best thing to do is to stress test tags against this and see how the guidance is helping that decision making.
@mydigitalself I don't think that I would label something that would save us 100ms as AP1.
To give you some idea:
- AP1 -> circuit breaker for NFS mounts - please give me this ASAP because when one NFS server goes down it all crashes and creates a 20m outage.
- AP2 -> general object storage, I may tag artifacts in particular as AP1 simply because they are a problem on fire right now, but the whole thing can take a bit longer and I'm reasonable on that.
- AP3 -> improving an endpoint that is breaking the 1s latency SLA - I would like to see this done, but I can live with a slower page for now.
At least this is how I would be labeling things, which I think responds to common sense, granted that it usually is not that common though.
Does this reasoning make sense to you?
@pcarranza that's perfectly reasonable and pretty much fits how I think about the prioritisation. I was just pointing out that, as per those definitions, we could overemphasize some things that should really be AP3.
@mydigitalself Oh, no, the last thing I want is using all the AP1 and AP2 bullets way too soon.
mentioned in merge request !6580 (merged)