The main reason is that it has an incident tracking system, which could come in really handy: instead of sending on-call handoff emails, we could have a single source of truth for deploys, outages, incidents in general, and even plain infrastructure changes (there are private incidents).
I would love to have just one place to go to see what the status of the whole infrastructure is and what was going on while I was away.
We could really easily push private incidents from our deployment rake tasks.
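As a rough sketch of what such a push might look like, the snippet below builds (but does not send) a request that creates a non-public incident. It assumes Cachet's v1 REST API (`POST /api/v1/incidents` with an `X-Cachet-Token` header, `visible: 0` for private, `status: 4` for "Fixed"); the function name, the example host, and the token are hypothetical, and Python is used only for illustration since the real hook would live in the rake tasks.

```python
import json
import urllib.request


def build_private_incident_request(base_url, api_token, name, message):
    """Build a POST request that would record a private incident.

    Assumption: the endpoint, header name and field values follow
    Cachet's v1 API; check them against the deployed version.
    """
    payload = {
        "name": name,
        "message": message,
        "status": 4,   # assumed: 4 means "Fixed", i.e. the deploy finished
        "visible": 0,  # assumed: 0 keeps the incident private
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/incidents",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-Cachet-Token": api_token,
        },
        method="POST",
    )


# Hypothetical usage at the end of a deploy task; nothing is sent until
# the request is passed to urllib.request.urlopen().
request = build_private_incident_request(
    "https://status.example.com", "not-a-real-token",
    "Deploy finished", "Deployment completed without errors.",
)
```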
I think this is a really good tool. I like the way it looks in the demo, and I would really like to stop doing a handoff email if we can replace it with a viable alternative!
I like the idea of transparency with our user base, plus the fact that if
you're keeping up with it, it creates a timeline for your postmortem for
you (that again is transparent!)
I've set up a Cachet instance at https://cachet.gitlab.com using a self-signed certificate on a 2GB Digital Ocean droplet in the prod account. Obviously we can change the URL and such when we are ready to go live with it. I've sent "team invites" to @pcarranza, @stanhu, @northrup, @yorickpeterse, and @jnijhof. I have no idea what an invite looks like or what kind of permissions it will give you when you log in.
I'm not convinced of this. I find the interface clunky, and the history of
changes to an issue is not what I would want to see... as in there IS no
history.
I don't know that it's not an option, I know when I look at what they
provide and what they charge, I get indignant and think we can do it
ourselves, but that's just me.
What I'm missing on Staytus is the ability to just have a stream of events to track what has happened. That said, I do see value in having something in a technology that every developer in the company can modify.
That said, we would need to resource its development. @DouweM @stanhu could we have a developer assigned to add a feature to Staytus so we can use that one instead of cachethq?
@northrup @jnijhof could you define what would be needed to finish Staytus and have this stream of events in such a way that we can put it there and just forget about it? Remember the single source of truth idea I'm chasing here.
BTW, does it allow us to use OAuth so we don't need to create any account?
The feature set that I am looking for is a history of events and actions taken during an "incident". I would like to see information about what the incident was, what was done to troubleshoot it, what was done to resolve it, and what we're doing to monitor it. This history should persist in a reviewable state so that after an incident is resolved anyone can go back and see the full running history from start to finish for a 'post mortem' on the issue.
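To make that feature set concrete, here is a minimal sketch of an append-only incident history. It is not taken from Staytus or Cachet; every name in it (`Phase`, `Update`, `Incident`) is hypothetical and only illustrates the data I have in mind: what the incident was, what was done to troubleshoot and resolve it, and what we're monitoring afterwards.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class Phase(Enum):
    # Hypothetical phases, mirroring the four things listed above.
    REPORTED = "reported"
    TROUBLESHOOTING = "troubleshooting"
    RESOLVED = "resolved"
    MONITORING = "monitoring"


@dataclass(frozen=True)
class Update:
    """One immutable entry in the history of an incident."""
    at: datetime
    author: str
    phase: Phase
    note: str


@dataclass
class Incident:
    title: str
    updates: list = field(default_factory=list)

    def record(self, author, phase, note, at=None):
        # Entries are only ever appended, so the history stays reviewable.
        update = Update(at or datetime.now(timezone.utc), author, phase, note)
        self.updates.append(update)
        return update

    def timeline(self):
        # Full running history from start to finish, oldest first.
        return sorted(self.updates, key=lambda u: u.at)

    def render(self):
        lines = [self.title]
        for u in self.timeline():
            lines.append(
                f"{u.at:%Y-%m-%d %H:%M} [{u.phase.value}] {u.author}: {u.note}"
            )
        return "\n".join(lines)
```

Because entries are frozen and never edited in place, `render()` on a resolved incident gives the post-mortem timeline for free.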
@stanhu @sytses if Staytus is not maintained and looks usable, there's the option of forking it and integrating it with GitLab Issues and shipping it along. That would be the easiest way for us to ship such a solution.
Using it ourselves will also work as due diligence on the code and push us to modify it.
@connorshea Adam (the Staytus creator) just responded to a bunch of PRs and open tickets. I don't think he's abandoned the project; I think he's just not focused on it. @ahanselka it's up and running over at https://159.203.19.250; the next step is to wrap it with Chef.
John Northrup changed title: Consider replacing status.gitlab.com with {-https://cachethq.io/-} → Consider replacing status.gitlab.com with a different tool
There are actually people looking at the graphs, which are now gone. Could we first experiment and improve it in a testing environment, something like status-test.gitlab.com?
My personal opinion and observation is we have a directive to make graphing
and performance data public, which we are doing through Grafana and
Prometheus. We also have a directive to be more transparent and provide
more concise status and updates about our services when we have issues and
problems. I don't see the two as being married. Let the graphing platform
do what it does and if people want stats, that's where they should look.
Now, we should have a link from the status page to the performance page,
but status should be up/down/degraded/maintenance information and concise
details on the who/what/when/where/why of the situation.
Shouldn't we fork https://github.com/adamcooke/staytus and create a project under gitlab-org? So issues can be created and linked back to the master project.
We already have a few requests:
@stanhu: Ability to provide hyperlinks in the incident description to GitLab issues
@jnijhof: Embedded graphs, maybe a plugin from Grafana/Prometheus
Updating everything in two places as suggested in https://twitter.com/gitlabstatus/status/788301988224233472 does not work. It is hard enough to update one source. I propose we make Twitter the source of information and just show that on our status page.
@stanhu I've restored the original status page; once DNS replicates (~5m) it'll be live globally. This all started with @pcarranza wanting a better tool and a quest for that tool. Sounds like we're back to looking at what the right path is.