Here are screenshots of the current dashboard for managing Geo and DR:
What could we do to make them better? Let's use this issue to brainstorm. Once we have a better idea of what we want to accomplish, we'll create separate issues for each improvement.
Proposal
Indicate the reason why a repository failed to sync
We only show the repository sync status. Can we add the percentage of files replicated as well?
Should we allow removing the primary node?
How does it look when we detect a health problem?
Is there a button to add a new node? How do nodes get detected?
Show the date of the last sync (I have the feeling it would be reassuring)?
Show if a sync is currently in progress
Indicate the reason why a repository failed to sync
Definitely.
We only show the repository sync status. Can we add the percentage of files replicated as well?
I'm curious how the primary knows the status of each secondary's sync, since it's the secondaries that have the tracking database. Do we have (or have plans to add) a method for the secondaries to send back a kind of progress ping?
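To make the "progress ping" idea concrete, here is a minimal sketch of the payload a secondary could build from its own tracking database and send to the primary. Everything here is hypothetical: `TrackingCounts`, `percent`, `build_status_ping`, and the JSON field names are illustrative, not an existing Geo API. The payload also carries the last sync time and an in-progress flag, which are the other two dashboard items proposed above.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class TrackingCounts:
    """Hypothetical counts a secondary reads from its own tracking database."""
    repositories_total: int
    repositories_synced: int
    repositories_failed: int
    files_total: int
    files_synced: int


def percent(done: int, total: int) -> float:
    """Percentage replicated, rounded to one decimal; nothing to sync counts as 100%."""
    if total == 0:
        return 100.0
    return round(100.0 * done / total, 1)


def build_status_ping(node: str, counts: TrackingCounts,
                      last_synced_at: Optional[datetime],
                      sync_in_progress: bool) -> str:
    """Serialize a progress report the secondary could send to the primary (hypothetical format)."""
    payload = {
        "node": node,
        "repositories_synced_percent": percent(counts.repositories_synced,
                                               counts.repositories_total),
        "repositories_failed": counts.repositories_failed,
        "files_synced_percent": percent(counts.files_synced, counts.files_total),
        "last_synced_at": last_synced_at.isoformat() if last_synced_at else None,
        "sync_in_progress": sync_in_progress,
        # When the report was produced, so the primary can detect stale reports.
        "reported_at": datetime.now(timezone.utc).isoformat(),
    }
    return json.dumps(payload)
```

With 150 of 200 repositories and 700 of 1,000 files synced, the ping would report 75.0% and 70.0% respectively. The primary would only need to store the latest ping per node to render the dashboard, instead of querying each secondary's tracking database directly.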
Should we allow removing the primary node?
To what end? Wouldn't that disable Geo entirely? Or would it randomly pick a secondary to become the new primary? How does that new primary notify the other secondaries?
How does it look when we detect a health problem?
One of those screenshots seems to indicate it: "Could not connect to Geo database". That said, that doesn't stand out at all, and doesn't even indicate that it's a problem. As we get further along with a working DR solution we'll likely need some UX and Frontend love here.
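One way to make a health problem stand out is to turn it into an explicit state with a reason, rather than just a line of text. The sketch below is a hypothetical classification: `NodeHealth`, `classify_health`, and the 10-minute `STALE_AFTER` threshold are assumptions for illustration. A node is flagged as unhealthy if its database is unreachable (as in the "Could not connect to Geo database" case) or if it has not reported its status recently.

```python
from datetime import datetime, timedelta, timezone
from enum import Enum
from typing import Optional, Tuple


class NodeHealth(Enum):
    HEALTHY = "Healthy"
    UNHEALTHY = "Unhealthy"


# Hypothetical threshold: a node that has not reported for this long is flagged.
STALE_AFTER = timedelta(minutes=10)


def classify_health(db_error: Optional[str], last_report_at: Optional[datetime],
                    now: datetime) -> Tuple[NodeHealth, str]:
    """Return a health state plus a human-readable reason to show prominently."""
    if db_error:
        return NodeHealth.UNHEALTHY, db_error
    if last_report_at is None:
        return NodeHealth.UNHEALTHY, "Node has never reported its status"
    if now - last_report_at > STALE_AFTER:
        return NodeHealth.UNHEALTHY, f"No status report since {last_report_at.isoformat()}"
    return NodeHealth.HEALTHY, ""
```

The frontend could then style anything `UNHEALTHY` as an error (for example, a red badge) and display the reason next to it, so the problem no longer blends in with normal status text.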
Is there a button to add a new node? How do nodes get detected?
Right now, since so much of the process is manual, I don't think it's feasible. This will depend on Stan's suggestions to automate a lot of this process first.
Show the date of the last sync (I have the feeling it would be reassuring)?
To what end? Wouldn't that disable Geo entirely? Or would it randomly pick a secondary to become the new primary? How does that new primary notify the other secondaries?
Well, there is currently a button that lets us remove the primary node (as this screenshot shows). I assume we can remove it from the UI, @dbalexandre?
@regisF If we remove this button from the UI, we should add a button that allows an admin to edit the primary node in case they entered the wrong information or it has changed. WDYT?
@dbalexandre Alright, so the main intended use case for this button is actually to designate a primary node?
If that's the case, we should instead have a button labeled "Make primary" or something similar.
By the way, @rspeicher @stanhu, what did you have in mind for designating a secondary node as the primary in the case of an actual disaster? We can't use the UI in this case.
@stanhu Shouldn't it be done at the CLI level? I mean, I guess there won't be any GUI available.
No matter what, we need to keep track of the status of the replication (in %) at the secondary node level. Why?
Imagine this scenario: we have two secondary nodes (S1 and S2) and one primary (P). If S1 has replicated 50% of the primary's data and S2 70%, and the primary crashes, users will need to designate a secondary node as the new primary, so they should be able to pinpoint which node has the most up-to-date information and the highest percentage of replicated data.
I think we are kinda doing that at the moment, but are we really?
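The S1/S2 scenario boils down to a simple selection rule: among the secondaries that are still healthy, pick the one with the highest replication percentage. The sketch below assumes the per-node percentages are already available; `SecondaryStatus` and `best_promotion_candidate` are hypothetical names. Ranking by a single percentage is a simplification, and a real decision would likely also consider how recently each node last synced.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SecondaryStatus:
    """Hypothetical per-node status as tracked by the dashboard."""
    name: str
    replicated_percent: float  # share of the primary's data already replicated
    healthy: bool


def best_promotion_candidate(secondaries: List[SecondaryStatus]) -> Optional[SecondaryStatus]:
    """Return the healthy secondary with the highest replication percentage, if any."""
    healthy = [s for s in secondaries if s.healthy]
    if not healthy:
        return None
    return max(healthy, key=lambda s: s.replicated_percent)
```

For S1 at 50% and S2 at 70%, both healthy, this returns S2. If S2 were unreachable, S1 would be suggested instead, which is why the health state matters as much as the percentage.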