I suggest we separate the messages into INFO and ACTION sections for any action that user needs to take. There are users who do not care about why they are doing a certain thing, they want to just be able to copy paste.
That is good idea @ibaum, I like it. I don't think we should hide INFO but we should definitely see if we can print the ERROR at the very end. Any suggestions on how we could do this?
I agree, I don't think we should hide anything. But users might run gitlab-ctl reconfigure > /dev/null. We should stay unixy and send ERRORs to stderr, they have to go the extra mile if they want to ignore errors too.
I have been looking at this issue during last few days and following is a brain dump.
Chef error handlers can be definitely used to handle ERROR (which are usually exceptions raised during reconfigure or failures during db migrate and other post-installation stuff). Basically, they are exceptions with a big traceback.
INFO messages are what reconfigure currently produces. We don't intend to hide them as there are users who is interested in that information.
ACTION messages are something that we generate and display to the user - like "Hey, we are upgrading PostgreSQL in the next version and older versions will fail then. So you should probably upgrade it ASAP." and similar messages which are kinda information but require an action from user.
There can be two kinds of ACTION - one which doesn't interrupt the current process, like the upcoming PG upgrade notification. Other one interrupts the current process. This will come in handy during our pre-flight check, like if the domain doesn't resolve correctly we need to abort the process and ask the user to fix it.
Second type of ACTION messages can be again handled by Chef's error handler as we can raise exceptions in those cases. But I am unsure about the first type of ACTION messages. We can't raise an exception and interrupt the process there.
Questions that I am still trying to find answers to:
Error handlers can collect the exception and print them at the end. But they are still humongous stack traces with alien error messages like Undefined method blah for nil:Class which is almost of no use to end user. We need messages like "Incorrect setting in configuration file" or "Syntax error in configuration file" or "You don't have permission to change system parameters". How to do the mapping between exceptions and these messages?
How do we collect type-1 ACTION messages (the one that doesn't interrupt) and print at the end? Maybe use Chef::Log and write them to a file and parse that file at the end of reconfigure run?
Right now, we don't have much ACTION messages, so it isn't clear how to approach them. For PG upgrade, we anyway printed them at the end which required no further action. But once pre-flight check kicks in, that won't be the situation.
Even now, exceptions will abort the process and hence they aren't too far away to be scrolled. So moving them to end using Chef handlers doesn't do much help because they are almost at the end already. Generating meaningful messages is definitely a way to go.
I am not entirely convinced we need reconfigure's INFO messages being printed right away to STDOUT. I see them being logged to /var/log/gitlab/reconfigure/* so users can simply check those files. But @marin mentioned logging to those files isn't perfect so we can't rely on it now. Maybe we should fix that first and then we can probably remove the big wall of text. Maybe timestamp the log files so that they can be easily identified too. @marin Is there an issue about log files not being correct or what all data are missing from it?
@gitlab-build-team There may be some easy solutions that I missed/overlooked as my Chef experience isn't something I am proud of. If you have some suggestions for me to check out, or dig the docs or other products who are in similar situations, please share so I can look into them too.
Error handlers can collect the exception and print them at the end. ***
Right now, we don't have much ACTION messages, so it isn't clear how to approach them.
Can we do something like method_missing in libraries/gitlab.rb? Basically, for ACTION messages we would just raise with a known key and process it that way. All others would have to be sanitized (somehow) and shown to the user.
Other one interrupts the current process. This will come in handy during our pre-flight check, like if the domain doesn't resolve correctly we need to abort the process and ask the user to fix it.
I don't think an ACTION should interrupt reconfigure. I think we should only interrupt the process for an ERROR.
Error handlers can collect the exception and print them at the end. But they are still humongous stack traces with alien error messages like Undefined method blah for nil:Class which is almost of no use to end user. We need messages like "Incorrect setting in configuration file" or "Syntax error in configuration file" or "You don't have permission to change system parameters". How to do the mapping between exceptions and these messages?
Chef handlers are a bit more adaptable/complicated than that. They are similar to rescue statements for chef in that we can use them to intercept errors, then do whatever we want with them. And, they don't just come into play for errors. They have an extensive list of event types they can handle.
So, in my opinion we should:
Keep the INFO messages as is, just fix the logging issue you were referring to. If users don't want to see this, they can run gitlab-ctl reconfigure > /dev/null and know that the log file will still be there if they need it.
For ERROR, I agree we shouldn't print stacktraces to stdout or stderr. We should distill those as best we can to stdout and the reconfigure log, and log the stacktrace somewhere that they can refer to if they need more information.
ACTION should be printed out last, and should probably go to a log and stderr. The way I'm interpreting the intention for ACTIONs is they are non-optional for a working GitLab instance. It's not unreasonable to ignore stdout, but users shouldn't be ignoring stderr. They will, which is why we should log it as well, possibly to a dedicated log file.
We may also consider some sort of pending actions state file. Though this may be a solution in search of a problem.