Add System Log Errors to QA Failure Issues
What does this MR do and why?
Adds a new feature to GitLab QA that allows the relate_failure_issue script to add a section to newly created QA failure issues containing a summary of relevant errors from the GitLab application system logs.
If the logs exist as an artifact (such as in our containerized test runs), and the test failure contains a correlation ID, the script will search for logs containing this correlation ID, create a summary of each with a subset of relevant fields, and add these summaries to the QA failure issue description on creation.
As a first iteration, this addition focuses on the following Rails logs:
api_json.log
application_json.log
exceptions_json.log
graphql_json.log
Relates to https://gitlab.com/gitlab-org/quality/quality-engineering/team-tasks/-/issues/1627
High Level Design Overview
finders
Responsible for searching for logs matching a given correlation ID
- json_log_finder.rb - the main factory base class inherited by the other finder subclasses. The find method here will return an array of log objects that contain a matching correlation ID. The object created will depend on the log type returned from the abstract new_log method, which is to be defined by each of the different finder subclasses.
  - In the future, we can create a separate class when working on supporting logs that aren't in JSON format (ex: PostgreSQL, etc.)
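To illustrate the factory pattern described above, here is a minimal sketch, not the code in this MR. It assumes each log file holds one JSON object per line and that the correlation ID is stored under a `correlation_id` key. `ApiLog`, `ApiLogFinder`, and the constructor arguments are hypothetical names used only for this example.

```ruby
require 'json'

# Hypothetical log model for this sketch; it only holds the parsed data.
ApiLog = Struct.new(:data)

# Base finder: scans a JSON-lines log file and wraps each matching entry.
class JsonLogFinder
  def initialize(base_path, file_path)
    @base_path = base_path
    @file_path = file_path
  end

  # Returns an array of log objects whose correlation ID matches.
  # Returns an empty array when the log file does not exist.
  def find(correlation_id)
    path = File.join(@base_path, @file_path)
    return [] unless File.exist?(path)

    File.foreach(path).filter_map do |line|
      data = parse(line)
      # Assumption: the ID lives under the "correlation_id" key.
      next unless data.is_a?(Hash) && data['correlation_id'] == correlation_id

      new_log(data)
    end
  end

  private

  # Lines that are not valid JSON are skipped rather than raising.
  def parse(line)
    JSON.parse(line)
  rescue JSON::ParserError
    nil
  end

  # Abstract: each subclass decides which log type to build.
  def new_log(_data)
    raise NotImplementedError, "#{self.class} must implement new_log"
  end
end

# Example subclass for api_json.log.
class ApiLogFinder < JsonLogFinder
  def initialize(base_path, file_path = 'api_json.log')
    super
  end

  private

  def new_log(data)
    ApiLog.new(data)
  end
end
```

With this shape, supporting another JSON log would only need a new subclass that supplies a file name and a new_log implementation.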
log_types
These are our "models", and represent the logs themselves (such as Rails API, Rails Exception, etc.), including their data and corresponding "summary fields." The summary fields are what we use to extract a summary of the data we care about to include in the QA failure issues. This can help us to filter any potentially sensitive fields as well as limit extraneous information.
- log.rb - the main base class inherited by all other log types. It includes summary fields shared between all logs, and the summary method for generating the summary from the full set of data.
- shared_fields.rb - A module containing sets of fields that are shared by multiple logs, but not all.
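As a rough sketch of how summary fields could act as an allow-list, assuming a log entry is a hash with string keys: the field names, the `SharedFields::EXCEPTION_FIELDS` constant, and the `ExceptionLog` class below are illustrative assumptions, not the fields or classes used in this MR.

```ruby
# Hypothetical base class: holds the raw entry and the fields every log shares.
class Log
  attr_reader :name, :data

  def initialize(name, data)
    @name = name
    @data = data
  end

  # Fields assumed common to all log types (illustrative names only).
  def summary_fields
    %i[severity time correlation_id]
  end

  # Keeps only the allow-listed fields that are present in the entry,
  # so anything not listed (including sensitive data) is dropped.
  def summary
    summary_fields.each_with_object({}) do |field, result|
      value = data[field.to_s]
      result[field] = value unless value.nil?
    end
  end
end

# Hypothetical module of fields shared by some, but not all, log types.
module SharedFields
  EXCEPTION_FIELDS = %i[exception.class exception.message].freeze
end

# Example log type that extends the base summary fields.
class ExceptionLog < Log
  def initialize(data)
    super('Rails Exceptions', data)
  end

  def summary_fields
    super + SharedFields::EXCEPTION_FIELDS
  end
end
```

Because the summary is built from an explicit list rather than by removing known-bad keys, a new field in the logs stays out of the issue description until someone adds it.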
system_logs_formatter.rb
- Responsible for formatting the markdown for all the log summaries to be added as a section in the QA failure issue description.
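The formatting step could look something like the sketch below. It assumes the formatter receives a hash of section titles mapped to arrays of summary hashes, and that each section is rendered as a collapsible details block. The class interface, the heading text, and the entry layout are assumptions for illustration only.

```ruby
# Hypothetical formatter: takes { "Section title" => [summary_hash, ...] }.
class SystemLogsFormatter
  def initialize(summaries_by_type)
    @summaries_by_type = summaries_by_type
  end

  # Returns a markdown string, or an empty string when no logs were found,
  # so issues without matching logs get no System Logs section.
  def markdown
    sections = @summaries_by_type.reject { |_title, summaries| summaries.empty? }
    return '' if sections.empty?

    rendered = sections.map { |title, summaries| section(title, summaries) }
    ['### System Logs', *rendered].join("\n\n")
  end

  private

  # One collapsible block per log type.
  def section(title, summaries)
    entries = summaries.each_with_index.map { |summary, index| entry(summary, index + 1) }
    ["<details><summary>#{title}</summary>", *entries, '</details>'].join("\n\n")
  end

  # One labelled bullet list per log entry.
  def entry(summary, number)
    fields = summary.map { |field, value| "- #{field}: #{value}" }
    ["Entry #{number}:", *fields].join("\n")
  end
end
```

For example, `SystemLogsFormatter.new({ 'Rails Application' => [{ severity: 'ERROR' }] }).markdown` would produce a single expandable Rails Application section.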
Screenshots or screen recordings
How to set up and validate locally
- Download a set of artifacts from an actual test failure in master that includes a correlation ID, such as: https://gitlab.com/gitlab-org/gitlab/-/jobs/3960857494
- If you are having trouble finding one, feel free to reach out to me via Slack and I can send you a copy
- Unzip and add the folder to the base of your local gitlab-qa project in a new folder such as gitlab-qa-run-123
- In your local GDK, create a group called test_failures_group with a project called test_failures_project
- From within the gitlab-qa directory, run:
GITLAB_API_BASE='<gdk base url>/api/v4' GITLAB_QA_ACCESS_TOKEN=<your token here> CI_PROJECT_NAME='main' bundle exec exe/gitlab-qa-report --relate-failure-issue "gitlab-qa-run-*/**/.json" --include-system-log-errors "gitlab-qa-run-*/**/logs" --project 'test_failures_group/test_failures_project'
- In your GDK instance, log in and go to the issues of test_failures_project
- Search for the newly created issue that contains the test failure with the correlation ID
- You should see a System Logs section at the bottom of the issue description, with sections for Rails Application and Rails GraphQL that you can expand to see the various logs found by the correlation ID
- You can also check the other issues generated without a correlation ID and verify the issue descriptions are created as expected, and contain no System Logs section
- Feel free to experiment with other tests as well! 🔬
Testing in CI
- I have tested these changes within package-and-test 👉 https://gitlab.com/gitlab-org/gitlab/-/pipelines/824577898
  - Example of a failure issue created with system logs: https://gitlab.com/gitlab-org/gitlab/-/issues/403739
  - Example of a failure issue created without system logs (same as before): https://gitlab.com/gitlab-org/gitlab/-/issues/403774
- I have tested these changes against Staging to verify no regressions occur when no system log artifacts are generated 👉 https://gitlab.com/gitlab-org/quality/staging/-/jobs/4053410936
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.