
Add System Log Errors to QA Failure Issues

James Nutt requested to merge add-system-log-errors-to-qa-failure-issues into master

What does this MR do and why?

Adds a new feature to GitLab QA that allows the relate_failure_issue script to add a section to newly created QA failure issues containing a summary of relevant errors from the GitLab application system logs.

If the logs are available as an artifact (as in our containerized test runs) and the test failure contains a correlation ID, the script does three things when it creates the QA failure issue. It searches the logs for entries with that correlation ID, builds a summary of each entry from a subset of relevant fields, and adds these summaries to the issue description.
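The whole feature depends on first finding a correlation ID in the test failure. The sketch below shows one way to extract it. It assumes the failure message contains text like "Correlation Id: <id>". That format, `CORRELATION_ID_PATTERN`, and `extract_correlation_id` are hypothetical and are not taken from this MR.

```ruby
# Hypothetical helper: extracts a correlation ID from a test failure message.
# ASSUMPTION: the message contains text like "Correlation Id: 01GXYZ123ABC".
CORRELATION_ID_PATTERN = /correlation id:\s*(\w+)/i

# Returns the ID string, or nil when the failure has no correlation ID
# (in which case no System Logs section would be added).
def extract_correlation_id(failure_message)
  match = CORRELATION_ID_PATTERN.match(failure_message)
  match && match[1]
end
```

Returning `nil` rather than raising keeps the "no correlation ID" case cheap. The caller can then skip the log search entirely.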

As a first iteration, this addition focuses on the following Rails logs:

  1. api_json.log
  2. application_json.log
  3. exceptions_json.log
  4. graphql_json.log
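Here is a minimal sketch of how these four files could be located inside a downloaded artifact folder. It assumes the files may sit at any depth under the logs directory. The `RAILS_JSON_LOGS` constant, the log type symbols, and `locate_rails_logs` are hypothetical names, not this MR's code.

```ruby
# Hypothetical mapping from the first-iteration Rails log files to log type keys.
# ASSUMPTION: the symbol names are illustrative, not the MR's identifiers.
RAILS_JSON_LOGS = {
  "api_json.log" => :rails_api,
  "application_json.log" => :rails_application,
  "exceptions_json.log" => :rails_exception,
  "graphql_json.log" => :rails_graphql
}.freeze

# Returns { log_type => path } for each supported log file found anywhere
# under log_dir. Missing files are skipped, so a partial artifact set still works.
def locate_rails_logs(log_dir)
  RAILS_JSON_LOGS.each_with_object({}) do |(file_name, log_type), found|
    path = Dir.glob(File.join(log_dir, "**", file_name)).first
    found[log_type] = path if path
  end
end
```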

Relates to https://gitlab.com/gitlab-org/quality/quality-engineering/team-tasks/-/issues/1627

Important: This MR should be merged before https://gitlab.com/gitlab-org/quality/pipeline-common/-/merge_requests/277

High Level Design Overview

finders

Responsible for searching for logs matching a given correlation ID

  • json_log_finder.rb - the main factory base class inherited by the other finder subclasses. Its find method returns an array of log objects whose correlation ID matches the given one. The type of object created is determined by the abstract new_log method, which each finder subclass must define.
    • In the future, we can create a separate base class to support logs that aren't in JSON format (e.g., PostgreSQL logs).
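Here is a minimal sketch of the factory pattern described above. It assumes one JSON object per line and a top-level "correlation_id" key. The method bodies, `RailsApiLog`, and `RailsApiLogFinder` are illustrative and are not the actual contents of json_log_finder.rb.

```ruby
require "json"

# Sketch of a JSON log finder base class using a factory method.
# ASSUMPTIONS: one JSON object per line, with a top-level "correlation_id" key.
class JsonLogFinder
  def initialize(log_path)
    @log_path = log_path
  end

  # Returns log objects (built by #new_log) whose correlation ID matches.
  def find(correlation_id)
    return [] unless File.exist?(@log_path)

    File.foreach(@log_path).filter_map do |line|
      data = parse_line(line)
      new_log(data) if data && data["correlation_id"] == correlation_id
    end
  end

  # Factory method: each subclass decides which log type to instantiate.
  def new_log(_data)
    raise NotImplementedError, "#{self.class} must implement #new_log"
  end

  private

  # Skips blank, truncated, or non-object lines instead of aborting the search.
  def parse_line(line)
    data = JSON.parse(line)
    data.is_a?(Hash) ? data : nil
  rescue JSON::ParserError
    nil
  end
end

# Hypothetical log type and finder subclass, for illustration only.
class RailsApiLog
  attr_reader :data

  def initialize(data)
    @data = data
  end
end

class RailsApiLogFinder < JsonLogFinder
  def new_log(data)
    RailsApiLog.new(data)
  end
end
```

With this design, supporting a new JSON log only requires a subclass that overrides `new_log`. The search loop itself stays in one place.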

log_types

These are our "models." They represent the logs themselves (such as Rails API, Rails Exception, etc.), including their data and corresponding "summary fields." The summary fields determine which parts of the data we extract into the summary included in the QA failure issue. This lets us filter out potentially sensitive fields and limit extraneous information.

  • log.rb - the main base class inherited by all other log types. It includes summary fields shared between all logs, and the summary method for generating the summary from the full set of data.
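The summary fields act as an allowlist. This minimal sketch shows that idea. The class names and field lists are hypothetical, and the real log.rb may structure this differently.

```ruby
# Sketch of a log base class with allowlisted summary fields.
# ASSUMPTION: field names are illustrative; the MR's actual lists may differ.
class Log
  SHARED_SUMMARY_FIELDS = %i[time severity correlation_id].freeze

  attr_reader :name, :data

  def initialize(name, data)
    @name = name
    @data = data.transform_keys(&:to_sym)
  end

  # Subclasses extend this list with their own log-specific fields.
  def summary_fields
    SHARED_SUMMARY_FIELDS
  end

  # Keeps only allowlisted fields that are present, so anything not
  # explicitly listed (e.g. request params) never reaches the issue.
  def summary
    data.slice(*summary_fields)
  end
end

# Hypothetical subclass for api_json.log entries.
class RailsApiLog < Log
  def initialize(data)
    super("Rails API", data)
  end

  def summary_fields
    super + %i[method path status route]
  end
end
```

An allowlist fails safe. A new field added to the logs later stays out of the issue until someone lists it explicitly.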

shared_fields.rb

  • A module containing sets of fields that are shared by multiple log types, but not by all of them.
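Here is a short sketch of how such a module lets each log type compose only the field groups that apply to it. The group names, field names, and classes are all hypothetical.

```ruby
# Sketch of a shared-fields module. ASSUMPTION: these groupings and field
# names are hypothetical, chosen only to show fields reused by some, but not
# all, log types.
module SharedFields
  EXCEPTION = %i[exception.class exception.message exception.backtrace].freeze
  META = %i[meta.user meta.project meta.caller_id].freeze
end

# Fields every log type would share (these would live in the base class).
BASE_FIELDS = %i[time severity correlation_id].freeze

# Each log type composes only the shared groups that apply to it.
class RailsExceptionLog
  SUMMARY_FIELDS = (BASE_FIELDS + SharedFields::EXCEPTION).freeze
end

class RailsGraphqlLog
  SUMMARY_FIELDS = (BASE_FIELDS + SharedFields::META + %i[operation_name]).freeze
end

class RailsApplicationLog
  SUMMARY_FIELDS = (BASE_FIELDS + SharedFields::EXCEPTION + SharedFields::META + %i[message]).freeze
end
```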

system_logs_formatter.rb

  • Responsible for formatting the markdown for all the log summaries to be added as a section in the QA failure issue description.
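The validation steps below describe a System Logs section with expandable per-log sections. This minimal sketch renders that layout using `<details>` blocks, which GitLab markdown supports. The class and method names, the `###` heading level, and the input shape are assumptions.

```ruby
require "json"

# Sketch of a markdown formatter for log summaries.
# ASSUMPTIONS: the class/method names, the "### System Logs" heading, and
# the input shape { "Rails API" => [summary_hash, ...] } are hypothetical.
class SystemLogsFormatter
  FENCE = "`" * 3

  def initialize(summaries_by_log)
    @summaries_by_log = summaries_by_log
  end

  # Returns "" when no log has matches, so issues without correlated
  # entries get no System Logs section at all.
  def system_logs_summary_markdown
    sections = @summaries_by_log
      .reject { |_name, summaries| summaries.empty? }
      .map { |name, summaries| section(name, summaries) }
    return "" if sections.empty?

    "### System Logs\n\n#{sections.join("\n\n")}"
  end

  private

  # One collapsible <details> block per log type; the blank lines around
  # each entry let GitLab render the fenced JSON inside the HTML block.
  def section(name, summaries)
    entries = summaries.map do |summary|
      "#{FENCE}json\n#{JSON.pretty_generate(summary)}\n#{FENCE}"
    end
    ["<details><summary>#{name} (#{summaries.size})</summary>", *entries, "</details>"].join("\n\n")
  end
end
```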

Screenshots or screen recordings

system_log_errors_demo

How to set up and validate locally

  • Download the artifacts from a real test failure on master that includes a correlation ID, such as: https://gitlab.com/gitlab-org/gitlab/-/jobs/3960857494
    • If you are having trouble finding one, feel free to reach out to me via Slack and I can send you a copy
  • Unzip the artifacts into a new folder at the root of your local gitlab-qa project, such as gitlab-qa-run-123
  • In your local GDK, create a group called test_failures_group with a project called test_failures_project
  • From within the gitlab-qa directory, run:
    GITLAB_API_BASE='<gdk base url>/api/v4' GITLAB_QA_ACCESS_TOKEN=<your token here> CI_PROJECT_NAME='main' bundle exec exe/gitlab-qa-report --relate-failure-issue "gitlab-qa-run-*/**/*.json" --include-system-log-errors "gitlab-qa-run-*/**/logs" --project 'test_failures_group/test_failures_project'
  • In your GDK instance, log in and go to the issues for test_failures_project
  • Find the newly created issue for the test failure that contains the correlation ID
  • You should see a System Logs section at the bottom of the issue description. It contains expandable sections for Rails Application and Rails GraphQL that list the logs found for the correlation ID
  • You can also check the other issues created for failures without a correlation ID and verify that their descriptions are as expected and contain no System Logs section
  • Feel free to experiment with other tests as well! 🔬
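Before running the script, it can help to know which log files actually contain the correlation ID. That tells you which System Logs sections to expect. Here is a hypothetical helper for that check. It assumes one JSON object per line with a top-level "correlation_id" key. `count_log_matches` and `correlation_match?` are illustrative names and are not part of this MR.

```ruby
require "json"

# Hypothetical validation helper: counts entries per *_json.log file under
# log_dir whose "correlation_id" matches.
# ASSUMPTION: one JSON object per line with a top-level "correlation_id" key.
def count_log_matches(log_dir, correlation_id)
  Dir.glob(File.join(log_dir, "**", "*_json.log")).to_h do |path|
    count = File.foreach(path).count { |line| correlation_match?(line, correlation_id) }
    [File.basename(path), count]
  end
end

# True only for lines that parse as a JSON object with a matching ID.
def correlation_match?(line, correlation_id)
  data = JSON.parse(line)
  data.is_a?(Hash) && data["correlation_id"] == correlation_id
rescue JSON::ParserError
  false
end

# Example usage (path and ID are placeholders):
#   count_log_matches("gitlab-qa-run-123", "<correlation id>")
#     .select { |_file, count| count.positive? }
```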

Testing in CI

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by James Nutt
