Flaky test detection, reporting, prevention, and minimization
Description
Flaky tests are a major problem in many teams' CI/CD pipelines. Some flaky tests are avoidable and some are not. Even when they are avoidable, stabilizing them can be hard and sometimes prohibitively expensive. We should make it easy to detect flaky tests, report on them, and, where possible, work around their flakiness.
Note that tests can fail for many reasons, from unexpected state to network outages. When the cause is an external incident, a test that is retried right away is likely to fail again. Exponential backoff between retries might help here.
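As a rough illustration of the backoff idea, here is a minimal Python sketch. The helper name `retry_with_backoff`, its parameters, and the default values are all hypothetical and not part of any existing runner; `run_test` stands in for whatever executes a single test and returns whether it passed.

```python
import time
from typing import Callable, Optional


def retry_with_backoff(
    run_test: Callable[[], bool],
    max_attempts: int = 4,        # assumed default, not a recommendation
    base_delay: float = 1.0,      # seconds to wait before the first retry
    factor: float = 2.0,          # multiplier applied to the delay after each failure
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[int]:
    """Run run_test until it passes.

    Returns the 1-based attempt number that passed, or None if every
    attempt failed.
    """
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        if run_test():
            return attempt
        if attempt < max_attempts:
            # Wait longer after each failure so a short external outage
            # has a chance to clear before the next attempt.
            sleep(delay)
            delay *= factor
    return None
```

A return value greater than 1 means the test failed at least once and then passed, which is itself a flakiness signal worth recording. The `sleep` parameter is injectable only so the helper can be exercised without real waiting.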
Proposal
- Use the JUnit XML report format as the standard output from all test runners
- Detect patterns that indicate flakiness, such as:
  - Tests that fail, but then succeed on retries
  - Tests that fail in a high percentage of runs
- Report the flakiest tests (so that someone can work on fixing them) (https://gitlab.com/gitlab-org/gitlab-ee/issues/3673)
- Flag flaky tests for automatic retries
- Block MRs that introduce new flaky tests
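The detection step in the list above could be sketched roughly as follows. This is an assumption-laden illustration, not a proposed implementation: it assumes each JUnit XML report represents one attempt of the same test suite on the same code, that reports are supplied in attempt order, and that a 30% failure rate counts as "high". The function names, labels, and threshold are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict, List


def outcomes_from_junit(xml_text: str) -> Dict[str, bool]:
    """Map 'classname.name' to True (passed) or False (failed or errored)."""
    root = ET.fromstring(xml_text)
    results: Dict[str, bool] = {}
    for case in root.iter("testcase"):
        if case.find("skipped") is not None:
            continue  # skipped tests carry no pass/fail signal
        test_id = f"{case.get('classname', '')}.{case.get('name', '')}"
        failed = case.find("failure") is not None or case.find("error") is not None
        results[test_id] = not failed
    return results


def classify(reports: List[str], failure_rate_threshold: float = 0.3) -> Dict[str, str]:
    """Label each test from an ordered list of JUnit XML reports."""
    history: Dict[str, List[bool]] = defaultdict(list)
    for xml_text in reports:
        for test_id, passed in outcomes_from_junit(xml_text).items():
            history[test_id].append(passed)

    labels: Dict[str, str] = {}
    for test_id, runs in history.items():
        failure_rate = runs.count(False) / len(runs)
        # Pattern 1: a failure followed by a later pass on the same code.
        passed_after_failure = False in runs and True in runs[runs.index(False) + 1:]
        if passed_after_failure:
            labels[test_id] = "flaky: passed on retry"
        # Pattern 2: fails in a high share of runs without recovering.
        elif failure_rate >= failure_rate_threshold:
            labels[test_id] = "failing often"
        else:
            labels[test_id] = "stable"
    return labels
```

Sorting the tests by failure rate would give a "flakiest tests" report, and comparing the labels on an MR branch against those on the target branch is one conceivable way to spot newly introduced flaky tests.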
Links / references
Documentation blurb
Overview
What is it? Why should someone use this feature? What is the underlying (business) problem? How do you use this feature?
Use cases
Who is this for? Provide one or more use cases.
Feature checklist
Make sure these are completed before closing the issue, with a link to the relevant commit.
- Feature assurance
- Documentation
- Added to features.yml