Fix flaky tests - Private beta

Cloud
Chunk by CircleCI is currently in private beta. To join the private beta, sign up here for the waiting list. There are no extra costs during the beta: Chunk uses CircleCI credits and your AI model provider token. Chunk tasks will be a paid feature after the beta.

Use Chunk by CircleCI to automatically identify and present resolutions to flaky tests in your CI/CD pipelines.

Chunk provides automated capabilities to identify and resolve common issues in your CI/CD pipelines. Chunk can automatically detect flaky tests and generate fixes to help you reduce the time spent debugging intermittent failures.

Chunk by CircleCI is an AI agent that you can choose to set up in your organization to help with CI/CD related tasks.

Introduction

Flaky tests are tests that pass and fail inconsistently. Flaky tests create uncertainty about code quality and slow down development workflows. Chunk can help address the problem of flaky tests by using artificial intelligence to analyze test patterns, identify root causes of flakiness, and propose validated solutions.

Chunk integrates with your existing CircleCI workflows and GitHub repositories. When configured, Chunk tasks run automatically on a schedule you define to monitor your test suite for signs of flakiness. When issues are detected, Chunk goes through the following steps:

  • Generate potential fixes.

  • Validate fixes through multiple test runs in an isolated environment.

  • Create pull requests with recommended changes after successful validation.

Figure 1. Flowchart of the process: flaky test detected, generate potential fixes, validate fixes through multiple test runs in an isolated environment, create pull requests with recommended changes after successful validation, complete.

Set up Chunk and assign a task

To start automating flaky test fixes, meet the following prerequisites and then complete the setup steps for your organization. You can then assign tasks for Chunk to run.

Prerequisites

  • An API key from either Anthropic or OpenAI for Chunk to process and generate fixes. Your source code is neither stored nor used for training purposes by CircleCI. If you are using OpenAI, also check that your OpenAI organization is verified and has GPT-5 access (see Troubleshooting).

  • Ensure your CircleCI jobs store test results using the store_test_results step. Read more about this step in the configuration reference.

  • Ensure you have the CircleCI GitHub App installed in your GitHub organization. Check Organization Settings > VCS Connections, where you can see whether the App is already installed, or select Install GitHub App. Chunk needs the GitHub App installed to recommend fixes and open pull requests.

  • Make sure you are following the projects you want Chunk to fix. CircleCI identifies flaky tests in your CI/CD pipelines on the Tests tab for workflows in the Insights dashboard (Insights > Select project > Select workflow > Tests).
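The store_test_results prerequisite above can be sketched as follows. This is a minimal example under stated assumptions, not a required layout: the job name test, the test-results directory, and the use of pytest (assumed to be listed in requirements.txt) are all hypothetical choices for illustration.

```yaml
version: 2.1
jobs:
  test:  # hypothetical job name
    docker:
      - image: cimg/python:3.12
    steps:
      - checkout
      - run:
          name: Install dependencies
          command: pip install -r requirements.txt
      - run:
          name: Run tests
          # Write JUnit XML into a directory that store_test_results can upload
          command: |
            mkdir -p test-results
            pytest --junitxml=test-results/junit.xml
      - store_test_results:
          path: test-results
workflows:
  main:
    jobs:
      - test
```

Any test runner that can emit JUnit XML should work the same way: write the report into a directory, then point store_test_results at that directory.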

Setup

Once you have the prerequisites, you can set up Chunk by following these steps:

  1. In the CircleCI web app, select your organization and then select Chunk Tasks from the sidebar.

  2. Select Get started and then Continue when prompted.

  3. You should see a passed icon to indicate you already have the GitHub App installed for your organization. If not, use the Install CircleCI GitHub App button to install it now.

  4. Select your AI Model provider (Anthropic or OpenAI).

  5. Enter your API key for your chosen model provider.

  6. Select Next to complete the setup.


Assign a task

Once you have Chunk set up for your organization you can start assigning tasks.

To set up a "Fix flaky tests" task, follow these steps:

  1. Select the project you want to assign the task to.

  2. Choose a run frequency (daily, weekly, monthly).

  3. Choose the number of tests you want Chunk to try to fix per run, between one and three.

  4. Choose a maximum for the number of solutions you want Chunk to try for each test, between one and three.

  5. Choose the number of validation runs to allow per test, between one and 20.

  6. Choose a maximum number of concurrent open PRs for flaky test fixes. You can choose between one and 20, or the default, "Unlimited".

  7. The "Chunk environment setup" section guides you to set up a cci-agent-setup.yml file if you would like to control the environment in which Chunk runs your tests. For more information see Chunk environment setup. This step is optional.

  8. The "Post-run commands" section allows you to add commands for Chunk to run after each test run. This step is optional.

  9. Select Start task to complete the setup.

When you select Start task, Chunk starts running immediately and then follows the schedule you set up.


Chunk environment setup

To improve verification success, create an "agent environment" CircleCI YAML file. Copy the environment setup parts of your existing CircleCI configuration into a dedicated file for Chunk.

  • Name the file cci-agent-setup.yml and save it to your .circleci directory on your default branch.

  • cci-agent-setup.yml needs to include a single workflow (the workflow name can be anything you want) with a single job named cci-agent-setup. The cci-agent-setup job needs to set up your environment for Chunk to use. You do not need to include any steps to run tests, because this file is purely for environment setup.

    Example config file for cci-agent-setup.yml
    version: 2.1
    workflows:
      main:
        jobs:
          - cci-agent-setup
    jobs:
      cci-agent-setup:
        docker:
          - image: cimg/python:3.12
          - image: cimg/postgres:15.3
        steps:
          - checkout
          - run:
              name: Hello World
              command: |
                echo "Hello, World!"
          # insert more environment setup here

Chunk supports all standard CircleCI configuration options. This includes executors, resource classes, caching, contexts, environment variables, service containers, orbs, and everything else you would use in a standard CircleCI pipeline. If it works in your .circleci/config.yml, it works in cci-agent-setup.yml. For a complete reference of available configuration options, see the CircleCI Configuration Reference.

Example cci-agent-setup.yml files

Python

version: 2.1
workflows:
  cci-agent-setup:
    jobs:
      - cci-agent-setup
jobs:
  cci-agent-setup:
    docker:
      - image: cimg/python:3.12
      - image: cimg/postgres:15.3
    steps:
      - checkout
      - run:
          name: Install dependencies
          command: |
            pip install -r requirements.txt

Caching & contexts

version: 2.1
workflows:
  cci-agent-setup:
    jobs:
      - cci-agent-setup:
          context:
            - my-team-context  # Includes any secrets/env vars from this context
jobs:
  cci-agent-setup:
    docker:
      - image: cimg/node:18.0
    steps:
      - checkout
      - restore_cache:
          keys:
            - v1-dependencies-{{ checksum "package-lock.json" }}
      - run:
          name: Install dependencies
          command: npm install
      - save_cache:
          paths:
            - node_modules
          key: v1-dependencies-{{ checksum "package-lock.json" }}

Multiple services

version: 2.1
workflows:
  cci-agent-setup:
    jobs:
      - cci-agent-setup
jobs:
  cci-agent-setup:
    docker:
      - image: cimg/ruby:3.2
      - image: cimg/postgres:15.3
        environment:
          POSTGRES_USER: circleci
          POSTGRES_DB: test_db
      - image: redis:7.0
    steps:
      - checkout
      - run:
          name: Wait for DB
          command: dockerize -wait tcp://localhost:5432 -timeout 1m
      - run:
          name: Install dependencies
          command: bundle install
      - run:
          name: Setup database
          command: bundle exec rake db:setup

Resource classes & machine

version: 2.1
workflows:
  cci-agent-setup:
    jobs:
      - cci-agent-setup
jobs:
  cci-agent-setup:
    machine:
      image: ubuntu-2204:2024.01.2
    resource_class: large
    steps:
      - checkout
      - run:
          name: Install dependencies
          command: |
            sudo apt-get update
            sudo apt-get install -y build-essential

Environment variables and contexts

Project environment variables

Chunk automatically has access to any environment variables you have configured at the project level in CircleCI. You do not need to recreate or reference these, because they are already available.
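One way to confirm that a project-level variable is visible during environment setup is to add a check step to the cci-agent-setup job. The sketch below assumes a hypothetical project variable named DATABASE_URL; substitute a variable your project actually defines. The step fails early if the variable is missing and never prints its value.

```yaml
version: 2.1
workflows:
  main:
    jobs:
      - cci-agent-setup
jobs:
  cci-agent-setup:
    docker:
      - image: cimg/python:3.12
    steps:
      - checkout
      - run:
          name: Check project environment variables
          command: |
            # DATABASE_URL is a hypothetical project-level variable.
            # Fail early, without printing the value, if it is missing.
            if [ -z "${DATABASE_URL:-}" ]; then
              echo "DATABASE_URL is not set"
              exit 1
            fi
```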

Contexts

If you are using CircleCI contexts to manage secrets or environment variables, you must include the context in your cci-agent-setup job (as shown in the caching and contexts example above). Chunk will then have access to all variables from that context, so you do not need to recreate them manually.

Testing your environment setup

To build and iterate on Chunk’s environment, follow these steps:

  1. Navigate to Organization Settings > Chunk Tasks.

  2. Identify your desired agent task.

  3. Select the ellipsis icon, then select Chunk Environment.

This page lets you run the contents of your cci-agent-setup.yml file on a specific branch and immediately see the results from those ad-hoc tasks. Use the Custom button to submit a task to Chunk and see the results.

Merge the cci-agent-setup.yml file to your default branch when the results on the environment setup page are satisfactory.

Additional guidance for Chunk

To improve Chunk’s ability to run tests and produce fixes that are aligned with stylistic/architectural preferences, you can include a markdown file (claude.md or agents.md) in the root of your repository with instructions for running tests. Chunk should pick this up automatically.

How Chunk by CircleCI works

Chunk operates through an automated analysis and remediation process that runs independently of your regular CI/CD workflows.

Test analysis and detection

Chunk continuously monitors test results stored in CircleCI to identify patterns of flakiness. It analyzes historical test data to distinguish between genuine failures caused by code issues and intermittent failures that indicate flaky behavior. Tests are flagged as flaky when they show inconsistent pass/fail patterns across multiple runs with the same code.

The detection process considers factors such as failure frequency, timing patterns, and error message consistency. This helps Chunk focus on tests that genuinely exhibit flaky behavior rather than tests that fail consistently due to code problems.

Solution generation

When a flaky test is identified, Chunk generates potential solutions based on common flaky test patterns and best practices. Chunk can create multiple solution approaches for each test, allowing it to try different fixes if the first attempt does not resolve the issue.

Solutions may include adding explicit waits, improving element selectors, handling race conditions, or stabilizing test data setup. Chunk tailors its recommendations to the specific failure patterns observed in your test.

Validation process

Before proposing any changes, Chunk validates potential solutions through multiple test runs in an isolated environment. This validation process ensures that proposed fixes actually resolve the flakiness without breaking existing functionality. Chunk runs the modified test multiple times to confirm consistent passing behavior.

Pull request creation

When Chunk has created a solution, it automatically creates a pull request in your GitHub repository. Each pull request includes detailed information about the changes made and the reasoning behind them. Pull requests will also include details of the validation process and the outcome of validation tests. If validation was not successful, this will be explicitly stated in the pull request to alert you to the need for manual validation.

Pull requests contain code diffs showing what changes Chunk recommends, along with logs that explain Chunk’s analysis and decision-making process. This transparency allows your team to understand and review the proposed fixes before merging.

The Chunk tasks dashboard

Once you have set up some Chunk tasks, you can view an activity timeline on the Chunk tasks dashboard.


Once a fix is available the row is marked as "PR opened". Select a row to view the task overview. This includes the following information:

  • Summary of the fix

  • Root cause of the flakiness

  • Details of the proposed fix

  • Details of the level of verification achieved

You also get a code diff of the proposed fix along with logs of the decision process presented as a conversation between "User" (Chunk) and "Assistant" (AI model provider). The diff and logs are designed to help you understand Chunk’s reasoning and analysis process.

Flaky test fix configuration options

The following table shows the configuration options available when setting up Chunk:

Table 1. Chunk configuration options

| Setting | Description | Options |
| --- | --- | --- |
| Run frequency | How often Chunk analyzes and fixes flaky tests. | Daily (Sunday through Thursday at 22:00 UTC); Weekly, every Sunday at 22:00 UTC (default); Monthly, on the first day of the month at 22:00 UTC |
| Maximum tests to fix per run | Limits the number of tests Chunk will attempt to fix in a single execution. | 1, 2, 3 (default) |
| Number of solutions to try per test | How many different fix approaches Chunk will generate for each flaky test. | 1 (default), 2, 3 |
| Number of validation runs per test | How many times Chunk runs a test to validate that a fix works consistently. | 1-20 (default: 10) |
| Maximum concurrent open PRs | Limits the number of pull requests Chunk can have open at one time. | 1-20 or "Unlimited" (default) |

Limitations

You cannot edit a Chunk task’s configuration, including setup scripts and post-run commands, once the task is created. To modify these settings, delete the existing Chunk task and create a new one.

Troubleshooting

Unable to run verification tests

Chunk runs in a Linux VM with basic software installed by default. To verify that a proposed fix resolves flakiness, it re-runs the affected test several times. To do this, Chunk may install additional software needed to set up the test environment, using clues from your CircleCI configuration file to determine how to run the tests.

View attempts in the CircleCI web app as follows:

  1. Open the Chunk task from the timeline.

  2. Select Task logs.

  3. Select the Expand All option, then search for attempt. This will take you to the section where Chunk is trying to run the tests.

Consider setting up a cci-agent-setup.yml file to control the environment in which Chunk runs your tests. For more information see Chunk environment setup.

Also consider including a markdown file named claude.md or agents.md at the root of your repository with instructions for running tests. Chunk should pick this up automatically.

Invalid OpenAI model specified

If you get the following error:

Invalid OpenAI model specified. Please check the model name and ensure it is available for your account.

You will need to make sure your organization has GPT-5 access. To verify this in OpenAI Platform, follow these steps:

  1. Switch to the project you want to check in the top left dropdown.

  2. Go to Settings > Limits in the left-hand menu. This page shows the models and rate limits for your project. gpt-5 will be listed if you have access.

I cannot get my OpenAI organization verified

If organization verification is not possible, you can bypass this requirement by adding an environment variable to your circleci-agents context, as follows:

  1. In the CircleCI web app, go to Organization Settings > Contexts.

  2. Use the search to find the circleci-agents context. Select it by name to open configuration options.

  3. Scroll down to the "Environment variables" section.

  4. Select Add environment variable to enter the variable name and value.

    • Under "Environment variable name", enter CCI_AGENT_OPENAI_MODEL.

    • Under "Value", enter gpt-5-nano.

Verification required error

If you get the following error inside a Chunk task, this indicates that your OpenAI organization verification is pending.

OpenAI organization verification required. Please verify your organization at https://platform.openai.com/settings/organization/general and see our community forum for more debugging help.

To fix this issue, go to OpenAI Platform, navigate to General > Organization Settings, and select Verify Organization. Then follow the steps to get your organization verified.

Action required error

If you get the following error inside a Chunk task, this indicates that your OpenAI organization verification is pending.

Action required - agent execution error
The agent ran into an error while executing this task. See our community forum for how to solve this error.

Contact sebastian@circleci.com for assistance.

Frequently asked questions

Does CircleCI use my data to train AI models?

No, CircleCI does not store your source code or use it for training purposes. Chunk processes your code temporarily to generate fixes but does not retain or share this information with model providers for training.

How long are Chunk’s logs stored?

Chunk’s logs are stored by CircleCI for 90 days. This is a fixed retention period that applies to all organizations, regardless of your plan’s standard data retention policy. After 90 days, logs are automatically deleted.