Fix flaky CI tests by chatting with your IDE
Senior Software Engineer

Flaky tests are a serious productivity problem. When tests sometimes pass and sometimes fail without code changes, they undermine trust in your CI pipeline and drain time from engineering teams. Debugging them often turns into a slow process of chasing logs, rerunning builds, and trying to guess what went wrong.
This post shows how to quickly detect and fix flaky tests directly in your IDE by chatting with an AI assistant. Using the find_flaky_tests tool from the CircleCI MCP server, your code assistant can pull historical test data, surface instability patterns, and suggest fixes in context. That way, you can resolve flakiness faster without jumping between tools or digging through CI logs.
You can find this and additional usage examples in the CircleCI MCP Cookbook.
Prerequisites
To follow along, you’ll need:
- A CircleCI account
- A GitHub account
- Node.js 18 or higher installed
- An IDE that supports MCP integration (such as Cursor, Windsurf, or VS Code)
This setup lets your AI assistant interact directly with your source code and CircleCI test results.
Prepare your environment
Before your assistant can access your CircleCI data, you’ll need to authorize it with an API token.
Go to your CircleCI User Settings and create a new personal API token. Be sure to copy and store it somewhere safe. You’ll need it during setup.
Next, configure your IDE to connect to the CircleCI MCP server with the API token you just created.
This step links your assistant to your CircleCI account and gives it permission to access the test history and logs needed to analyze flaky runs.
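As a rough illustration, the short script below prints the kind of JSON configuration that many MCP-capable IDEs read. Treat every name in it as an assumption to verify: the npm package `@circleci/mcp-server-circleci`, the `CIRCLECI_TOKEN` environment variable, the `circleci-mcp-server` key, and the `mcpServers` layout follow common MCP client conventions, but the exact format and file location depend on your IDE and on the CircleCI MCP server README.

```python
import json


def build_mcp_config(token: str) -> dict:
    """Return an MCP client config that launches the CircleCI MCP server via npx.

    Assumed names (verify against the server README and your IDE docs):
    the npm package, the CIRCLECI_TOKEN variable, and the "mcpServers" layout.
    """
    return {
        "mcpServers": {
            "circleci-mcp-server": {
                "command": "npx",
                "args": ["-y", "@circleci/mcp-server-circleci"],
                "env": {"CIRCLECI_TOKEN": token},
            }
        }
    }


if __name__ == "__main__":
    # Placeholder value: keep the real token out of source control.
    print(json.dumps(build_mcp_config("YOUR_CIRCLECI_TOKEN"), indent=2))
```

Running the server through npx is the reason Node.js appears in the prerequisites.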
Step 1: Set up the project
Start by creating a new GitHub repository using the contents of the flaky tests example project from our MCP cookbook.
The test included in this repo is intentionally flaky. It fails randomly about half the time, making it a good candidate for detection and debugging.
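The example project's test is not reproduced here, but a test that fails about half the time can be sketched in Python as follows. The function names and the injected `rng` parameter are hypothetical and exist only to make the behavior visible.

```python
import random


def unstable_check(rng: random.Random) -> bool:
    """Hypothetical check that passes only when a random draw lands below 0.5."""
    return rng.random() < 0.5


def flaky_test() -> None:
    # No fixed seed: each run draws a fresh value, so the assertion
    # fails roughly half the time even though the code never changes.
    assert unstable_check(random.Random())


def observed_pass_rate(runs: int, seed: int) -> float:
    """Simulate repeated runs with a seeded generator and return the pass fraction."""
    rng = random.Random(seed)
    passes = sum(unstable_check(rng) for _ in range(runs))
    return passes / runs
```

Over many simulated runs, `observed_pass_rate` settles near 0.5, which is the pattern the later steps rely on.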
Once the repo is created and the code is pushed, you’re ready to connect it to CI. The example project also includes a CircleCI configuration file that will automatically run the test suite on each build.
Step 2: Generate a flakiness signal in CircleCI
In CircleCI, create a new project called find-flaky-tests and link it to your GitHub repository.
At this point, each push to your repo will trigger a build that runs the flaky test.
Run the build several times; five or six runs should be enough. You can do this by pushing minor commits or using the “Rerun workflow” option in CircleCI.
These runs will give you a useful pattern of passing and failing results. CircleCI’s Test Insights will start flagging the flaky behavior based on this data.
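To make the idea of a flakiness signal concrete, here is a minimal sketch of one common heuristic: a test counts as flaky when it both passes and fails on the same commit. This illustrates the idea only and is not CircleCI's implementation; the `RunResult` record and the `find_flaky` function are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class RunResult:
    """One recorded test result (hypothetical shape, not a CircleCI type)."""
    test_name: str
    commit: str
    passed: bool


def find_flaky(runs: list[RunResult]) -> set[str]:
    """Return names of tests with mixed outcomes on at least one commit."""
    outcomes: dict[tuple[str, str], set[bool]] = defaultdict(set)
    for run in runs:
        outcomes[(run.test_name, run.commit)].add(run.passed)
    # Seeing both True and False for the same code means the result
    # depends on something other than the code itself.
    return {name for (name, _), seen in outcomes.items() if len(seen) == 2}
```

Under this heuristic, rerunning the same workflow several times is especially useful, because every rerun adds another result for the same commit.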
Step 3: Query for flaky tests
Inside your IDE, open chat in agent mode. Ask the agent to “find flaky tests” and provide the CircleCI project URL (for example, https://app.circleci.com/pipelines/vcs/org/repo).
The assistant calls find_flaky_tests, connects to your CircleCI project, and returns detailed information about the flaky test. It identifies the failing test case, pinpoints the file and line number, and highlights the exact assertion that’s behaving inconsistently. It also searches your codebase to bring the test implementation into view so you’re not switching tabs or digging through logs.
Step 4: Ask for a fix
Now that the flaky test has been identified, ask the assistant to “fix the flaky test”.
It will analyze the test code and suggest a fix based on common causes like async timing issues, race conditions, or shared state. You can review the suggested code and apply it directly, or tweak it to fit your style.
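What a suggested fix looks like depends on the root cause. As one hedged illustration of the shared-state case, the sketch below contrasts a hypothetical helper that leaks data between tests through a module-level list with a version that gives each caller its own list. All names here are invented for the example.

```python
from typing import Optional

# Shared state: every caller appends to the same module-level list,
# so the result depends on which tests ran earlier and in what order.
_shared_cart: list[str] = []


def add_item_shared(item: str) -> list[str]:
    """Append to the module-level cart and return it (order-dependent)."""
    _shared_cart.append(item)
    return _shared_cart


def add_item_isolated(item: str, cart: Optional[list[str]] = None) -> list[str]:
    """Return a new cart containing the given items plus `item`.

    Copying the input (or starting empty) means each test begins
    from a known baseline regardless of execution order.
    """
    fresh = [] if cart is None else list(cart)
    fresh.append(item)
    return fresh
```

With the shared version, a test that expects a one-item cart passes only if it runs first. With the isolated version, the outcome no longer depends on test order.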
Step 5: Push and rebuild
Commit your changes and push to GitHub. This will kick off a new CircleCI build. If your fix worked, the previously flaky test should now pass consistently.
You can trigger a few more runs to confirm stability. Once the test stops failing intermittently, it will no longer be flagged by CircleCI’s Test Insights.
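How many green runs count as “a few” depends on how often the test used to fail. As a back-of-the-envelope sketch, assuming runs are independent and an unfixed test fails with a constant probability p, the chance that it still passes n times in a row by luck is (1 - p)^n. The function name below is hypothetical.

```python
def chance_of_lucky_streak(fail_probability: float, runs: int) -> float:
    """Probability that a still-flaky test passes `runs` times in a row.

    Assumes independent runs with a constant failure probability.
    """
    if not 0.0 <= fail_probability <= 1.0:
        raise ValueError("fail_probability must be between 0 and 1")
    if runs < 0:
        raise ValueError("runs must be non-negative")
    return (1.0 - fail_probability) ** runs


if __name__ == "__main__":
    for n in (3, 5, 10):
        print(n, chance_of_lucky_streak(0.5, n))
```

For a test that fails about half the time, three consecutive passes would happen by chance 12.5% of the time, five about 3% of the time, and ten about 0.1% of the time.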
If you want to check the pipeline status without leaving your editor, just ask the assistant “what’s the status of the latest pipeline for this project?” It will run the get_latest_pipeline_status tool behind the scenes, which pulls details from your most recent pipeline and returns a summary that lets you confirm everything is green.
You can also ask the assistant to find flaky tests in your project again. This re-analyzes your project’s test history. Once enough pipeline runs have passed for CircleCI to stop considering the test flaky, the assistant will confirm that the issue is resolved and highlight any lingering or newly flaky tests.
Conclusion
Nothing derails team momentum more than flaky tests. Combining CircleCI’s automatic flaky test detection with an AI assistant in your IDE helps you go from failure to fix without changing context. The assistant has access to your code and test history, and it can highlight unstable tests and recommend targeted changes right where you work.
This approach keeps the feedback loop short and helps you maintain a more reliable test suite over time.
To learn more about how the CircleCI MCP server works with AI coding assistants to help debug builds, analyze tests, and surface CI issues, check out the documentation and project repository. You can sign up for a free CircleCI account and start exploring it in your own projects.