Intelligent CI/CD with CircleCI: test splitting

At CircleCI, our team works every day to build the best high-performance platform for engineering productivity. There are some features we’re known for: workflows, orbs, and first-class Docker support. But the CircleCI team is also working hard behind the scenes on optimizations to make software development faster, smarter, and more secure – in ways we realize our customers may not know about. To that end, we wanted to walk you through some of the lesser-known aspects of our platform. We’ll show you what they do, how they work, and give you tips to get CircleCI’s built-in intelligence working for you. Today’s post is on test splitting.

Intelligent test splitting for speed and profit

One of the most important contributors to developer efficiency is whether devs can get the information they need, as soon as possible – and test results might top that list. Any time a developer spends waiting for tests to run is time not spent writing the next piece of code, not to mention the cost of waiting around and losing context on what they’re working on. Fast feedback is everything.

Did you know that CircleCI can intelligently split tests to get you your test results faster?

Tests and test suites are highly variable in length. If you were to naively split tests across tasks, you could spend a lot of time waiting, since you need to wait until the last task of the group is complete to move on. For this reason, how you split your tests matters a lot.

CircleCI has built-in platform intelligence that can group your long and short tests together, based on your parallelism level, in order to minimize the total time of your test suite. We mine historical duration data of your runs to optimize for the shortest possible suite. And now, customers on our Performance plan can get even more out of test splitting because they aren’t limited by containers: they can get the benefit of running 10, 20, or even 40 test groupings concurrently without having to pay for access to 40 containers.

Let’s look at an example. Say you have one unit testing job and a series of integration tests. You’ve set up your workflow so that when that job finishes, the integration test jobs run at 10x parallelism. These involve some much longer-running tests because they involve steps like spinning up a browser, or talking to a database. If you naively split them to the same number of tests per container, then you can easily end up in a situation where you have lots of units of work that are complete (because they took a minute each, as an example), and one task that has to run for 10x as long, because you happen to have the longest tests grouped into one task. Instead, if you spread these by time, the whole unit of work now takes closer to 1 minute instead of 10 minutes.
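To make the arithmetic in this example concrete, here is a minimal sketch in Python. It is not CircleCI’s actual algorithm: it assumes a simple greedy strategy (give the longest remaining test to the least-loaded task), and the durations are made-up numbers chosen to mirror the scenario above (10 slow tests at 60 seconds each, 90 fast tests at 6 seconds each, 10 tasks). The function names are hypothetical.

```python
import heapq

def split_by_count(durations, n_tasks):
    """Naive split: consecutive chunks holding an equal number of tests."""
    size = -(-len(durations) // n_tasks)  # ceiling division
    return [durations[i:i + size] for i in range(0, len(durations), size)]

def split_by_time(durations, n_tasks):
    """Greedy split: assign the longest remaining test to the least-loaded task."""
    heap = [(0, i) for i in range(n_tasks)]  # (current load, task index)
    groups = [[] for _ in range(n_tasks)]
    for d in sorted(durations, reverse=True):
        load, i = heapq.heappop(heap)
        groups[i].append(d)
        heapq.heappush(heap, (load + d, i))
    return groups

def makespan(groups):
    """Wall-clock time of the whole job: the slowest task decides."""
    return max(sum(g) for g in groups)

# Assumed data: the slow tests happen to sit next to each other in the list.
durations = [60] * 10 + [6] * 90

naive = makespan(split_by_count(durations, 10))
timed = makespan(split_by_time(durations, 10))
print(f"split by count: {naive}s, split by time: {timed}s")
```

In this toy setup the count-based split leaves one task holding all ten slow tests, while the duration-aware split spreads them out, so every task finishes at about the same time.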

With this one change, you’ve made your test suite finish roughly 10x faster.

We can also share a real example from part of CircleCI’s test suite. Here is a collection of tests, grouped together in clusters of equal test numbers (but not equal test lengths), and randomly split across 10 containers. Each bar in this diagram represents a random grouping of tests:

[Figure: realdata 1]

And here are those same tests, re-grouped according to our historical run data to optimize for similar run time:

[Figure: realdata 2]

With the random spread, the fastest group of tests clocked in at 107 seconds, with the slowest at 291 seconds.

With smart test splitting enabled, the fastest grouping inched up to 116 seconds, but the slowest dropped to 173 seconds.

This means overall we’ve reduced the spread from 184 seconds to 57 seconds.

And CircleCI can do this automatically for you. When you enable test splitting, CircleCI continually rebalances your test splits to minimize the time you spend waiting for results.

How to get the most out of test splitting

CircleCI automatically splits tests for you from within your test file. But there are some things within your control to optimize your test splitting:

Enable CircleCI’s Test Metadata collection to enable test splitting. Almost all testing frameworks allow you to output the results of your tests into a set of XML files or Cucumber JSON files. If your suite can produce these file types, we parse those results (for example, how many tests you ran and how many failed) and use the timing data to optimize future runs. You’ll also get additional features such as formatted failure data in the CircleCI UI. For an example of expanded insights from your test data, see this blog post.
Turn on splitting by timing data. All of our test splitting methods dynamically allocate your tests automatically across tasks. Splitting by timing is our most intelligent way to optimize the structure of the tests to get them through the fastest (read this blog post for an example of the Amio team setting this up using Gradle). Once you enable splitting by timing, CircleCI will look up the timing data from previous runs and predict how long it will take to run each test, then allocate the tests to minimize the total time required to get results.
Don’t write tests that are dependent on ordering. This is a best practice that extends beyond automatic test-splitting. Well-structured tests are independent of each other. Writing dependencies across tests is not a recommended practice, as it prevents you from taking advantage of parallelism or reordering.
Keep your test files small and organized. Rather than putting every test in one file, make useful groupings of tests. Keep them small so we can adjust them as needed to optimize your tests. Small units of work are key for organization as well as optimization. For example, with Rails or Ruby test files, CircleCI will reorder the files, but not the tests within those files. If everything is in one file, we can’t do much to optimize your suite. Also, a large single file of tests is probably indicative of code that is organized in an equally poor fashion. If you can’t split them, your code file is too big.
Optimize your tests for economy. At CircleCI, we do everything in our power to get you feedback and test results as quickly as possible. But keep in mind that you can reduce the overall time spent by pushing more of your test coverage into simpler tests, like unit tests, where they are cheaper and faster to run. Once you’ve done that, CircleCI can still parallelize those tests, and the overall reduction in time will pay dividends. For more background info on thinking about economical testing, see this blog post.
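As a rough illustration of the timing data that test metadata carries, the sketch below reads a JUnit-style XML report and totals the recorded durations per test file. This is not CircleCI’s parser: the sample report is invented, the function name is hypothetical, and the `file` attribute is an assumption, since only some reporters emit it (the sketch falls back to `classname` when it is missing).

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Invented sample report in the common JUnit XML shape.
SAMPLE = """<testsuite name="example" tests="3">
  <testcase classname="LoginTest" name="test_ok" file="spec/login_spec.rb" time="1.5"/>
  <testcase classname="LoginTest" name="test_bad_password" file="spec/login_spec.rb" time="0.5"/>
  <testcase classname="CartTest" name="test_add" file="spec/cart_spec.rb" time="4.0"/>
</testsuite>"""

def durations_by_file(junit_xml):
    """Sum the 'time' of every <testcase>, grouped by file (or classname)."""
    root = ET.fromstring(junit_xml)
    totals = defaultdict(float)
    for case in root.iter("testcase"):
        key = case.get("file") or case.get("classname") or "unknown"
        totals[key] += float(case.get("time", 0))
    return dict(totals)

timings = durations_by_file(SAMPLE)
for name, seconds in sorted(timings.items()):
    print(f"{name}: {seconds:.1f}s")
```

Per-file totals like these are what a duration-aware splitter needs as input, which is also why many small files give it more room to balance than one large file.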

For the curious: why did we build this?

We originally built test timing in CircleCI 1.0. In that earlier version of CircleCI, all parallel tasks ran in lockstep: the testing phase didn’t start until every parallel task of the previous step had finished. If you had 10 containers, it was possible that you were paying for 10 containers’ worth of time even though 9 were done, as long as the last one was still churning on tests.

As you can imagine, this wasn’t ideal. When we originally built intelligent test splitting, it was to save customers money and to optimize usage of the capacity they had already allocated. This also saved time, reducing the critical path to value delivery. Now with CircleCI 2.0, all tasks are independent, which does away with the problem of paying for resources you’re not using. If one of your tasks finishes, we return that capacity to the VM pool so you don’t get charged. While this approach is definitely an improvement from a cost perspective, that long-running task is still in the path of getting results to the developer. So test splitting continues to be a great approach to getting developers feedback as quickly as possible.
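The difference between the two models can be sketched with a little arithmetic. The Python below uses a deliberately simplified cost model that is an assumption for illustration only, not CircleCI’s actual billing: under lockstep every container is held until the slowest task finishes, while independent tasks each release their capacity when done. The task durations and function names are made up.

```python
def lockstep_container_seconds(task_seconds):
    """Lockstep: every container is occupied until the slowest task ends."""
    return len(task_seconds) * max(task_seconds)

def independent_container_seconds(task_seconds):
    """Independent tasks: each container is occupied only while its task runs."""
    return sum(task_seconds)

def wall_clock_seconds(task_seconds):
    """Either way, the developer waits for the slowest task."""
    return max(task_seconds)

# Assumed unbalanced split: nine tasks take 1 minute, one takes 10 minutes.
tasks = [60] * 9 + [600]

print(lockstep_container_seconds(tasks))     # 6000
print(independent_container_seconds(tasks))  # 1140
print(wall_clock_seconds(tasks))             # 600
```

Independent tasks shrink the container time consumed, but the wall-clock wait is unchanged, which is why balancing the split still matters.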

Solving long run times is about more than having fast computers; it’s about optimizing all resources in a way that gets you results as quickly as possible.

We’re proud of the ways that CircleCI’s smart automatic optimizations work harder to help you get the most out of your CI/CD pipeline. Look out for the other ways that CircleCI uses data and intelligence to drive more value out of your builds, coming soon in the rest of the series.
