AI Development · Mar 5, 2026 · 17 min read

Regression Testing: What it is, why it matters, and how to automate it with CI/CD

Roger Winter

Content Marketing Manager

Regression testing is the practice of re-running existing tests after a code change to confirm that previously working functionality hasn’t broken. It answers a single question: did this change break something that used to work? In CI/CD pipelines, regression tests run automatically on every commit, giving teams immediate feedback before code reaches production.

This guide covers what regression testing is, how it differs from retesting and other testing types, and how to automate it in a CI/CD pipeline.

Why does regression testing matter?

Regression testing matters because code changes have side effects, and catching those side effects in continuous integration costs minutes instead of hours.

A developer adds a discount code field to the checkout page. The feature works and the build ships. Two days later, support tickets are piling up: existing customers can’t complete purchases at all. The problem isn’t in the discount code logic. It’s that the new code changed how the pricing module handles null values, and every order without a discount code now throws an unhandled exception.

That’s a regression. It didn’t surface in the new feature’s tests because the bug lives in the interaction between new code and old code. Codebases aren’t isolated systems, and changes in one module can surface as failures in another.
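The scenario above can be reproduced in a few lines. This is a toy illustration, not code from a real checkout system: the refactored discount logic assumes the code is always a string, so every order without one now raises instead of returning the full price.

```python
# Toy reproduction of the regression described above. All names are
# illustrative. The new logic calls .upper() on discount_code, which
# raises AttributeError when the code is None.
def order_total(subtotal, discount_code):
    if discount_code.upper() == "SAVE20":  # breaks for discount_code=None
        return subtotal * 0.80
    return subtotal
```

A regression test asserting the old behavior, that order_total(100.0, None) returns 100.0, fails in CI within minutes instead of surfacing as support tickets two days later.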

That bug got caught by customers, which is the expensive way. A bug caught in CI costs a developer a few minutes and a one-line fix. The same bug caught in production costs an incident ticket, a war room, a hotfix branch, a re-deployment, and whatever customer trust was lost in between.

The Consortium for Information & Software Quality (CISQ) estimated in its 2022 report that poor software quality costs the US economy at least $2.41 trillion. Most of that isn’t dramatic outages. It’s the accumulated weight of bugs found too late and the technical debt left behind while fixing them.

Teams that skip regression testing don’t work faster. They move the debugging from a CI pipeline, where it costs minutes, to production, where it costs hours of incident coordination plus the cognitive overhead of re-engaging with code someone wrote last week instead of five minutes ago. This shift-left approach to testing, catching problems as early as possible, is a core DevOps practice.

What’s the difference between regression testing and retesting?

Retesting checks whether a specific bug fix worked. Regression testing checks whether that fix, or any other change, broke something else.

Retesting is narrow. A tester filed a bug: the checkout page miscalculates tax on international orders. A developer fixes it. Retesting means running the exact scenario from the bug report to confirm the fix works. Once it passes, the defect is closed.

Regression testing is broader. After that tax fix ships, a regression suite runs to verify that the fix didn’t introduce new problems elsewhere. Maybe the tax calculation change also affected how shipping costs are computed, or how order totals display on the confirmation page. Regression testing catches those side effects.

The two are sequential. A team retests the fix first, then runs regression tests to check for collateral damage. In a CI pipeline, both happen in the same run: the targeted test for the specific fix alongside the broader regression suite.

| Attribute | Retesting | Regression testing |
| --- | --- | --- |
| Purpose | Verify a specific bug fix | Verify changes didn’t break existing functionality |
| Scope | Narrow: the defect and its fix | Broad: previously working features across the codebase |
| Trigger | A reported bug was fixed | Any code change: feature, fix, refactor, dependency update |
| Automatable? | Sometimes, depends on the defect | Yes, and it should be |
| Typical timing | After a specific fix, before closing the defect | After every change, ideally on every commit |

Types of regression testing

Once a team decides to run regression tests, the next question is how much of the suite to run. A one-line CSS fix doesn’t warrant the same test coverage as a database schema migration.

The right approach depends on what changed, how much risk the change carries, and how much time the team has before the code needs to ship. Most teams use a mix of the following types, layered across their continuous integration and continuous delivery pipeline. Mapped against the test pyramid, these types correspond roughly to its layers: unit regression at the base, selective and integration-scoped runs in the middle, and complete and visual regression at the top.

Corrective regression testing

No code has changed in the product itself. The team is validating that existing test cases still pass after an environment change, a dependency upgrade, or an infrastructure migration. Because the tests don’t need modification, corrective regression is fast to set up and run. The limitation is that it only covers code paths that the existing tests already touch.

Progressive regression testing

Existing tests no longer fully cover the modified behavior, so the team writes new or updated test cases. This is the normal situation when a developer ships a new feature or does a major refactor. It’s thorough, but it requires upfront effort to write those tests before the regression run means anything.

Selective regression testing

Rather than running the full suite, the team picks a subset based on impact analysis: which modules did this change touch, and which tests cover those modules? Selective regression is the pragmatic choice for time-constrained runs or PRs that affect isolated, well-bounded parts of the codebase. The risk is that if the impact analysis misses a dependency, the test run misses the bug.
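The impact analysis behind selective regression can be sketched in a few lines. The module-to-marker mapping below is hand-written and illustrative; real tools derive it from import graphs or coverage data.

```python
# Toy impact analysis: map source modules to the pytest markers that cover
# them, then run only those markers. Returning None signals that the
# analysis is incomplete and the full suite should run instead.
MODULE_MARKERS = {
    "src/pricing.py": {"pricing"},
    "src/checkout.py": {"checkout", "pricing"},
    "src/auth.py": {"auth"},
}

def markers_for_change(changed_files):
    """Union of markers covering the changed modules, or None if any
    changed file is unmapped."""
    if any(path not in MODULE_MARKERS for path in changed_files):
        return None  # incomplete analysis: fall back to complete regression
    selected = set()
    for path in changed_files:
        selected |= MODULE_MARKERS[path]
    return selected
```

The selected markers translate into an invocation like pytest -m "checkout or pricing". Falling back to the full suite on an unmapped file is one way to hedge against the missed-dependency risk described above.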

Complete regression testing

Every test case in the suite runs. This is appropriate before major releases, after foundational changes like database schema updates or authentication rewrites, or after a large refactor that cuts across many modules. It gives the highest confidence but takes the longest. Complete regression is only practical per-commit if the suite is fast enough through parallel execution.

Unit regression testing

Unit tests scoped to a single module run after that module changes. This is typically the first gate in a CI pipeline because unit tests are cheap and fast. The trade-off is scope: unit regression won’t catch bugs that only appear when modules interact.

Visual regression testing

Screenshot comparisons across builds detect layout shifts, styling changes, and rendering differences that functional assertions miss entirely. Tools can automate the comparison, but a human still reviews the flagged differences. Visual regression is particularly useful for front-end-heavy applications where a CSS change in one component can break layouts elsewhere.
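The core mechanic is a pixel-level comparison. The sketch below operates on small grids of pixel values standing in for screenshots; real tools like Percy or BackstopJS work on rendered images and add perceptual thresholds, but the idea is the same.

```python
# Toy pixel diff: compare two same-size "screenshots" (grids of pixel
# values) and report the coordinates where they disagree.
def diff_pixels(baseline, candidate):
    """Return (row, col) coordinates where the two grids differ."""
    return [
        (r, c)
        for r, row in enumerate(baseline)
        for c, pixel in enumerate(row)
        if candidate[r][c] != pixel
    ]
```

A non-empty diff flags the build for human review rather than failing it outright, matching the workflow described above.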

How to choose a regression testing strategy

Most CI/CD pipelines combine several of these. Unit regression runs on every commit as the first gate. Selective regression runs on pull requests, targeting the modules a change is most likely to affect. Complete regression runs before a merge to main or before a release. Progressive regression happens naturally when developers write new tests alongside new features in the same PR. The automation section below shows how to wire these into pipeline stages.

| Type | What it tests | When to use it | Trade-off |
| --- | --- | --- | --- |
| Corrective | Existing test cases against unchanged product code | Environment or dependency changes, no product code modified | Fast, but only covers existing test paths |
| Progressive | Modified behavior with new or updated test cases | New features, major refactors | Thorough, but requires writing new tests |
| Selective | A targeted subset of modules based on impact analysis | Time-constrained runs, isolated module changes | Faster, but depends on accurate impact analysis |
| Complete | The entire regression suite end to end | Major releases, foundational changes | Highest confidence, but slowest |
| Unit regression | A single module’s unit tests after changes | First CI gate, every commit | Cheap and fast, but limited to unit-level behavior |
| Visual regression | Screenshot diffs for layout and styling changes | Front-end changes, CSS refactors | Catches what functional tests miss, but requires human review |

When should you run regression tests?

Run regression tests after any change that could affect existing behavior.

That includes new features, bug fixes, refactored code, dependency updates (including transitive dependencies that update silently), configuration changes, and infrastructure migrations. A pre-release regression run before production deploys is also standard practice.

The practical answer in a CI/CD pipeline is simpler: every commit. If the regression suite is automated and fast enough, there’s no reason to batch it into a nightly or weekly run.

Running on every commit means the developer who introduced a regression is still looking at the code when the failure comes back. Waiting 24 hours means that developer has moved on to something else, and now has to context-switch back to debug a change they’ve half-forgotten.

The question worth spending time on isn’t when to run regression tests. It’s how to make the suite fast enough that running it on every change is practical instead of painful. That’s a pipeline design problem.

How to automate regression testing in a CI/CD pipeline

Most teams already have regression tests. The gap is usually in how those tests run. They sit in a test directory, someone runs them locally before a release, and occasionally a few get skipped because the full suite takes too long.

Automating regression testing in CI/CD means moving those tests into a pipeline that runs them consistently, on every change, without anyone remembering to trigger them. That requires three things: a test suite organized so the pipeline can run subsets of it, a pipeline structured in stages so fast tests gate slow ones, and a parallelism strategy so the suite stays fast as it grows.

Step 1: Organize the test suite for pipeline execution

Before the pipeline can run regression tests selectively, the tests need to be labeled. Most test frameworks support some form of tagging or marking that lets a runner filter by category.

In pytest, custom markers handle this:

# conftest.py
import pytest

def pytest_configure(config):
    config.addinivalue_line("markers", "unit: unit-level regression tests")
    config.addinivalue_line("markers", "integration: integration-level tests")
    config.addinivalue_line("markers", "e2e: end-to-end tests")
    config.addinivalue_line("markers", "checkout: checkout flow coverage")
    config.addinivalue_line("markers", "pricing: pricing calculation coverage")

# test_pricing.py
import pytest
from pricing import calculate_total  # module under test; import path is illustrative

@pytest.mark.unit
@pytest.mark.pricing
def test_discount_code_applies_percentage():
    price = calculate_total(subtotal=100.00, discount_code="SAVE20")
    assert price == 80.00

@pytest.mark.unit
@pytest.mark.pricing
def test_null_discount_code_returns_full_price():
    price = calculate_total(subtotal=100.00, discount_code=None)
    assert price == 100.00
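
These tests exercise a calculate_total function; a minimal sketch of it, assuming a simple percentage-discount table and two-decimal rounding, could be:

```python
# Minimal sketch of the function the tests above assume. The discount
# table and rounding policy are illustrative assumptions, not a spec.
DISCOUNT_RATES = {"SAVE20": 0.20}

def calculate_total(subtotal, discount_code):
    """Apply a percentage discount for a known code; None means full price."""
    if discount_code is None:
        return subtotal
    rate = DISCOUNT_RATES.get(discount_code, 0.0)
    return round(subtotal * (1 - rate), 2)
```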

A test can carry multiple markers. The pricing tests above are both unit and pricing, so the pipeline can target them by speed (run all unit tests) or by domain (run all pricing tests). JUnit 5 uses @Tag for the same purpose; Jest filters with --testPathPatterns or custom config via testMatch and testRegex.

The goal is a test suite organized along two axes: speed (unit, integration, E2E) and domain (checkout, pricing, auth). That structure gives the pipeline the flexibility to run the right subset at each stage.

Step 2: Structure the pipeline in stages

Not every test needs to run on every trigger. A pipeline that runs 800 tests on every commit will either be too slow or too expensive. Instead, structure the pipeline as a series of gates where faster tests run first and slower tests run later. If a fast gate fails, the pipeline stops, and nobody waits for a 20-minute E2E suite that was going to fail anyway.

The general pattern:

  • Gate 1: Unit tests and linting. Runs on every commit. Should finish in under 3 minutes.
  • Gate 2: Integration tests and selective regression. Runs on pull requests. Should finish in under 10 minutes.
  • Gate 3: Full regression suite. Runs on merges to main. Parallelized to keep the wall-clock time short.

Here’s a working CircleCI configuration that sets up this gating structure:

# .circleci/config.yml
version: 2.1

jobs:
  unit-tests:
    docker:
      - image: cimg/python:3.12
    steps:
      - checkout
      - run: pip install -r requirements.txt
      - run: pytest -m unit --tb=short -q

  integration-tests:
    docker:
      - image: cimg/python:3.12
    steps:
      - checkout
      - run: pip install -r requirements.txt
      - run: pytest -m integration --tb=short -q

  full-regression:
    docker:
      - image: cimg/python:3.12
    steps:
      - checkout
      - run: pip install -r requirements.txt
      - run: pytest --tb=short -q

workflows:
  regression-pipeline:
    jobs:
      - unit-tests
      - integration-tests:
          requires:
            - unit-tests
      - full-regression:
          requires:
            - integration-tests
          filters:
            branches:
              only: main

The requires keyword creates the gating. Integration tests won’t start until unit tests pass. Full regression only runs on the main branch and only after integration tests pass. On a feature branch PR, the pipeline runs the first two gates and stops.

Step 3: Keep it fast as the suite grows

Regression suites grow over time. That’s expected. The danger is when the suite gets slow enough that developers start skipping it or treating failures as background noise. A 30-minute regression run creates a 30-minute feedback gap, and most developers won’t sit idle waiting for it.

The fix is parallel execution: split the test suite across multiple containers so the tests run concurrently instead of sequentially. A 30-minute suite distributed across 6 containers finishes in about 5 minutes.
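The arithmetic behind that claim can be sketched with a toy scheduler. Greedy longest-task-first assignment is a simplification of what timing-based splitting does, but it shows why balanced splits drive wall-clock time toward total duration divided by container count.

```python
# Toy estimate of parallel wall-clock time: assign each test (longest
# first) to the currently least-loaded container, then take the busiest
# container's total as the suite's wall-clock time.
def wall_clock_minutes(test_durations, containers):
    bins = [0.0] * containers
    for duration in sorted(test_durations, reverse=True):
        bins[bins.index(min(bins))] += duration
    return max(bins)
```

Sixty half-minute tests take 30 minutes serially; spread across 6 containers this way, the busiest container carries about 5 minutes of work.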

CircleCI has built-in support for this through its parallelism key and the circleci tests split command:

  full-regression:
    docker:
      - image: cimg/python:3.12
    parallelism: 6
    steps:
      - checkout
      - run: pip install -r requirements.txt
      - run: |
          TEST_FILES=$(circleci tests glob "tests/**/*.py" | circleci tests split --split-by=timings)
          pytest $TEST_FILES --tb=short -q

The --split-by=timings flag distributes tests based on historical run duration, so each container finishes at roughly the same time instead of one container getting all the slow tests. CircleCI builds this timing data from previous runs, provided the job uploads its results with a store_test_results step.

The faster the suite runs, the more often teams will run it. And the more often it runs, the sooner regressions get caught.

Regression testing: a real-world example

Say a team maintains a web application with a checkout flow. The backend is Python, tested with pytest. End-to-end tests run in Cypress. Their CI pipeline on CircleCI uses the gated structure from the previous section.

A developer refactors the pricing calculation logic, consolidating discount code handling that had been duplicated across several files into a single apply_discount() function. The change looks clean. The new unit tests pass. The developer pushes to a feature branch.

The pipeline picks it up. Unit tests pass in about two minutes. Integration tests pass in four. Then the selective regression stage runs Cypress tests tagged @checkout and @pricing against a staging environment, and one test fails: the new function doesn’t handle expired discount codes the same way the old logic did. Instead of throwing a validation error, it silently applies a zero discount and charges the customer full price.

The developer observes the failure in the pull request within eight minutes of pushing. The fix is four lines. Push again, all green, merge to main. The full regression suite runs in parallel, passes clean, and the deploy goes out.

That’s the whole loop. Eight minutes from push to feedback on a bug that, without this pipeline, could have easily reached production. In a manual QA process, this same bug might surface days later, after other developers have built more code on top of the broken pricing logic. By then, the fix is harder to isolate, the blast radius is wider, and someone is writing an incident postmortem instead of a four-line patch.

Can regression testing be fully automated?

Most regression testing can be automated. Unit, integration, and end-to-end regression tests are repeatable checks with known expected outcomes, and machines run them faster and more consistently than people do. Any test that a human would run the same way every time is a candidate for automation.

The parts that resist automation are the ones that require judgment. Exploratory testing for edge cases nobody anticipated. Usability regressions where a button still works but is now buried under a layout shift on mobile. Subjective visual quality that passes every functional assertion but looks wrong to a user. Visual regression tools bridge part of that gap by comparing screenshots across builds, but they still flag differences for a human to review.

AI-powered testing tools are starting to narrow that gap further. Some platforms now offer self-healing test locators that automatically update selectors when the UI changes, and predictive test selection that uses historical failure data to prioritize the tests most likely to catch regressions for a given change. These features are maturing but still work best as a layer on top of a well-structured test suite, not as a replacement for one.
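Predictive selection reduces, at its simplest, to ranking tests by how often they have failed before. The sketch below is a toy version of that idea with made-up history; real platforms also weigh which code changed.

```python
# Toy predictive test selection: rank tests by historical failure rate
# and run the most failure-prone ones first.
def prioritize(failure_history, k):
    """failure_history maps test name -> (failures, runs); return the
    top-k test names by failure rate."""
    def rate(name):
        failures, runs = failure_history[name]
        return failures / runs if runs else 0.0
    return sorted(failure_history, key=rate, reverse=True)[:k]
```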

The goal isn’t 100% automation. It’s automating the repeatable work so human testers spend their time on the things that only humans can evaluate. Automation without CI/CD, though, is just a script on someone’s laptop. Running automated regression tests inside a CI/CD pipeline is what makes them reliable: same environment, same sequence, same triggers, on every change.

Regression testing best practices

Most regression testing problems aren’t technical. They’re habit problems: the suite exists but nobody runs it consistently, or it runs but takes so long that people ignore the results. These six practices address the patterns that cause regression suites to lose their value over time.

  • Run on every commit, not on a schedule. Nightly regression runs create 24-hour feedback loops. By the time the results come back, the developer who introduced the regression has moved on to different work and has to context-switch back to code they’ve half-forgotten. Per-commit runs keep the feedback window tight enough that the fix is easy.
  • Prioritize by risk, not by coverage percentage. One hundred percent regression coverage sounds good in a slide deck, but it’s not a realistic target. Focus the regression suite on revenue-critical paths, modules that change frequently, and areas with a history of bugs. Those are the places where regressions are most likely and most expensive. Code coverage is a useful diagnostic for spotting untested areas, but it’s a poor goal in itself.
  • Keep the suite fast. If the regression suite takes longer than fifteen minutes, developers will find ways around it. Parallel execution and selective test runs are the two main tools for keeping feedback loops short.
  • Treat flaky tests as bugs. A test that passes and fails on the same code without any changes isn’t a minor annoyance. It erodes trust in the entire suite. When developers learn to ignore failures because “that test is just flaky,” real regressions start slipping through. Fix flaky tests or remove them.
  • Version tests with code. Tests belong in the same repository as the code they cover. They should change in the same pull requests, go through the same review process, and stay in sync with the production code at every commit. Tests that live in a separate repo or a separate tool tend to drift.
  • Measure duration, not just pass/fail. A regression suite that passes in 40 minutes is a problem even though every test is green. Track suite duration over time, monitor pass rate stability across runs, and watch time-to-feedback as the metric that matters most to developer experience.
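The last practice is easy to automate. A toy monitor might flag the suite when a rolling average of recent runs crosses a budget; the 15-minute figure echoes the guidance above, and the window size and history values are illustrative.

```python
# Toy duration monitor: alert when the mean of the last few runs
# exceeds a wall-clock budget, even while every test still passes.
BUDGET_MINUTES = 15.0

def over_budget(durations_minutes, window=3):
    """True if the mean of the last `window` run durations exceeds the budget."""
    recent = durations_minutes[-window:]
    return sum(recent) / len(recent) > BUDGET_MINUTES
```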

Regression testing tools

Regression testing doesn’t require specialized tooling. It runs on the same test frameworks and CI/CD platforms teams already use. The categories that matter:

  • Test frameworks: pytest, Jest, JUnit 5, Cypress, Playwright, Selenium. Cypress and Playwright have largely replaced older Selenium-based setups for web applications, though Selenium remains widely used for cross-browser matrices in enterprise environments.
  • CI/CD platforms: CircleCI, GitHub Actions, GitLab CI, Jenkins. The differentiator for regression testing is parallelism support, which determines whether a growing suite stays fast.
  • Visual regression: Percy, Chromatic, BackstopJS. These compare screenshots across builds and flag differences for human review.
  • Test management: TestRail, Zephyr. These matter more as suites scale and teams need to track test cases and results over time.

The tool matters less than the practice. Any modern test framework paired with a CI/CD platform that supports parallel execution can run a regression suite on every commit. The hard part is building the discipline to write regression tests consistently, keep them fast, and treat failures as blockers rather than noise.

Conclusion

Regression testing is what makes “move fast” work without “break things” following close behind. A team that runs regression tests on every commit knows within minutes whether a change broke something, and that feedback loop is what separates shipping with confidence from shipping and hoping. The suite doesn’t need to be perfect. It needs to be fast, trusted, and running on every change. Everything else follows from there.

Get started with CircleCI.