Within software, there are many technical concepts and definitions. It can be mind-boggling when learning new topics or even when switching between companies that use different terms.

Testing is one such topic. As modern technology companies mature along their DevOps journey by adopting continuous integration practices, more and more importance is being placed on testing and test automation. Don’t get lost in the confusion of all of the different methods. Here is a high-level reference to the most common types of software testing.

1. Unit testing

Unit testing is a testing method focused on vetting individual “units”, or pieces of code.

The primary goal of unit testing is to determine logical integrity — that a piece of code does what it’s supposed to.

Generally, people will test individual methods or functions as units, and depending on the size and complexity of the code, also classes. Units are tested in isolation, so any dependencies they would normally have are stubbed or mocked.

An example of this would be if you had a function that massages data from a database. However, since it’s a unit test, you wouldn’t use a real database: you’d make a call to a stubbed endpoint, which returns the data you’d normally expect from a database. That way, the only functionality being tested is this piece of code, or the unit.
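The scenario above can be sketched in Python with the standard library's `unittest.mock`. This is a minimal illustration, not a prescribed structure: the function `summarize_users` and the `fetch_users` method on the database client are hypothetical names invented for the example.

```python
from unittest.mock import Mock


def summarize_users(db):
    """Unit under test: massages raw rows into a name -> email mapping."""
    rows = db.fetch_users()  # hypothetical database client method
    return {row["name"].title(): row["email"].lower() for row in rows}


def test_summarize_users():
    # The stub stands in for the real database and returns canned rows.
    stub_db = Mock()
    stub_db.fetch_users.return_value = [
        {"name": "ada lovelace", "email": "ADA@Example.com"},
    ]

    result = summarize_users(stub_db)

    # Only the data-massaging logic is exercised; no database is involved.
    assert result == {"Ada Lovelace": "ada@example.com"}
    stub_db.fetch_users.assert_called_once()


test_summarize_users()
```

Because the stub returns fixed data, a failure here points at the unit's own logic rather than at the database or the network.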

Most languages have at least one recommended unit testing framework (e.g. Java → JUnit, Python → unittest (PyUnit) or pytest, JavaScript → Mocha, Jest, Karma, etc.).

2. Integration testing

Integration testing is a testing method focused on vetting multiple components together.

The primary goal of integration testing is to ensure relationship integrity and flow of data between components or units.

Usually, people will run unit tests first to test logical integrity of individual units. Then they will run integration tests to ensure interaction between these units is behaving as expected. Continuing the above example, an integration test in this case would be running the same test against a real database. With real databases, you have additional scenarios and behaviors to consider.
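As a rough sketch of that idea, the same kind of data-massaging code can be run against a real database engine. The example below assumes an in-memory SQLite database as a stand-in for whatever database a project actually uses, and the table layout and function names are hypothetical.

```python
import sqlite3


def fetch_users(conn):
    """Reads rows from a real database connection."""
    cursor = conn.execute("SELECT name, email FROM users ORDER BY id")
    return [{"name": name, "email": email} for name, email in cursor]


def summarize_users(conn):
    """Same massaging logic as a unit test would cover, fed by real queries."""
    return {
        user["name"].title(): user["email"].lower()
        for user in fetch_users(conn)
        if user["email"] is not None  # real tables can contain NULLs
    }


def test_summarize_users_with_real_database():
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL, email TEXT)"
        )
        conn.executemany(
            "INSERT INTO users (name, email) VALUES (?, ?)",
            [("ada lovelace", "ADA@Example.com"), ("no email", None)],
        )
        conn.commit()
        assert summarize_users(conn) == {"Ada Lovelace": "ada@example.com"}
    finally:
        conn.close()


test_summarize_users_with_real_database()
```

The row with a NULL email is the kind of additional scenario a real database introduces and a hand-written stub can easily miss.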

“Integration testing” is a broad term and encompasses any tests where multiple components are involved. Consequently, a large variety of technologies and frameworks can be used, including the same ones used above in unit testing, or separate, behavioral-based frameworks (examples listed in the next section).

3. End-to-End testing (E2E, System)

System tests, or end-to-end (E2E) tests, are focused on vetting the behavior of a system from end-to-end.

The primary goal of end-to-end testing is to ensure the entire application or system as a unit behaves how we expect it to, regardless of internal workings.

In essence, unit and integration tests are typically “white box” (i.e. internals are known) whereas E2E tests are typically “black box” (i.e. we only verify input and output combinations). An example E2E test might be a generic user story like “Fetch a user’s data.” The input could be a simple GET request to a specific path, and then we verify that the output returned is what we expect. How the system fetched that data underneath is irrelevant.
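The “Fetch a user’s data” story could look roughly like the sketch below. To keep it self-contained, it starts a tiny local HTTP server from Python's standard library as a stand-in for the deployed application; the `/users/<id>` path, the `UserHandler` class, and the sample data are all assumptions made for illustration.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

# Hypothetical application data; a real system would hide this behind its API.
USERS = {"42": {"id": 42, "name": "Ada"}}


class UserHandler(BaseHTTPRequestHandler):
    """Stand-in for the system under test."""

    def do_GET(self):
        user = None
        if self.path.startswith("/users/"):
            user = USERS.get(self.path.rsplit("/", 1)[-1])
        body = json.dumps(user if user else {"error": "not found"}).encode()
        self.send_response(200 if user else 404)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep test output quiet


def fetch_user_e2e(user_id):
    """Black-box check: send a GET request, return status and parsed body."""
    server = HTTPServer(("127.0.0.1", 0), UserHandler)  # port 0 = any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]
        with urlopen(f"http://127.0.0.1:{port}/users/{user_id}") as response:
            return response.status, json.loads(response.read())
    finally:
        server.shutdown()
        server.server_close()


status, body = fetch_user_e2e(42)
assert status == 200
assert body == {"id": 42, "name": "Ada"}
```

The assertions look only at the HTTP status and the response body. Nothing in the test depends on how the handler found the user.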

As you can see, E2E tests can only check the overall behavior, so this is why unit and integration tests are necessary. It could be that although the output is correct, the way the result is obtained internally is incorrect, and an E2E test would not catch that.

For E2E tests, you typically use behavioral-based frameworks. You might use frameworks like Cucumber, Postman, SoapUI, Karate, Cypress, Katalon, etc. Note that a lot of API-testing frameworks are used for E2E testing because an API is typically how you programmatically interact with an app.

4. Acceptance testing

Acceptance testing is typically a phase of the development cycle.

The primary goal of acceptance testing is to verify that a given product or feature has been developed according to specifications set forth by a customer or an internal stakeholder, like a product manager.

Within acceptance testing, there can also be multiple phases, such as α-testing or β-testing. As much of the software development world moves toward Agile processes, user acceptance testing has become much less rigid and more collaborative.

It’s important to note that while acceptance tests can verify that the application behaves how a user wants it to, they do not verify the integrity of the system. Another caveat of user acceptance testing is that there’s a limit to the corner cases and scenarios a person can come up with. This is why the previous automated testing methods are important: the use cases and scenarios they cover are codified and checked the same way on every run.

5. White box testing (structural, clear box)

White box (also called structural or clear box) testing describes tests or methods in which the details and inner workings of the software being tested are known.

Since you know the functions, the methods, the classes, how they all work, and how they tie together, you’re generally better equipped to vet the logical integrity of the code.

For example, you might know there’s a quirk with the way a certain language handles certain operations. You could write specific tests for that, which you otherwise would not know to write in a black-box scenario.
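One concrete instance of such a quirk, chosen here purely for illustration: Python's built-in `round` rounds ties to the nearest even number, so `round(2.5)` is `2`. If you know the code under test is supposed to round halves up instead, you can write a test aimed at exactly that case. The helper `round_half_up` is a hypothetical function for this sketch.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_half_up(value: str) -> int:
    """Rounds a decimal string to the nearest integer, with ties going up."""
    return int(Decimal(value).quantize(Decimal("1"), rounding=ROUND_HALF_UP))


def test_ties_round_up():
    # The language quirk: the built-in rounds ties to the nearest even number.
    assert round(2.5) == 2
    # Targeted checks that only someone aware of the quirk would think to add.
    assert round_half_up("2.5") == 3
    assert round_half_up("0.5") == 1
    assert round_half_up("1.4") == 1


test_ties_round_up()
```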

Unit testing and integration testing are often white box.

6. Black box testing (functional, behavioral, closed box)

In contrast, black box (also called functional, behavioral, or closed box) testing describes any tests or methods in which the details and inner workings of the software being tested are not known.

Since you don’t know any of the particulars, you can’t really create test cases that target specific niche scenarios or stress specific logic in the system.

The only thing you do know is that for a request or given piece of input, a certain behavior or output is expected. Hence, black box testing primarily tests the behavior of a system. End-to-end tests are often black box.

7. Gray box testing

Gray box testing is just a hybrid combination of black box and white box.

Gray box testing takes the ease and simplicity of black box testing (i.e. input → output) and combines it with white box knowledge of the internals to target specific parts of the code.

Gray box testing exists because black box testing and white box testing by themselves can miss important functionality.

  • Black box testing only tests that you get a certain output for a given input. It does not test the integrity of internal components — you could be getting the correct output purely by chance.
  • White box testing focuses on the integrity of individual units and how they function together, but it is sometimes insufficient for finding system-wide or multi-component defects.

By combining the two types together, gray box testing can encompass more complicated scenarios to really validate that an application is sound in structure and logic.

8. Manual testing

Self-explanatory — manual testing is testing in which a user manually specifies input or interacts with a system. They may also manually assess results.

This method of testing can generally be slow and error-prone. Much of the software industry has moved towards automated testing alongside adoption of Agile principles.

Nowadays, users might manually test a product in beta to check for acceptance, edge cases, and niche scenarios.

9. Static testing

Static testing describes any method of testing in which no actual code is being executed.

This actually includes reviewing code together with others, manually verifying the logic and integrity of functions, classes, etc.

Just like manual testing, static testing can be slow and error-prone, and generally static testing is done as a first line of defense to catch very obvious problems.

Many companies engage in code reviews before an engineer’s work is merged into the main branch. These code reviews are to save time and catch the low-hanging fruit.
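Static checks do not have to be manual. As an illustrative sketch, the snippet below inspects source code with Python's `ast` module and flags bare `except:` clauses without ever running the code being checked. The sample source and the function name `find_bare_excepts` are made up for this example, and real projects would more likely rely on an existing linter.

```python
import ast

# Source under review. It is parsed as text and never executed.
SOURCE = '''
def load(path):
    try:
        return open(path).read()
    except:
        return None
'''


def find_bare_excepts(source):
    """Returns the line numbers of `except:` clauses that name no exception type."""
    tree = ast.parse(source)
    return [
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, ast.ExceptHandler) and node.type is None
    ]


print(find_bare_excepts(SOURCE))
```

A reviewer would likely raise the same finding by hand; automating it leaves the review free to focus on logic.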

10. Dynamic testing

Dynamic testing describes any method of testing in which code is actually being executed.

Generally, all of the previously mentioned testing methods are dynamic except manual and sometimes acceptance. You’re usually running automated scripts or using frameworks to execute inputs to your system.

11. UI/Visual testing (browser testing)

UI or browser testing describes tests which specifically vet the integrity and behavior of user interface components.

Often when using a website, certain actions are expected to result in certain states. UI tests verify that these happen correctly. For example, the way you’ve implemented certain CSS might break in Firefox, but not Chrome. Browser tests can check that.

There are a lot of popular browser testing frameworks such as Selenium, Cypress, TestCafe, SauceLabs, Katalon Studio, Browsersync, Robot, etc.

12. Smoke testing

Smoke testing just refers to a smaller subset of checks to reasonably verify a system is working.

It’s just choosing and running a non-exhaustive set of tests that vet core functionality.

An example of this might be testing just a couple user flows, such as “Fetch a user’s data” from above. It’s not exhaustive, but since most of your application includes a user logging in, making a request, and fetching data from somewhere, this one test or few tests can give you reasonable confidence that your system is functional and working.

Usually smoke tests are run when users expect changes to not have made any significant impacts to overall logic and function. It can be expensive and time-consuming to run the full suite of all tests every single time, so smoke tests are used as an inexpensive safety measure that can be run more often.

13. Regression testing

Regression testing is a testing method to verify whether any previously functional features have suddenly broken (or regressed).

This often includes running the entirety of all unit, integration, and system tests to ensure no functionality has changed unexpectedly. As we all know, sometimes software has the oddest way of breaking.

Regression tests are often time-consuming and can be very expensive, which is why sometimes people will run smoke tests instead, especially if recent changes are not logically expected to impact the whole system.

Often when people set up CI/CD, they will run smoke tests on almost every commit, whereas regression suites might run at set intervals or on large features to ensure continuous integration without issues.
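One way to picture the split between a smoke run and a full regression run is sketched below with Python's built-in `unittest`. The application functions, the test names, and the choice of which test counts as a smoke test are all hypothetical; many teams would express the same idea with test markers or tags in their framework of choice.

```python
import io
import unittest

# Hypothetical application code, defined here only so the sketch runs.
USERS = {1: {"name": "Ada", "email": "ada@example.com"}}


def fetch_user(user_id):
    return USERS.get(user_id)


def update_email(user_id, email):
    USERS[user_id]["email"] = email
    return USERS[user_id]


class UserFlowTests(unittest.TestCase):
    def test_fetch_user_data(self):
        self.assertEqual(fetch_user(1)["name"], "Ada")

    def test_fetch_missing_user(self):
        self.assertIsNone(fetch_user(999))

    def test_update_email(self):
        updated = update_email(1, "new@example.com")
        self.assertEqual(updated["email"], "new@example.com")


# A small, non-exhaustive selection that covers core functionality.
SMOKE_TEST_NAMES = ["test_fetch_user_data"]


def smoke_suite():
    """Cheap subset, suitable for running on almost every commit."""
    return unittest.TestSuite(UserFlowTests(name) for name in SMOKE_TEST_NAMES)


def regression_suite():
    """Every test in the class, suitable for scheduled or pre-release runs."""
    return unittest.defaultTestLoader.loadTestsFromTestCase(UserFlowTests)


def run(suite):
    return unittest.TextTestRunner(stream=io.StringIO()).run(suite)


print(run(smoke_suite()).testsRun, run(regression_suite()).testsRun)
```

A CI pipeline could then call the smoke suite in its per-commit job and the regression suite in a scheduled job.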

14. Load testing

Load testing refers to testing an application’s response to increasing demand.

This includes testing sudden influxes of requests or users that might put unexpected strain on the system. Load testing is often done as a part of security tests to ensure an application and its system can withstand a DDoS attack.

Load testing is also done to verify the maximum amount of data a system can handle at any given time. It’s integral for helping teams determine effective HA (high availability) implementation and scaling formulas.
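The idea of ramping up demand and watching how the system responds can be sketched as follows. This is a toy illustration only: `handle_request` is a hypothetical stand-in for a call to the real system, the concurrency levels are arbitrary, and dedicated load testing tools do far more than this.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def handle_request(payload):
    """Hypothetical stand-in for a request to the system under test."""
    time.sleep(0.002)  # simulate a small amount of work
    return {"ok": True, "echo": payload}


def timed_call(payload):
    start = time.perf_counter()
    response = handle_request(payload)
    return response["ok"], time.perf_counter() - start


def run_load_step(concurrency, total_requests):
    """Fires requests with the given concurrency and summarizes the outcome."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(timed_call, range(total_requests)))
    latencies = sorted(elapsed for _, elapsed in results)
    return {
        "concurrency": concurrency,
        "requests": total_requests,
        "errors": sum(1 for ok, _ in results if not ok),
        "p95_seconds": latencies[int(0.95 * (len(latencies) - 1))],
    }


def ramp(levels=(1, 5, 20), requests_per_level=40):
    """Increases demand step by step, one report per concurrency level."""
    return [run_load_step(level, requests_per_level) for level in levels]


for report in ramp():
    print(report)
```

Comparing error counts and tail latency across the steps gives a first hint of where the system starts to struggle, which is the input scaling decisions need.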

15. Penetration testing

Penetration testing (or pen testing) is a form of security testing that involves verifying the robustness of an application’s security.

Every manner in which an application can be compromised (cross-site scripting, unsanitized inputs, buffer overflow attacks, etc.) is exploited to check how the system handles it. Pen tests are an important part of making sure a company does not fall victim to serious breaches.
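A full penetration test is a much broader exercise, often carried out by specialists, but a single probe for unsanitized inputs can be sketched in a few lines. The example below assumes an in-memory SQLite database and two hypothetical lookup functions, one built with string formatting and one using a parameterized query, and feeds both a classic SQL injection payload.

```python
import sqlite3

# Classic injection payload: closes the string literal and adds an always-true clause.
INJECTION = "nobody' OR '1'='1"


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (username TEXT)")
    conn.executemany("INSERT INTO users VALUES (?)", [("alice",), ("bob",)])
    return conn


def find_user_unsafe(conn, username):
    """Vulnerable: user input is pasted directly into the SQL text."""
    query = f"SELECT username FROM users WHERE username = '{username}'"
    return conn.execute(query).fetchall()


def find_user(conn, username):
    """Safer: the driver passes the input as data, not as SQL."""
    query = "SELECT username FROM users WHERE username = ?"
    return conn.execute(query, (username,)).fetchall()


conn = make_db()
# The unsafe version leaks every row when given the payload.
assert len(find_user_unsafe(conn, INJECTION)) == 2
# The parameterized version treats the payload as an ordinary, unknown username.
assert find_user(conn, INJECTION) == []
assert find_user(conn, "alice") == [("alice",)]
conn.close()
```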


As you can see, there are a lot of different terms used in software testing, and many of these often overlap or are used interchangeably (however incorrectly).

By understanding exactly what each of these terms means and knowing what they encompass, you’ll be able to understand what people are talking about and dig in deeper as per your needs. You might even be able to correct them now. For further reading on thinking productively about testing as a whole, check out the Testing Strategies from the Trenches blog post.

If you’re interested in automating your tests, sign up for a free account today and see how CircleCI can help you catch problems sooner.

Happy engineering.