For many developers and IT professionals, testing in production is like base jumping: thrilling and terrifying at the same time. But the truth is that testing in production can be safe. What’s more, when done in a managed way, it is a key step toward releasing features faster without breaking things. So how can you rethink testing in production and do it safely?

You are already testing in production

Although many developers consider testing something you do only in pre-production, the reality is that we all test in production all the time. Each time you do a big-bang release to production, pushing all your changes and migrating all traffic to your new code at once, you are testing your code on real user traffic for the first time. This is not the kind of testing you want to be doing. In pre-production, by contrast, you have a lot of room to test for specific conditions and collect feedback. The test is controlled, the software is validated, and there is room for improvement.

But here is the problem: you can put a release through all the pre-production testing you can think of, check the health of your deployment thoroughly, and still run into issues once the code reaches a real production environment. That is because synthetic user testing can never fully predict real user behavior.

The solution is to rethink testing in production and, by the same token, your release process. Instead of exposing a new update or version to all users in a big bang (or even to a large batch of, say, 20 percent of users), make your releases as small and controlled as possible. Release incrementally to a controlled subset of users, such as those with a specific location, device, cookie, IP address, or basket value. Applying user segmentation allows you to set very specific conditions around the release, then test, observe, and roll back if things go wrong.
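To make user segmentation concrete, here is a minimal sketch in Python. It is an illustration only, not the API of any particular release tool: the `User` and `Segment` types, their fields, and the `sees_new_version` function are hypothetical names chosen for this example. The sketch assumes a user is routed to the new version only when every segment condition matches, and it uses a hash of the user ID so the same user gets the same answer on every request.

```python
import hashlib
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class User:
    # Hypothetical user attributes; a real system would read these
    # from the request (headers, cookies, session, cart).
    user_id: str
    country: str
    device: str
    basket_value: float


@dataclass(frozen=True)
class Segment:
    # Hypothetical release segment: all conditions must match.
    countries: FrozenSet[str]
    devices: FrozenSet[str]
    max_basket_value: float  # keep high-value baskets on the old version
    percent: int             # share of matching users to expose, 0 to 100


def bucket(user_id: str) -> int:
    """Map a user ID to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def sees_new_version(user: User, segment: Segment) -> bool:
    """Return True if this user should be routed to the new version."""
    matches = (
        user.country in segment.countries
        and user.device in segment.devices
        and user.basket_value <= segment.max_basket_value
    )
    # Hash-based bucketing keeps each user on one version across requests.
    return matches and bucket(user.user_id) < segment.percent


if __name__ == "__main__":
    segment = Segment(frozenset({"NL"}), frozenset({"ios"}), 50.0, 5)
    user = User("user-123", "NL", "ios", 20.0)
    print(sees_new_version(user, segment))
```

Raising `percent` step by step widens the exposure without moving users who already see the new version back to the old one, because each user's bucket never changes.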

How to test in production safely

Testing in production is the only way to know for sure that your application or service works as intended, but it doesn’t have to be scary. With the right tools and mindset, you can implement a release process that allows you to ship new features to your users with speed and confidence. Here are the steps you need:

  • Specify which users will be exposed to an update or feature and under which conditions. Define in advance the steps at which the number of exposed users increases.
  • Specify when to roll back, and make sure your rollback process is automated.
  • Release in a series of automated steps.
  • Make sure you are able to monitor and observe. Specifying the conditions of the release in advance gives you health checks: not only technical checks but also checks for business outcomes. These will give you early warning signals if things go wrong.
  • Share knowledge across teams. Learn what’s normal behavior in production and what’s not.
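
The steps above can be sketched as a small release loop. This is a simplified illustration under stated assumptions, not a definitive implementation: `Check`, `Result`, and `run_release` are hypothetical names, and the `set_traffic` and `read_metrics` callbacks stand in for whatever your traffic router and monitoring system actually provide. A real release would also wait and collect enough traffic at each step before evaluating the checks.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Check:
    # One health check, technical or business (hypothetical structure).
    name: str
    metric: str
    limit: float
    higher_is_better: bool

    def passes(self, metrics: Dict[str, float]) -> bool:
        value = metrics[self.metric]
        if self.higher_is_better:
            return value >= self.limit
        return value <= self.limit


@dataclass
class Result:
    final_percent: int
    rolled_back: bool
    failed_check: Optional[str] = None


def run_release(
    steps: List[int],
    checks: List[Check],
    set_traffic: Callable[[int], None],
    read_metrics: Callable[[int], Dict[str, float]],
) -> Result:
    """Increase exposure step by step; roll back on the first failed check."""
    current = 0
    for percent in steps:
        set_traffic(percent)
        current = percent
        metrics = read_metrics(percent)
        for check in checks:
            if not check.passes(metrics):
                set_traffic(0)  # automated rollback: no traffic to the new version
                return Result(0, True, check.name)
    return Result(current, False)


if __name__ == "__main__":
    checks = [
        Check("error rate", "error_rate", 0.01, higher_is_better=False),
        Check("conversion", "conversion_rate", 0.02, higher_is_better=True),
    ]

    def fake_metrics(percent: int) -> Dict[str, float]:
        # Made-up numbers for the demo: errors spike once exposure reaches 25%.
        error_rate = 0.05 if percent >= 25 else 0.001
        return {"error_rate": error_rate, "conversion_rate": 0.03}

    result = run_release([1, 5, 25, 100], checks, lambda p: None, fake_metrics)
    print(result)
```

Returning the name of the failed check is a deliberate choice: it tells whoever responds to the incident which release condition was not met, rather than only that a rollback happened.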

If this sounds like a canary release to you, that’s because it is. Codifying the canary release into a series of release policies lays the foundation for a managed, robust release process. There are several benefits to this approach:

  • You can rely on a controlled, repeatable process that improves the quality and reliability of your releases.
  • You can eliminate guesswork from your testing strategy.
  • You can avoid stressful manual rollbacks.
  • You can offer better incident response by seeing which condition of the test wasn’t met and fixing it faster.
  • Having two versions in production allows for A/B testing, which can help you improve application performance and maximize ROI.
  • You can have multiple teams releasing code changes to production simultaneously.

Having controlled, incremental releases tested in production on live user traffic lays the foundation for more confidence in the release process. Add the intelligent backward and forward automation offered by release orchestration, and you have a platform that can take over release decisions for you. That makes not just testing in production but the entire process of shipping new features to customers faster and much less scary, leaving you free for more important things than worrying about pager duty, like base jumping.