Infrastructure as Code (IaC) is the practice of recording the desired state of your infrastructure using a declarative language. In this article, I’m going to assume that your team is starting from scratch. Maybe some of your build process has been scripted, and maybe there is some manual testing and quality assurance work happening. Many readers will find that they are midway through the IaC adoption journey I’ll describe, or that they have missed some steps.

Step zero: getting buy-in

This is the least technical step and also the most important. For your transition to IaC to succeed, you need buy-in from management, your peers, and your team members. This gives you the time necessary to properly phase out ambiguous, time-consuming, and error-prone manual processes. Here is some background to help you start the conversation.

Declarative vs procedural programming

Historically, developers and operations folks would set up elaborate procedural scripts in Bash to automate the provisioning and configuration of resources. Procedural programming represents a step-by-step set of instructions: do x, then y, to achieve z.

The declarative programming paradigm says that if the desired state is z, then let the tooling take care of x and y automatically. We declare the desired end state and then the underlying machinery does the heavy lifting. As engineers, we’re used to storing things, all-the-things, in version control. This includes resources like READMEs, artifacts, test results, and configuration files. Why not store declarative snapshots of the underlying infrastructure that we host our applications on? We’re accustomed to automating things whenever possible. With modern testing frameworks, we’re able to automate the enforcement of testing standards across all of our projects. With tools like CircleCI, we’re able to automate deployments, and with tools like Terraform, we’re able to automatically provision infrastructure.
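To make the difference concrete, here is a minimal sketch in Python of what “let the tooling take care of x and y” can look like. It is not how Terraform is implemented; the plan function and the resource names (web_server, database, old_cache) are hypothetical and exist only for illustration. We describe the desired end state as data, and the function works out the steps.

```python
def plan(current: dict, desired: dict) -> list[tuple[str, str]]:
    """Return the (action, resource_name) steps that turn current into desired."""
    actions = []
    # Anything declared but missing gets created; anything that differs gets updated.
    for name in sorted(desired):
        if name not in current:
            actions.append(("create", name))
        elif current[name] != desired[name]:
            actions.append(("update", name))
    # Anything that exists but is no longer declared gets deleted.
    for name in sorted(current):
        if name not in desired:
            actions.append(("delete", name))
    return actions


# Hypothetical desired state: this is the part we would keep in version control.
desired = {
    "web_server": {"size": "small", "count": 2},
    "database": {"size": "large"},
}
# Hypothetical current state of the environment.
current = {
    "web_server": {"size": "small", "count": 1},
    "old_cache": {"size": "small"},
}

for action, name in plan(current, desired):
    print(f"{action}: {name}")
```

Notice that nobody wrote “create the database, then resize the web servers, then remove the cache” by hand. Those steps fall out of comparing the two states, and running the plan again once the states match produces no steps at all.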

Using Terraform at CircleCI: a case study

We use Terraform heavily at CircleCI. We use Terraform to deploy and maintain our complex, ever-evolving cloud infrastructure. We use it to set up demo environments and manage our teams’ shared AWS resources. The self-hosted installation of CircleCI is packaged via Terraform. Many of our customers use Terraform.

I’m sure many of you are familiar with HashiCorp Terraform, but if not, here is the tl;dr:

Terraform is a magical tool that enables the automatic provisioning of infrastructure based on declarative templates, written either in HCL (HashiCorp Configuration Language) or JSON. These manifests represent declarative snapshots of what the desired state of your infrastructure would look like. Emphasis on declarative rather than procedural.

Let’s talk about a few reasons why you’d want to execute Terraform on CircleCI rather than a developer’s local workstation.

Determinism, visibility, and automation

There are several reasons to execute Terraform on CircleCI:

Determinism. Terraform is elegant in its simplicity. It’s a static binary written in Golang (I <3 Golang). Running it in an isolated environment with uniform input will almost invariably produce uniform output. We refer to this in computer science (and higher order mathematics) as referential transparency: same input, same output. No more “well, it worked on my machine…I’m not sure why it crashes on yours”.

Visibility. Every pipeline, every workflow, and every job is stored on CircleCI for posterity. You can look back at the output of a job that ran two months ago, which lends itself to auditability. Other team members can see the output from each and every Terraform run: they can review plans and look at errors when they come up. This is very powerful. We can look at every commit associated with every build. Should something go wrong, we can SSH into a job and look at the corresponding logfiles. Most importantly, we’ll have a record of each and every deployment.

Automation. Every time a developer writes and commits a new module to their VCS, a workflow is triggered, validating the plan file and setting the stage for a deployment automatically. Syntax errors and formatting issues can be captured automatically. With some branch-level filtering in place and a manual approval gating the deployment to production, you can automatically deploy to a staging or QA environment every time a successful change is made. This is just the beginning; imagine being able to include the dynamic provisioning of infrastructure as part of a series of integration tests, or being able to deploy a k8s cluster automatically.
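The gating described above boils down to a small decision rule. In practice you would express it in your pipeline configuration rather than in application code, so treat the following Python as a sketch of the logic only. The branch name main, the environment names, and the deploy_targets function are all assumptions made for this example.

```python
def deploy_targets(branch: str, plan_ok: bool, approved: bool) -> list[str]:
    """Decide which environments a change may be deployed to."""
    # A failed validation or plan stops everything.
    if not plan_ok:
        return []
    # Branch-level filtering: only the main branch deploys (assumed branch name).
    if branch != "main":
        return []
    # Staging is deployed automatically after a successful change.
    targets = ["staging"]
    # Production is gated behind a manual approval.
    if approved:
        targets.append("production")
    return targets


print(deploy_targets("feature/new-module", plan_ok=True, approved=False))
print(deploy_targets("main", plan_ok=True, approved=False))
print(deploy_targets("main", plan_ok=True, approved=True))
```

Feature branches are validated but never deployed, the main branch reaches staging on its own, and production only happens once a human has signed off.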

Tip: Get buy-in from the rest of your organization - this should be your very first step!

Exploring the benefits of Infrastructure as Code

Even if Terraform isn’t the right choice for your team, many of its benefits can be applied to IaC in general. IaC:

  • Makes your software development process more legible by recording informal or manual processes in a clear and unambiguous way (i.e. as code), which…
  • Reveals to your team the steps in your process that need improvement, which…
  • Enables your team to write, test, and measure those improvements using a CI/CD platform like CircleCI, which…
  • Results in shorter lead times for developing features and bug fixes, which…
  • Empowers your company to deliver value to customers at an increasing rate and to respond with greater agility to changes in development priorities and market trends.

The most important phrase in the above list is “at an increasing rate.” Infrastructure as Code, CI/CD, and DevOps culture create a positive feedback loop for your team’s productivity.

Because these productivity gains compound over time, the sooner you begin defining your infrastructure as code, the higher your productivity returns will be in the long run. Think of compound interest in a savings account. Your company will gain much more from investing 100 hours of work into adopting infrastructure as code today than from waiting until your team has enough bandwidth to make the change. Spoiler alert: DevOps teams never have “enough” bandwidth.

Step one: documenting the process you use now

Start by documenting your build, test, deploy, and release processes as they exist right now. You can’t automate your processes if you don’t know exactly what they are!

While infrastructure as code is obviously relevant to deploying and releasing, it’s also important to understand the build and test processes. This is because your test environment should match your production environment as closely as possible to ensure that your test results are valid.

Tip: Take inventory of your current CI/CD processes and record them in great detail.

Resist the urge to write any infrastructure code just yet. For now, use plain, precise language, as though you were writing a cooking recipe. Ensure that your documentation is thorough.

Instead of this:

  1. Spin up a virtual machine
  2. Attach 100GB disk
  3. Download application code to server
  4. Install application

…try this:

  1. From the AWS console, spin up a Windows Server 2019 EC2 instance in the us-east-1 region, using the latest AMI with the prefix foobarbaz-wdows2019-hvm-
  2. Attach a 100GB gp2 EBS volume as drive E:
  3. Download the application code from the S3 bucket foobarbaz-code at path s3://foobarbaz-code/release/v2.0/appcode.zip to C:/Windows/Temp/appcode.zip
  4. Extract appcode.zip to C:/Windows/Temp/appcode
  5. Run C:/Windows/Temp/appcode/install.bat

This level of detail is important. Ambiguity in your descriptions will create obstacles to your automation efforts. Vague instructions lead engineers to make erroneous assumptions when writing infrastructure code that may later require costly bugfix efforts to correct.
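One way to test whether a documented step is precise enough is to ask what a script would need to know in order to carry it out. The sketch below does exactly that for the disk step from the two lists above. The Step class, the REQUIRED_DETAILS checklist, and the field names are hypothetical; they are my own illustration, not part of any tool.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One documented step: an action name plus its recorded details."""
    action: str
    details: dict = field(default_factory=dict)


# Hypothetical checklist: the details each action needs before it can be automated.
REQUIRED_DETAILS = {
    "launch_instance": {"os", "ami_prefix", "region"},
    "attach_volume": {"size_gb", "volume_type", "drive"},
    "download": {"source", "destination"},
}


def missing_details(step: Step) -> list[str]:
    """Return the required details that the step does not record, sorted by name."""
    required = REQUIRED_DETAILS.get(step.action, set())
    return sorted(required - set(step.details))


# "Attach 100GB disk" leaves the volume type and drive letter to guesswork.
vague = Step("attach_volume", {"size_gb": 100})
# "Attach a 100GB gp2 EBS volume as drive E:" answers every question.
precise = Step("attach_volume", {"size_gb": 100, "volume_type": "gp2", "drive": "E:"})

print(missing_details(vague))    # ['drive', 'volume_type']
print(missing_details(precise))  # []
```

Every item in the first result is an assumption that an engineer would otherwise have to make while writing the infrastructure code.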

Don’t forget to document your change control processes as well, since these can often be partially or fully automated in your CI/CD pipelines.

Step two: version everything to create a source of truth

Once you have thoroughly documented everything, put it all into a version control system (VCS) like Git, hosted on a service such as GitHub. If your processes include manual steps, write those steps down in a readme file and check that into your VCS as well. Make sure that each repository has both a readme and a changelog; you’ll thank yourself later!

Writing code collaboratively

Read up on git branching strategies and consider which one best suits your product. Do you develop and run a SaaS application? Check out trunk-based development. Do you develop a mobile or enterprise application that requires you to support many versions simultaneously? Look into git flow.

Tip: Choose and enforce a git branching and tagging strategy.

Read up on semantic versioning for your tagging strategy. Explain your chosen tagging and branching strategy to your team. Do not put this off until later, or your branches will end up resembling a bowl of spaghetti and the week before your next release will be a very long week indeed.
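If you do adopt semantic versioning for your tags, it helps to be able to check and order them in a script. Here is a minimal sketch that handles only plain MAJOR.MINOR.PATCH tags with an optional leading v. It deliberately ignores the pre-release and build metadata parts of the semantic versioning specification, and the tag list is made up for the example.

```python
import re

# Matches tags such as "v2.0.0" or "2.0.0"; pre-release suffixes are not supported here.
SEMVER_TAG = re.compile(r"v?(\d+)\.(\d+)\.(\d+)")


def parse_tag(tag: str) -> tuple[int, int, int]:
    """Turn a tag like 'v1.10.0' into (1, 10, 0), or raise ValueError."""
    match = SEMVER_TAG.fullmatch(tag)
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH tag: {tag!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


# Hypothetical tags, in no particular order.
tags = ["v1.9.0", "v1.10.0", "v2.0.0", "v1.2.3"]

# Comparing parsed tuples orders v1.10.0 after v1.9.0, which a plain string sort gets wrong.
print(sorted(tags, key=parse_tag))
print(max(tags, key=parse_tag))
```

A check like parse_tag can also run in your pipeline to reject tags that don’t follow the convention your team agreed on.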

Step three: iteratively improve your processes

At this point, you’ve thoroughly recorded all of your processes and checked them into version control. Now you’re finally ready to begin converting your manual, imperative processes into automated, declarative infrastructure code.

Tip: Regularly analyze your infrastructure code and deployment processes to identify areas for improvement.

Begin by identifying which parts of your process need improvement. In the spirit of “shifting left” to catch bugs early before they become costly production outages, prioritize automating your build process and writing unit tests. Run your tests on a CI/CD platform such as CircleCI. Next, focus on writing integration and end-to-end tests. Finally, write infrastructure code to automate your deployment and release processes. Your infrastructure code repositories should reflect the desired state of your existing infrastructure as well as contain everything necessary to redeploy that infrastructure in the unlikely event that, say, a giant meteor destroys the us-west-2 region.
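A repository only reflects your infrastructure if someone, or something, checks that the two still agree. The sketch below compares the recorded settings of a single resource with what is actually running and reports every attribute that differs. The find_drift function and all of the attribute names and values are hypothetical. Real tools do this comparison for you; this only shows the idea.

```python
def find_drift(desired: dict, actual: dict) -> dict:
    """Map each differing attribute to a (desired_value, actual_value) pair.

    An attribute missing on one side is reported with None for that side.
    """
    drift = {}
    for key in sorted(set(desired) | set(actual)):
        want = desired.get(key)
        have = actual.get(key)
        if want != have:
            drift[key] = (want, have)
    return drift


# What the repository says the server should look like (hypothetical values).
desired = {"instance_type": "t3.medium", "volume_gb": 100, "region": "us-west-2"}
# What is running after someone made a manual change (hypothetical values).
actual = {
    "instance_type": "t3.large",
    "volume_gb": 100,
    "region": "us-west-2",
    "public_ip": True,
}

for key, (want, have) in find_drift(desired, actual).items():
    print(f"{key}: repository says {want!r}, environment has {have!r}")
```

An empty result means the repository can still be trusted as the source of truth. Anything else is a manual change that needs to be either reverted or written back into the code.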

What IaC tools should I use?

We’ve written about this topic before, but I’ll break it down here and give some general recommendations.

Tip: Choose your infrastructure deployment tools carefully.

  • Build and test. Build processes and unit testing frameworks will vary by language, but you can use CircleCI’s orbs to simplify the process. Here are examples for Ruby and Node. Once you’ve written your build script and tests, look into a CI/CD tool on which to run them. CircleCI’s config.yml syntax is easy to learn, and you can log in with your GitHub or Bitbucket account and start running builds and tests in minutes. You can also use orbs to integrate with third-party testing services like Katalon or Cypress.
  • Deploy and release. Public cloud infrastructure is deployed and managed with a number of tools. Currently, Terraform and Pulumi are two of the most popular multi-cloud tools for this purpose.
  • Config management. If your application is stateful, you may also consider using a config management tool like Ansible with CircleCI’s scheduled pipelines to enforce application state.

Conclusion

The benefits of IaC are enormous. While infrastructure as code pertains to, well, infrastructure, the same kind of “configuration drift” that invariably occurs when servers and services are provisioned manually also affects networking infrastructure, subnet configuration, ingress/egress rules, load balancer configuration, etc. IaC alleviates many of those issues and more. It’s powerful, it’s elegant, and it’s how many teams and organizations are managing their resources in the cloud.