Infrastructure as Code, part 1: create a Kubernetes cluster with Terraform

This series shows you how to get started with infrastructure as code (IaC). The goal is to help developers build a strong understanding of IaC through tutorials and code examples.

Infrastructure as Code (IaC) is an integral part of modern continuous integration pipelines. It is the process of managing and provisioning cloud and IT resources using machine-readable definition files. IaC gives organizations tools to create, manage, and destroy compute resources by statically defining and declaring these resources in code.

Here are the topics this series will cover:

Part 1: create a Kubernetes cluster (this post)
Part 2: build Docker images and deploy to Kubernetes
Part 3: automate deployments with CI/CD

In this post, I will discuss how to use HashiCorp’s Terraform to provision, deploy, and destroy infrastructure resources. Before we start, you will need to create accounts in target cloud providers and services such as Google Cloud and Terraform Cloud. Then you can start learning how to use Terraform to create a new Google Kubernetes Engine (GKE) cluster.

Prerequisites

Before you get started, you will need to have these things in place:

A Google Cloud Platform (GCP) account
- Google Cloud project
- Local install of the Google Cloud SDK CLI
- Local install of the Terraform CLI
A Terraform Cloud account
- Terraform Cloud organization
- Two new Terraform Cloud workspaces named iac_gke_cluster and iac_kubernetes_app. Choose the "No VCS connection" option
- Enable local execution mode in both the iac_gke_cluster and iac_kubernetes_app workspaces
- Create a new Terraform API token
A Docker Hub account
- Local install of the Docker client
Git clone the Learn infrastructure as code repo from GitHub
A CircleCI account

This post works with the code in the part01 folder of this repo. First, though, you need to create GCP credentials and then learn how Terraform works.

Creating GCP project credentials

GCP credentials will allow you to perform administrative actions using IaC tooling. To create them:

Go to the create service account key page
Select the default service account or create a new one
Select JSON as the key type
Click Create
Save this JSON file in the ~/.config/gcloud/ directory (you can rename it)

How does HashiCorp Terraform work?

HashiCorp Terraform is an open source tool for building, changing, and versioning infrastructure safely and efficiently. Terraform can manage existing service providers as well as custom in-house solutions.

Terraform uses configuration files to describe the components needed to run a single application or your entire data center. It generates an execution plan describing what it will do to reach the desired state, and then executes it to build the infrastructure described by the plan. As the configuration changes, Terraform determines what changed and creates incremental execution plans which can be applied to update infrastructure resources.

Terraform is used to create, manage, and update infrastructure resources such as physical machines, VMs, network switches, containers, and more. The infrastructure Terraform can manage includes low-level components such as compute instances, storage, and networking, as well as high-level components like DNS entries and SaaS features.

Almost any infrastructure type can be represented as a resource in Terraform.

What is a Terraform provider?

A provider is responsible for understanding API interactions and exposing resources. Providers can be an IaaS (Alibaba Cloud, AWS, GCP, Microsoft Azure, OpenStack), a PaaS (like Heroku), or SaaS services (Terraform Cloud, DNSimple, Cloudflare).

In this step, we will provision some resources in GCP using Terraform code. We want to write Terraform code that will define and create a new GKE cluster that we can use in part 2 of the series.

To create a new GKE cluster, we need to rely on the GCP provider for our interactions with GCP. Once the provider is defined and configured, we can build and control Terraform resources on GCP.

What are Terraform resources?

Resources are the most important element in the Terraform language. Each resource block describes one or more infrastructure objects. An infrastructure object can be a virtual network, a compute instance, or a higher-level component like DNS records. A resource block declares a resource of a given type (such as google_container_cluster) with a given local name (such as “primary”, the name used in this tutorial's code). The name is used to refer to this resource from elsewhere in the same Terraform module, but it has no significance outside of the scope of that module.
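
To make the type and local name idea concrete, here is a minimal sketch of two resource blocks where one refers to the other. It is not part of the tutorial repo: the local names vpc and subnet, the resource names, and the CIDR range are placeholders chosen for illustration.

```hcl
# Hypothetical example: names and the CIDR range are placeholders.
resource "google_compute_network" "vpc" {
  name                    = "example-vpc"
  auto_create_subnetworks = false
}

resource "google_compute_subnetwork" "subnet" {
  name          = "example-subnet"
  region        = "us-east1"
  ip_cidr_range = "10.10.0.0/24"

  # Refer to the network above as <type>.<local name>.<attribute>.
  network = google_compute_network.vpc.self_link
}
```

The reference google_compute_network.vpc.self_link only resolves inside the module that declares the vpc resource, which is what it means for a local name to have no significance outside its module.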

Understanding Terraform code

Now that you have a better understanding of Terraform providers and resources, it is time to start digging into the code. Terraform code is maintained within directories. Because we are using the CLI tool, you must execute commands from within the root directory where the code is located. For this tutorial, the Terraform code we are using is located in the part01/iac_gke_cluster folder of the repo. This directory contains these files:

providers.tf
variables.tf
main.tf
output.tf

These files represent the GCP infrastructure resources that we are going to create. These are what Terraform processes. You can place all of the Terraform code into one file, but that can become harder to manage as the code grows. Most Terraform developers create a separate file for every element. Here is a quick breakdown of each file and its critical elements.

Breakdown: providers.tf

The providers.tf file is where we define the cloud provider we will use. We will use the google provider. This is the content of the providers.tf file:

provider "google" {
  # version     = "2.7.0"
  credentials = file(var.credentials)
  project     = var.project
  region      = var.region
}

This code block sets its parameters inside curly braces { }. The credentials parameter specifies the file path to the GCP credentials JSON file that you created earlier. Notice that the values for the parameters are prefixed with var. The var prefix indicates the use of Terraform input variables, which serve as parameters for a Terraform module. This allows aspects of the module to be customized without altering the module’s own source code, and allows modules to be shared between different configurations. When you declare variables in the root module of your configuration, you can set their values using CLI options and environment variables. When you declare them in child modules, the calling module passes values in the module block.
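
As a rough illustration of the child module case, the sketch below shows a root module passing values in a module block. The module name gke_cluster and the ./modules/gke_cluster path are hypothetical, since this tutorial's code uses a single root module and no child modules.

```hcl
# Hypothetical root module calling a child module.
# The child module at ./modules/gke_cluster is assumed to declare
# input variables named "project" and "region".
module "gke_cluster" {
  source = "./modules/gke_cluster"

  # Each argument sets the child module's input variable of the same name.
  project = var.project
  region  = var.region
}
```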

Breakdown: variables.tf

The variables.tf file specifies all the input variables that this Terraform project uses.

variable "project" {
  default = "cicd-workshops"
}

variable "region" {
  default = "us-east1"
}

variable "zone" {
  default = "us-east1-d"
}

variable "cluster" {
  default = "cicd-workshops"
}

variable "credentials" {
  default = "~/.ssh/cicd_demo_gcp_creds.json"
}

variable "kubernetes_min_ver" {
  default = "latest"
}

variable "kubernetes_max_ver" {
  default = "latest"
}

The variables defined in this file are used throughout this project. All of these variables have default values, but the values can be changed by defining them with the CLI when executing Terraform code. These variables add much-needed flexibility to the code and make it possible to reuse valuable code.
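
Besides the -var CLI option and TF_VAR_ environment variables, one common way to override the defaults is a terraform.tfvars file placed next to the .tf files, which Terraform loads automatically. The sketch below is an assumption about how you might use it; every value, including the key file name, is a placeholder to replace with your own.

```hcl
# terraform.tfvars (hypothetical values, replace with your own)
project = "my-gcp-project-id"
region  = "us-central1"
zone    = "us-central1-a"
cluster = "my-cluster"

# Path to the service account key saved earlier; the file name is a placeholder.
credentials = "~/.config/gcloud/my-service-account-key.json"
```

Any variable left out of this file keeps the default from variables.tf.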

Breakdown: main.tf

The main.tf file defines the bulk of our GKE cluster parameters.

terraform {
  required_version = "~>0.12"
  backend "remote" {
    organization = "datapunks"
    workspaces {
      name = "iac_gke_cluster"
    }
  }
}

resource "google_container_cluster" "primary" {
  name               = var.cluster
  location           = var.zone
  initial_node_count = 3

  master_auth {
    username = ""
    password = ""

    client_certificate_config {
      issue_client_certificate = false
    }
  }

  node_config {
    machine_type = var.machine_type
    oauth_scopes = [
      "https://www.googleapis.com/auth/logging.write",
      "https://www.googleapis.com/auth/monitoring",
    ]

    metadata = {
      disable-legacy-endpoints = "true"
    }

    labels = {
      app = var.app_name
    }

    tags = ["app", var.app_name]
  }

  timeouts {
    create = "30m"
    update = "40m"
  }
}

Here is a description of each element of the main.tf file, starting with the terraform block. This block specifies the type of Terraform backend. A “backend” in Terraform determines how state is loaded and how an operation such as apply is executed. This abstraction enables things like non-local file state storage and remote execution. In this code block, we are using the remote backend, which uses Terraform Cloud and is connected to the iac_gke_cluster workspace you created in the prerequisites section.

terraform {
  required_version = "~>0.12"
  backend "remote" {
    organization = "datapunks"
    workspaces {
      name = "iac_gke_cluster"
    }
  }
}
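
For the remote backend to reach Terraform Cloud, the CLI needs the API token from the prerequisites. One way to supply it, sketched here as an assumption about your setup rather than a step from the tutorial repo, is a credentials block in the Terraform CLI configuration file. The token value is a placeholder.

```hcl
# ~/.terraformrc (terraform.rc in %APPDATA% on Windows)
credentials "app.terraform.io" {
  # Placeholder: paste the Terraform API token you created earlier.
  token = "REPLACE_WITH_YOUR_TERRAFORM_CLOUD_API_TOKEN"
}
```

Keep this file out of version control, since it holds a secret.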

The next code block defines the GKE cluster that we are going to create. We are also using some of the variables defined in variables.tf. The resource block has many parameters used to provision and configure the GKE cluster on GCP. The important parameters here are name, location, and initial_node_count, which specifies the initial number of compute nodes (virtual machines) that will make up this new cluster. We will start with three compute nodes for this cluster.

resource "google_container_cluster" "primary" {
  name               = var.cluster
  location           = var.zone
  initial_node_count = 3

  master_auth {
    username = ""
    password = ""

    client_certificate_config {
      issue_client_certificate = false
    }
  }

  node_config {
    machine_type = var.machine_type
    oauth_scopes = [
      "https://www.googleapis.com/auth/logging.write",
      "https://www.googleapis.com/auth/monitoring",
    ]

    metadata = {
      disable-legacy-endpoints = "true"
    }

    labels = {
      app = var.app_name
    }

    tags = ["app", var.app_name]
  }

  timeouts {
    create = "30m"
    update = "40m"
  }
}
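
Note that the resource block references var.machine_type and var.app_name, which do not appear in the variables.tf listing shown earlier. If your copy of the code does not declare them, terraform plan will report undeclared variables. The declarations below are a sketch with assumed defaults, not values confirmed by the repo.

```hcl
# Assumed declarations; the defaults are placeholders, not taken from the repo.
variable "machine_type" {
  # Any valid Compute Engine machine type can be used here.
  default = "n1-standard-1"
}

variable "app_name" {
  # Used for the node label and network tag in node_config.
  default = "example-app"
}
```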

Breakdown: output.tf

Terraform uses output values, which are the return values of a Terraform module. A child module uses outputs to expose a subset of its resource attributes to a parent module, and a root module uses them to print certain values in the CLI output after running terraform apply. The blocks in output.tf, shown in the following code sample, output values such as the cluster name and the cluster endpoint, as well as sensitive data, which is marked with the sensitive parameter.


output "cluster" {
  value = google_container_cluster.primary.name
}

output "host" {
  value     = google_container_cluster.primary.endpoint
  sensitive = true
}

output "cluster_ca_certificate" {
  value     = base64decode(google_container_cluster.primary.master_auth.0.cluster_ca_certificate)
  sensitive = true
}

output "username" {
  value     = google_container_cluster.primary.master_auth.0.username
  sensitive = true
}

output "password" {
  value     = google_container_cluster.primary.master_auth.0.password
  sensitive = true
}
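
Outputs can also be read by a separate Terraform configuration. As a sketch of that idea, and only an assumption about how a second workspace such as iac_kubernetes_app might consume these values, the terraform_remote_state data source can read the outputs stored in the iac_gke_cluster workspace. The local names below are hypothetical.

```hcl
# Hypothetical consumer configuration reading outputs from another workspace.
data "terraform_remote_state" "gke_cluster" {
  backend = "remote"

  config = {
    organization = "datapunks"
    workspaces = {
      name = "iac_gke_cluster"
    }
  }
}

locals {
  # Each output declared in output.tf is available under .outputs.<name>.
  cluster_name = data.terraform_remote_state.gke_cluster.outputs.cluster
  cluster_host = data.terraform_remote_state.gke_cluster.outputs.host
}
```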

Initializing Terraform

Now that we have covered our Terraform project and syntax, you can start provisioning the GKE cluster using Terraform. Change directory into the part01/iac_gke_cluster folder:

cd part01/iac_gke_cluster

While in part01/iac_gke_cluster, run this command:

terraform init

Your output should be similar to this:

root@d9ce721293e2:~/project/terraform/gcp/compute# terraform init

Initializing the backend...

Initializing provider plugins...
- Checking for available provider plugins...
- Downloading plugin for provider "google" (hashicorp/google) 3.10.0...

* provider.google: version = "~> 3.10"

Terraform has been successfully initialized!
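
The version = "~> 3.10" line in this output is a constraint that Terraform suggests adding to your configuration. A minimal sketch of applying it, assuming the Terraform 0.12 syntax this tutorial targets, is to set the version argument that is commented out in providers.tf:

```hcl
provider "google" {
  # Allow any 3.x release at or above 3.10, as suggested by terraform init.
  version     = "~> 3.10"
  credentials = file(var.credentials)
  project     = var.project
  region      = var.region
}
```

Newer Terraform releases declare provider versions in a required_providers block instead, so treat this as specific to 0.12.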

Previewing with Terraform

Terraform has a command that allows you to dry run and validate your Terraform code without actually executing anything. The command is called terraform plan. This command also lists all the actions and changes that Terraform will execute against your existing infrastructure. In the terminal, run:

terraform plan

The output:

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # google_container_cluster.primary will be created
  + resource "google_container_cluster" "primary" {
      + additional_zones            = (known after apply)
      + cluster_ipv4_cidr           = (known after apply)
      + default_max_pods_per_node   = (known after apply)
      + enable_binary_authorization = false
      + enable_intranode_visibility = (known after apply)
      + enable_kubernetes_alpha     = false
      + enable_legacy_abac          = false
      + enable_shielded_nodes       = false
      + enable_tpu                  = (known after apply)
      + endpoint                    = (known after apply)
      + id                          = (known after apply)
      + initial_node_count          = 3
      + instance_group_urls         = (known after apply)
      + label_fingerprint           = (known after apply)
      + location                    = "us-east1-d"
  }....
Plan: 1 to add, 0 to change, 0 to destroy.

Terraform will create new GCP resources for you based on the code in the main.tf file.

Terraform apply

Now you can create the new infrastructure. Run this command in the terminal:

terraform apply

Terraform will prompt you to confirm your command. Type yes and press Enter.

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

Terraform will build your new GKE cluster on GCP.

Note: It will take 3-5 minutes for the cluster to complete. It is not an instant process because the back-end systems are provisioning and bringing things online.

After my cluster was completed, this was my output:

Apply complete! Resources: 1 added, 0 changed, 0 destroyed.

Outputs:

cluster = cicd-workshops
cluster_ca_certificate = <sensitive>
host = <sensitive>
password = <sensitive>
username = <sensitive>

The new GKE cluster has been created and the Outputs results are displayed. Notice that the output values that were marked sensitive are masked in the results with <sensitive> tags. This ensures sensitive data is protected, but available when needed.

Using Terraform destroy

Now that you have proof that your GKE cluster has been successfully created, run the terraform destroy command to destroy the assets that you created in this tutorial. You can leave it up and running, but be aware that there is a cost associated with any assets running on GCP and you will be liable for those costs. Google gives a generous $300 credit for its free trial sign-up, but you could easily eat through that if you leave assets running. It is up to you, but running terraform destroy will terminate any running assets.

Run this command to destroy the GKE cluster:

terraform destroy

Conclusion

Congratulations! You have just completed part 1 of this series and leveled up your experience by provisioning and deploying a Kubernetes cluster to GCP using IaC and Terraform.

Continue to part 2 of the tutorial. In part 2 you will learn how to build a Docker image for an application, push that image to a repository, and then use Terraform to deploy that image as a container to GKE.

Here are a few resources that will help you expand your knowledge: