Manage Kubernetes environments with GitOps and dynamic config

Most modern infrastructure architectures are complex to deploy and involve many moving parts. Despite the benefits of automation, many teams still choose to configure their architecture manually, relying on a deployment expert or, in some cases, teams of deployment engineers. Manual configuration opens the door to human error.

While DevOps is very useful in developing and deploying software, using Git combined with CI/CD is useful beyond the world of software engineering. The idea of applying these principles to non-software endeavors is known as GitOps. One of the most common uses of GitOps is provisioning cloud infrastructure using infrastructure-as-code (IaC) tools.

In this tutorial, you’ll learn how to use CircleCI’s dynamic configuration to automatically configure, test, and deploy different Kubernetes environments for each of your development and production branches. This solution will make deploying Kubernetes infrastructure repeatable and reliable for platform engineers managing environments across multiple development teams. For an additional layer of security, we’ll cover how to achieve this using OpenID Connect (OIDC) tokens to authenticate to the cloud provider, eliminating the need to store long-lived credentials inside the CI environment.

Managing Kubernetes infrastructure using GitOps, Terraform, and CircleCI

Top-performing DevOps teams prioritize having their applications in a deployable state, and this paradigm is also taking hold in the infrastructure space: maintaining deployable infrastructure is a competitive differentiator that allows teams to quickly and consistently scale their operations and react to changing demands. To ensure this process is reliable and repeatable, teams leverage the core concepts of continuous integration (CI) to automatically build and test their infrastructure on every new commit. This process is known as GitOps.

At CircleCI we preach that all aspects of your testing and production environments should be documented in code, stored in a version control system (VCS), and applied via automated processes. Version control systems are a way to keep multiple versions of your files, so that when a change is made to a file, the previous versions can still be accessed. They also enable more efficient collaboration among software delivery teams.

It is now viewed as best practice to use version control not only for source code but for everything: configuration files, build and test scripts, documentation, and so forth. This enables collaboration, a single source of truth, and easy onboarding for every component of your software delivery system.

Combining the GitOps approach with best-in-class continuous integration and infrastructure-as-code (IaC) tools allows platform teams to dynamically maintain, scale, and secure their environments in an automated and resource-efficient way.

Step 0: Before you begin

To apply the concepts discussed in this tutorial to your own projects, you will need the following resources:

A CircleCI account
A GitHub, GitLab, or Bitbucket project that has been set up in CircleCI
The CircleCI CLI installed on your workstation
An AWS account
An EKS cluster
Knowledge of Terraform
Knowledge of Kubernetes

Note: You can find the complete configuration code discussed in this tutorial in this GitHub Gist.

Step 1: Setup config

CircleCI’s dynamic configuration allows you to use jobs and workflows not only to execute work but also to determine which jobs run in response to a given change, making your pipelines more dynamic. This is an extremely useful tool, especially in IaC, as it enables platform teams to manage their environments with scalable, repeatable, and traceable automated processes using code stored in a VCS. Introducing change validation to environment management is a game changer for the top-performing delivery teams on CircleCI.

The first thing you need to do is enable dynamic config at the project level. You can find out exactly how to do that in our documentation. Then, add the following to your setup_config.yml file, which will act as your setup configuration file.

setup: true

A setup workflow continues the pipeline on to the desired next configuration. More on how this works can be found in our documentation.

Dynamic configuration is implemented by utilizing the continuation orb.

orbs:
 continuation: circleci/continuation@0.3.1

In this tutorial we will leverage dynamic config at the beginning of each pipeline run to determine what branch we are on. We will see why this is important as the tutorial progresses, but at a high level, we will use branching patterns in the VCS (GitHub in this case) to manage our environments:

A commit to the main branch will trigger a build, test and deploy of our production EKS cluster.
A commit to a feature branch will build, test, and deploy a development cluster.

This will allow you to run different jobs on your code depending on whether you are building a development cluster or a production cluster.

jobs:
 build:
   docker:
     - image: cimg/base:2022.07
   steps:
     - checkout
     - run:
         name: Check if running on main branch
         command: |
           if [[ "main" == "<< pipeline.git.branch >>" ]]; then
             echo '{"is_main_branch": true}' > parameters.json
           else
             echo '{"is_main_branch": false}' > parameters.json
           fi
     - continuation/continue:
         configuration_path: .circleci/continue_config.yml
         parameters: parameters.json

In the setup workflow above, we use a short Bash script to check which branch we are on. CircleCI exposes the branch name as the pipeline value pipeline.git.branch (a value substituted into the config before the job runs, not an environment variable), and the script writes the result to a parameters.json file as the is_main_branch parameter. We then use the continuation orb to point to the config we want to run and pass in the parameters.json file.
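The branch check can also be tried outside of a pipeline. The following is a minimal sketch, assuming a hypothetical helper named write_branch_params that takes the branch name as an argument in place of the pipeline.git.branch value; it is not part of the tutorial's config.

```shell
# Hypothetical helper (not part of the tutorial's config): writes the
# continuation parameters for a given branch to a JSON file.
write_branch_params() {
  branch="$1"
  out_file="$2"
  if [ "$branch" = "main" ]; then
    printf '%s\n' '{"is_main_branch": true}' > "$out_file"
  else
    printf '%s\n' '{"is_main_branch": false}' > "$out_file"
  fi
}

# Example: simulate a pipeline running on a feature branch.
params_file="$(mktemp)"
write_branch_params "feature-add-nodegroup" "$params_file"
cat "$params_file"
```

Writing with > rather than >> means the file always holds exactly one JSON object, even if the step is re-run in the same working directory.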

Step 2: continue_config.yml

As discussed, the setup workflow continues the pipeline on to the desired next configuration. In this tutorial the next configuration is the continue_config.yml file referenced by configuration_path in the setup config. Specifics around how continuation works can be found in our docs. Let’s dive into what happens in the config.

Parameters

At the top of the file you will notice we are using CircleCI pipeline parameters. These are reusable values, declared once in the config, that allow us to dynamically produce pipeline outcomes based on the values we pass in.

version: 2.1

parameters:
  cluster_suffix:
    type: string
    default: "solutions-eng"
  is_main_branch:
    type: boolean
    default: false
  statefile_to_destroy:
    type: string
    default: ""

In this example, we are utilizing three dynamic parameters:

cluster_suffix to dynamically populate our cluster name
is_main_branch to determine whether we are on a development or production branch, as described in the previous section
statefile_to_destroy to manage Terraform state files: when it is set, the pipeline runs the teardown workflow instead of the provisioning workflow

You can find out more about Terraform state here.

Terraform orb

One of the first things you will notice in our continue_config.yml file is the use of CircleCI’s Terraform orb.

orbs:
  terraform: circleci/terraform@3.1.0

Orbs are an abstraction layer that sits on top of your CircleCI config. They are CircleCI’s flavor of configuration-as-code and enable engineering teams to share repeatable config across various projects. Although the concept is very similar to a plugin in other tools like Jenkins, orbs are lightweight and can be passed amongst configs without clogging up your build system. Think of them as a more user-friendly version of Jenkins shared libraries.

With CircleCI you get access to CircleCI’s orb registry, a place where CircleCI experts, partners, and members of the community write and publish orbs. In this config we are using CircleCI’s Terraform orb. This orb is owned, maintained, and kept up to date by CircleCI with Terraform best practices in mind. When you are utilizing Terraform to handle your infrastructure on CircleCI, this orb should become your go-to.

Jobs

Let’s dive into our jobs and see how we are leveraging Terraform to handle the deployment of our EKS infrastructure.

provision-eks

As the name suggests, provision-eks will handle the deployment of our EKS cluster. This job is configured to run on cimg/aws:2022.06.1, a Docker image custom-built by CircleCI to be highly performant in a CI/CD context. Similar to the Terraform orb, this image is owned, maintained and hydrated by CircleCI to enable a seamless onboarding experience so engineering teams can begin deploying to AWS as quickly as possible.

    docker:
      - image: cimg/aws:2022.06.1
    environment:
      AWS_REGION: us-west-2

The AWS CLI also needs to know which AWS region to deploy to. We pass this as an environment variable using the environment stanza, where we set AWS_REGION to us-west-2.

Next, let’s look at some of the steps in our provision-eks job:

    steps:
      - checkout
      - run:
          # Need to ensure cluster name is trimmed to not exceed 37 characters. See comment in eks/vpc.tf for more context.
          command: |            
            temp_cluster_name=$(echo "cera-<< pipeline.parameters.cluster_suffix >><<^pipeline.parameters.is_main_branch>>-<< pipeline.git.branch >><</pipeline.parameters.is_main_branch>>" | cut -c 1-37)
            echo "Cluster name is: $temp_cluster_name"
            echo "export cluster_name=$temp_cluster_name" >> $BASH_ENV
      - enable-oidc

The first two steps are checkout and run:

checkout is a special CircleCI key that checks out the code from your repository.
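run utilizes parameters (cluster_suffix, is_main_branch, and pipeline.git.branch) to set the name of our cluster. The section between <<^pipeline.parameters.is_main_branch>> and <</pipeline.parameters.is_main_branch>> is rendered only when is_main_branch is false, so the branch name is appended for feature branches and omitted on main. The name is then trimmed to 37 characters and exported through $BASH_ENV so that later steps can use it.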
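The naming logic in that run step can be sketched as a plain shell function. This is an illustrative sketch only: build_cluster_name is a hypothetical helper, and its arguments stand in for the values that CircleCI substitutes into the config before the step runs.

```shell
# Hypothetical helper mirroring the run step: builds the cluster name from
# the suffix, the branch, and the is_main_branch flag, then trims it to
# 37 characters with cut, as the config does.
build_cluster_name() {
  suffix="$1"
  branch="$2"
  is_main="$3"
  if [ "$is_main" = "true" ]; then
    name="cera-${suffix}"
  else
    # Feature branches get the branch name appended.
    name="cera-${suffix}-${branch}"
  fi
  printf '%s\n' "$name" | cut -c 1-37
}

# Production cluster: no branch name appended.
build_cluster_name "solutions-eng" "main" "true"
# Development cluster: a long branch name is cut off at 37 characters.
build_cluster_name "solutions-eng" "feature-add-observability-stack" "false"
```

Because of the trimming, two long branch names that share their first characters would map to the same cluster name, which is worth keeping in mind when choosing a branch naming scheme.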

The next step, enable-oidc, runs a command specified at the bottom of continue_config.yml.

commands:
  enable-oidc:
    steps:
      - run:
          name: authenticate-and-interact
          command: |
            # use the OpenID Connect token to obtain AWS credentials
            read -r AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_SESSION_TOKEN \<<< \
              $(aws sts assume-role-with-web-identity \
              --role-arn ${AWS_ROLE_ARN} \
              --role-session-name "CircleCI-${CIRCLE_WORKFLOW_ID}-${CIRCLE_JOB}" \
              --web-identity-token $CIRCLE_OIDC_TOKEN \
              --duration-seconds 3600 \
              --query 'Credentials.[AccessKeyId,SecretAccessKey,SessionToken]' \
              --output text)
            export AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_SESSION_TOKEN
            # interact with AWS
            aws sts get-caller-identity
            echo "export AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID}" >> $BASH_ENV
            echo "export AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY}" >> $BASH_ENV
            echo "export AWS_SESSION_TOKEN=${AWS_SESSION_TOKEN}" >> $BASH_ENV
            source $BASH_ENV

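In order to keep our friends in DevSecOps happy, we authenticate to AWS using OIDC rather than long-lived credentials stored in CircleCI. The enable-oidc command shown above exchanges the job’s OIDC token ($CIRCLE_OIDC_TOKEN) for temporary AWS credentials and exports them through $BASH_ENV so that later steps can use them. You can learn more about why you should use OIDC and how easily this can be implemented on CircleCI in Using OpenID Connect identity tokens to authenticate jobs with cloud providers.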
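To see how the three credential values are split apart, here is a minimal sketch that replaces the real AWS call with a hypothetical stand-in, fake_sts_output. It assumes, as with the --output text option used in the command, that the three queried fields arrive on a single tab-separated line; the values and the temporary file standing in for $BASH_ENV are placeholders.

```shell
# Hypothetical stand-in for the `aws sts assume-role-with-web-identity` call.
# The values are placeholders, not real credentials.
fake_sts_output() {
  printf '%s\t%s\t%s\n' "AKIAEXAMPLEKEY" "exampleSecret" "exampleSessionToken"
}

# read splits the single line on whitespace (including tabs) into three variables.
read -r AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_SESSION_TOKEN <<EOF
$(fake_sts_output)
EOF

# Persist the values the same way the command does, using a temporary file
# in place of $BASH_ENV.
env_file="$(mktemp)"
{
  echo "export AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID}"
  echo "export AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY}"
  echo "export AWS_SESSION_TOKEN=${AWS_SESSION_TOKEN}"
} >> "$env_file"

# A later step would source the file to pick the credentials up again.
. "$env_file"
echo "Loaded credentials for key: ${AWS_ACCESS_KEY_ID}"
```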

The next set of steps uses the Terraform orb to install and initialize Terraform, then plan and apply the configuration for our EKS cluster.

      - terraform/install:
          terraform_version: 1.2.5
      - terraform/init:
          path: /home/circleci/project/eks
          backend_config: key=statefiles/cera/<< pipeline.git.branch >>
      - terraform/plan:
          var: cluster_suffix=<< pipeline.parameters.cluster_suffix >><<^pipeline.parameters.is_main_branch>>-<< pipeline.git.branch >><</pipeline.parameters.is_main_branch>>
          path: /home/circleci/project/eks
      - terraform/apply:
          var: cluster_suffix=<< pipeline.parameters.cluster_suffix >><<^pipeline.parameters.is_main_branch>>-<< pipeline.git.branch >><</pipeline.parameters.is_main_branch>>
          path: /home/circleci/project/eks
      - run:
          name: Install Metrics Server (required for `kubectl top`)
          command: |
            aws eks update-kubeconfig --name $cluster_name
            kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

Here’s what’s happening in this part of the config:

terraform/install: With the help of the Terraform orb, we can install Terraform on the execution environment with just two lines of code. Powerful stuff, right?
terraform/init: Like all Terraform deployments, our first step is to run an init. We pass the path to our Terraform resources and also the location of our state file.
terraform/plan: The next step is to execute a Terraform plan. In doing so we pass our cluster_suffix and the path to our Terraform resources.
terraform/apply: Now we are ready to execute terraform apply. We pass in the same resources used in the previous steps, and there you have it. With just this stanza, a Terraform apply will be executed.
run: In this run command we install Metrics Server on our cluster. Metrics Server is a scalable, efficient source of container resource metrics for Kubernetes autoscaling pipelines. It is required for kubectl top.
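For readers who want to reproduce these steps by hand, the sketch below prints roughly equivalent Terraform CLI commands without running them. It is an approximation under stated assumptions: print_terraform_commands is a hypothetical helper, the branch name feature-x is made up, and the exact flags the orb passes internally may differ from the ones shown here.

```shell
# Hypothetical dry-run helper: prints the Terraform CLI commands that roughly
# correspond to the orb's init, plan, and apply steps. Nothing is executed.
print_terraform_commands() {
  tf_dir="$1"      # directory holding the Terraform resources
  state_key="$2"   # backend key for the state file
  tf_var="$3"      # variable assignment in name=value form
  echo "terraform -chdir=${tf_dir} init -input=false -backend-config=key=${state_key}"
  echo "terraform -chdir=${tf_dir} plan -input=false -var ${tf_var}"
  echo "terraform -chdir=${tf_dir} apply -input=false -auto-approve -var ${tf_var}"
}

# Example for a made-up feature branch named feature-x.
print_terraform_commands \
  "/home/circleci/project/eks" \
  "statefiles/cera/feature-x" \
  "cluster_suffix=solutions-eng-feature-x"
```

Keying the state file on the branch name is what lets each feature branch own its own cluster state without touching production.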

create-sa

Following Kubernetes and DevSecOps best practice, we also utilize Kubernetes RBAC capabilities to access our cluster. However, it can be a daunting, painful, and frankly mundane task for platform teams to create these roles every time a cluster is created. This is where CircleCI really comes to the fore, by leveraging what computers do best: automation.

In this job we leverage CircleCI and Terraform to automate provisioning RBAC controls on our cluster.

create-sa:
    parameters:
      sa-name:
        type: string
      sa-namespace:
        type: string
    docker:
      - image: cimg/aws:2022.06.1
    environment:
      AWS_REGION: us-west-2
    steps:
      - checkout
      - enable-oidc
      - run:
          # Need to ensure cluster name is trimmed to not exceed 37 characters. See comment in eks/vpc.tf for more context.
          command: |
            cluster_name=$(echo "cera-<< pipeline.parameters.cluster_suffix >><<^pipeline.parameters.is_main_branch>>-<< pipeline.git.branch >><</pipeline.parameters.is_main_branch>>" | cut -c 1-37)
            aws eks update-kubeconfig --name $cluster_name \
            --alias << pipeline.parameters.cluster_suffix >>
          name: Pull kubeconfig
      - terraform/install:
          terraform_version: 1.2.5
      - terraform/init:
          path: /home/circleci/project/service-accounts
          backend_config: key=statefiles/cera/account-<< pipeline.git.branch >>
      - terraform/plan:
          var: name=<< parameters.sa-name >>,namespace=<< parameters.sa-namespace >>,context=<< pipeline.parameters.cluster_suffix >>
          path: /home/circleci/project/service-accounts
      - terraform/apply:
          var: name=<< parameters.sa-name >>,namespace=<< parameters.sa-namespace >>,context=<< pipeline.parameters.cluster_suffix >>
          path: /home/circleci/project/service-accounts
      - when: 
          condition: 
              equal: [ main, << pipeline.git.branch >> ]
          steps:
            - run:
                name: Update Credentials for App Teams
                command: bash .circleci/credential_updater.sh
                # Since feature branches create their own cluster, we dont want to update contexts
                # otherwise it would replace production credentials.
                # Testing BOA deployments to a dev cluster requires workarounds, but should be uncommon

The steps carried out in this job are very similar to those carried out in the provision-eks job; we just point to a different set of Terraform resources. One step to call out is the final when condition. As outlined in the comments of the code, since feature branches create their own cluster, we don’t want to update contexts. If we did, it would replace production credentials, and we certainly do not want to do that!

destroy-eks

Why would you want to tear down your dev clusters after provisioning them? It’s important to note here that the clusters provisioned from your feature branch are created solely to test the rollout of your EKS infrastructure. With that in mind, we can avoid wasting resources by tearing down the cluster after a successful deployment.

destroy-eks:
    docker:
      - image: cimg/aws:2022.06.1
    environment:
      AWS_REGION: us-west-2
    steps:
      - when:
          condition:
              equal: [ main, << pipeline.parameters.statefile_to_destroy >> ]
          steps:
            - run:
                command: |
                  echo "Cannot perform terraform destroy on CERA main cluster"
                  exit 1
      - checkout
      - enable-oidc
      - terraform/init:
          path: /home/circleci/project/eks
          backend_config: key=statefiles/cera/<< pipeline.git.branch >>
      - terraform/destroy:
          var: cluster_suffix=<< pipeline.parameters.cluster_suffix >><<^pipeline.parameters.is_main_branch>>-<< pipeline.git.branch >><</pipeline.parameters.is_main_branch>>
          path: /home/circleci/project/eks
      - run:
          command: |
            aws s3 rm s3://se-cluster-tf/statefiles/cera/<< pipeline.git.branch >>
            aws s3 rm s3://se-cluster-tf/statefiles/cera/account-<< pipeline.git.branch >>
          name: Deleting statefile

Once a feature branch cluster has been deployed and the manual authentication checks or other steps in your process have been carried out to verify that the cluster is up and running as expected, the resource-efficient thing to do is to tear that cluster down. In economically challenging times, where cutting cloud costs is at the top of the agenda for every team, CircleCI enables you to provision EKS infrastructure in a safe, secure, and cost-efficient way.

Note that the job begins with a conditional when step that acts as a safeguard, so that only feature branch (development) clusters are ever torn down. Deployments that have gone through the policies we applied to our cluster, including pull requests and code reviews to our main branch, represent our production clusters. If the statefile_to_destroy parameter is main, we do not want to tear down the cluster, so the job exits with the error message Cannot perform terraform destroy on CERA main cluster.
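The safeguard amounts to a simple comparison. Here is a minimal sketch of the same check as a shell function, assuming a hypothetical helper named guard_destroy that receives the value of statefile_to_destroy as its argument.

```shell
# Hypothetical helper mirroring the when step: refuses to continue when the
# target is the production (main) state file.
guard_destroy() {
  target="$1"
  if [ "$target" = "main" ]; then
    echo "Cannot perform terraform destroy on CERA main cluster" >&2
    return 1
  fi
  return 0
}

# A made-up feature branch passes the guard, so teardown could proceed.
if guard_destroy "feature-x"; then
  echo "Guard passed for feature-x"
fi
```

Failing early, before checkout and authentication, means a mistaken request against production stops before any AWS credentials are even obtained.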

Workflows

With the goal of enabling platform teams to deliver value to their customers faster, this project uses one primary workflow, test-and-deploy-terraform, to manage your EKS infrastructure. CircleCI’s dynamic configuration then tells the workflow which environment to provision and, if necessary, destroy.

workflows:
  test-and-deploy-terraform:
    unless: << pipeline.parameters.statefile_to_destroy >>
    jobs:
      - provision-eks:
          context: reference-arch-aws-oidc
      - create-sa:
          name: Create-App-SA
          context: reference-arch-aws-oidc
          requires: [ provision-eks ]
      - hold:
          filters:
            branches:
              ignore: main
          type: approval
          requires: [ "Create-App-SA" ]
      - destroy-eks:
          filters:
            branches:
              ignore: main
          context: reference-arch-aws-oidc
          requires: [ hold ]

The setup config from step 1 passes a parameters.json file to continue_config.yml. This JSON file contains the is_main_branch parameter, which determines whether the production environment or a development environment is built.

test-and-deploy-terraform triggered from a feature branch

If the test-and-deploy-terraform workflow is triggered on a feature branch, it will run the following jobs (see the previous section for an explanation of each):

provision-eks
create-sa
hold
destroy-eks

The goal of a build, test, and deploy on a feature branch is to deploy a development cluster that can be used for testing by platform and development teams. You will notice that once a development cluster is deployed, a CircleCI manual hold is leveraged.

- hold:
          filters:
            branches:
              ignore: main
          type: approval
          requires: [ "Create-App-SA" ]

This hold job will ensure that the destroy-eks job is not executed right away, enabling developers to run manual tests against the development cluster (authenticate, deploy sample apps, verify namespace creation, and so forth).

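We also introduced a second workflow, terraform-destroy, to enable rapid and monitored teardown of our EKS infrastructure. It runs only when the statefile_to_destroy parameter is set, in which case the test-and-deploy-terraform workflow is skipped.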

workflows:
  terraform-destroy:
    when: << pipeline.parameters.statefile_to_destroy >>
    jobs:
      - destroy-eks:
          context: reference-arch-aws-oidc
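The when and unless keys on the two workflows make them mutually exclusive: an empty statefile_to_destroy (the default) is treated as false, and any non-empty string as true. A minimal sketch of that selection, using a hypothetical select_workflow helper:

```shell
# Hypothetical helper: reports which workflow would run for a given value of
# the statefile_to_destroy parameter.
select_workflow() {
  statefile_to_destroy="$1"
  if [ -n "$statefile_to_destroy" ]; then
    echo "terraform-destroy"
  else
    echo "test-and-deploy-terraform"
  fi
}

# Default (empty) parameter: the provisioning workflow runs.
select_workflow ""
# Parameter set to a made-up branch name: only the teardown workflow runs.
select_workflow "feature-x"
```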

test-and-deploy-terraform triggered from the main branch

If the test-and-deploy-terraform workflow is triggered on the main branch, it will run the following jobs:

provision-eks
create-sa

Note that the branch filters tell CircleCI to skip the hold and destroy-eks jobs on our main branch, as a commit to this branch represents a deployment of our production cluster. We do not want to tear this cluster down after deployment.

      - destroy-eks:
          filters:
            branches:
              ignore: main
          context: reference-arch-aws-oidc
          requires: [ hold ]
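The effect of the branch filters can be summarized in a few lines of shell. This is only an illustration of the job selection described above; jobs_for_branch is a hypothetical helper, and CircleCI itself evaluates the filters from the workflow definition.

```shell
# Hypothetical helper: lists the jobs the test-and-deploy-terraform workflow
# runs for a given branch, following the `ignore: main` filters.
jobs_for_branch() {
  branch="$1"
  echo "provision-eks"
  echo "create-sa"
  if [ "$branch" != "main" ]; then
    # hold and destroy-eks are filtered out on main.
    echo "hold"
    echo "destroy-eks"
  fi
}

echo "Jobs on main:"
jobs_for_branch "main"
echo "Jobs on a made-up feature branch:"
jobs_for_branch "feature-x"
```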

Conclusion

In this tutorial, you learned how to provision your EKS infrastructure in a resource-friendly manner using CircleCI’s dynamic configuration. We have explored various techniques, tips, and strategies to help you master GitOps on CircleCI and increase the agility and scalability of your infrastructure operations. You are now equipped with the knowledge and skills to confidently transform how you handle your infrastructure.

You can find both configuration files discussed in this tutorial in this GitHub Gist.

We encourage you to share your feedback and experiences with us on our GitOps support. Your input helps us improve our tutorials and ensure that we continue to provide valuable content that empowers our readers. Feel free to reach out with any questions, suggestions, or success stories. You are an integral part of our learning community.