Smoke testing in CI/CD pipelines

Here’s a common situation that plagues many development teams. You run an application through your CI/CD pipeline and all of the tests pass, which is great. But when you deploy it to a live target environment the application just does not function as expected. You can’t always predict what will happen when your application is pushed live. The solution? Smoke tests are designed to reveal these types of failures early by running test cases that cover the critical components and functionality of the application. They also ensure that the application will function as expected in a deployed scenario. When implemented, smoke tests are often executed on every application build to verify that basic but critical functionality passes before jumping into more extensive and time-consuming testing. Smoke tests help create the fast feedback loops that are vital to the software development life cycle.

In this post I’ll demonstrate how to add smoke testing to the deployment stage of a CI/CD pipeline. The smoke testing will test simple aspects of the application post deployment.

Technologies used for smoke testing

This post will reference the following technologies:

Prerequisites

This post relies on configurations and code that are featured in my previous post Automate releases from your pipelines using Infrastructure as Code. The full source code can be found in this repo.

Getting the most from smoke tests

Smoke tests are great for exposing unexpected build errors, connection errors, and for validating a server’s expected response after a new release is deployed to a target environment. For example, a quick, simple smoke test could validate that an application is accessible and is responding with expected response codes like OK 200, 300, 301, 404, etc. The examples in this post will test that the deployed app responds with an OK 200 server code and will also validate that the default page content renders the expected text.

Running CI/CD pipelines without smoke tests

Let’s take a look at an example pipeline config that is designed to run unit tests, build, and push a Docker image to Docker Hub. The pipeline also uses infrastructure as code (Pulumi) to provision a new Google Kubernetes Engine (GKE) cluster and to deploy this release to the cluster. This pipeline config example does not implement smoke tests. Please be aware that if you run this specific pipeline example, a new GKE cluster will be created and will live on until you manually run the pulumi destroy command to terminate all the infrastructure it created.

Caution: Not terminating the infrastructure will result in unexpected costs.

version: 2.1
orbs:
  pulumi: pulumi/pulumi@2.0.0
jobs:
  build_test:
    docker:
      - image: cimg/python:3.8.1
        environment:
          PIPENV_VENV_IN_PROJECT: 'true'
    steps:
      - checkout
      - run:
          name: Install Python Dependencies
          command: |
            pipenv install --skip-lock
      - run:
          name: Run Tests
          command: |
            pipenv run pytest
  build_push_image:
    docker:
      - image: cimg/python:3.8.1
    steps:
      - checkout
      - setup_remote_docker:
          docker_layer_caching: false
      - run:
          name: Build and push Docker image
          command: |       
            pipenv install --skip-lock
            pipenv run pip install --upgrade 'setuptools<45.0.0'
            pipenv run pyinstaller -F hello_world.py
            echo 'export TAG=${CIRCLE_SHA1}' >> $BASH_ENV
            echo 'export IMAGE_NAME=orb-pulumi-gcp' >> $BASH_ENV
            source $BASH_ENV
            docker build -t $DOCKER_LOGIN/$IMAGE_NAME -t $DOCKER_LOGIN/$IMAGE_NAME:$TAG .
            echo $DOCKER_PWD | docker login -u $DOCKER_LOGIN --password-stdin
            docker push $DOCKER_LOGIN/$IMAGE_NAME
  deploy_to_gcp:
    docker:
      - image: cimg/python:3.8.1
        environment:
          CLOUDSDK_PYTHON: '/usr/bin/python2.7'
          GOOGLE_SDK_PATH: '~/google-cloud-sdk/'
    steps:
      - checkout
      - pulumi/login:
          version: "2.0.0"
          access-token: ${PULUMI_ACCESS_TOKEN}
      - run:
          name: Install dependencies
          command: |
            cd ~/
            pip install --user -r project/requirements.txt
            curl -o gcp-cli.tar.gz https://dl.google.com/dl/cloudsdk/channels/rapid/google-cloud-sdk.tar.gz
            tar -xzvf gcp-cli.tar.gz
            echo ${GOOGLE_CLOUD_KEYS} | base64 --decode --ignore-garbage > ${HOME}/project/pulumi/gcp/gke/cicd_demo_gcp_creds.json
            ./google-cloud-sdk/install.sh  --quiet
            echo 'export PATH=$PATH:~/google-cloud-sdk/bin' >> $BASH_ENV
            source $BASH_ENV
            gcloud auth activate-service-account --key-file ${HOME}/project/pulumi/gcp/gke/cicd_demo_gcp_creds.json
      - pulumi/update:
          stack: k8s
          working_directory: ${HOME}/project/pulumi/gcp/gke/
workflows:
  build_test_deploy:
    jobs:
      - build_test
      - build_push_image
      - deploy_to_gcp:
          requires:
          - build_test
          - build_push_image

This pipeline deploys the new app release to a new GKE cluster, but we do not know if the application is actually up and running after all of this automation completes. How do we find out whether the application has been deployed and is functioning properly in this new GKE cluster? Smoke tests are a great way to quickly and easily validate the application’s status after deployment.

How do I write a smoke test?

The first step is to develop test cases that define the steps required to validate an application’s functionality. Identify the functionality that you want to validate, and then create scenarios to test it. In this tutorial, I’m intentionally describing a very minimal scope for testing. For our sample project, my biggest concern is validating that the application is accessible after deployment and that the default page that is served renders the expected static text.

I prefer to outline and list the items I want to test because it suits my style of development. The outline shows the factors I considered when developing the smoke tests for this app. Here is an example of how I developed test cases for this smoke test:

What language/test framework?
- Bash
- smoke.sh
When should this test be executed?
- After the GKE cluster has been created
What will be tested?
- Test: Is the application accessible after it is deployed?
  - Expected Result: Server responds with code 200
- Test: Does the default page render the text “Welcome to CI/CD”
  - Expected Result: TRUE
- Test: Does the default page render the text “Version Number: “
  - Expected Results: TRUE
Post test actions (must occur regardless of pass or fail)
- Write test results to standard output
- Destroy the GKE cluster and related infrastructure
  - Run pulumi destroy

My test case outline (also called a test script) is complete for this tutorial and clearly shows what I’m interested in testing. For this post, I will write smoke tests using a bash-based, open source smoke test framework called smoke.sh by asm89. For your own projects, you can write smoke tests in what ever language or framework you prefer. I picked smoke.sh because it’s an easy framework to implement and it’s open source. Now let’s explore how to express this test script using the smoke.sh framework.

Create smoke test using smoke.sh

The smoke.sh framework’s documentation describes how to use it. The next block of sample code shows how I used the smoke_test file found in the test/ directory of the example code’s repo.

#!/bin/bash

. tests/smoke.sh

TIME_OUT=300
TIME_OUT_COUNT=0
PULUMI_STACK="k8s"
PULUMI_CWD="pulumi/gcp/gke/"
SMOKE_IP=$(pulumi stack --stack $PULUMI_STACK --cwd $PULUMI_CWD output app_endpoint_ip)
SMOKE_URL="http://$SMOKE_IP"

while true
do
  STATUS=$(curl -s -o /dev/null -w '%{http_code}' $SMOKE_URL)
  if [ $STATUS -eq 200 ]; then
    smoke_url_ok $SMOKE_URL
    smoke_assert_body "Welcome to CI/CD"
    smoke_assert_body "Version Number:"
    smoke_report
    echo "\n\n"
    echo 'Smoke Tests Successfully Completed.'
    echo 'Terminating the Kubernetes Cluster in 300 second...'
    sleep 300
    pulumi destroy --stack $PULUMI_STACK --cwd $PULUMI_CWD --yes
    break
  elif [[ $TIME_OUT_COUNT -gt $TIME_OUT ]]; then
    echo "Process has Timed out! Elapsed Timeout Count.. $TIME_OUT_COUNT"
    pulumi destroy --stack $PULUMI_STACK --cwd $PULUMI_CWD --yes
    exit 1
  else
    echo "Checking Status on host $SMOKE... $TIME_OUT_COUNT seconds elapsed"
    TIME_OUT_COUNT=$((TIME_OUT_COUNT+10))
  fi
  sleep 10
done

Next, I’ll explain what’s going on in this smoke_test file.

Line by line description of the smoke_test file

Let’s start at the top of the file.

#!/bin/bash

. tests/smoke.sh

This snippet specifies the Bash binary to use and also specifies the file path to the core smoke.sh framework to import/include in the smoke_test script.

TIME_OUT=300
TIME_OUT_COUNT=0
PULUMI_STACK="k8s"
PULUMI_CWD="pulumi/gcp/gke/"
SMOKE_IP=$(pulumi stack --stack $PULUMI_STACK --cwd $PULUMI_CWD output app_endpoint_ip)
SMOKE_URL="http://$SMOKE_IP"

This snippet defines environment variables that will be used throughout the smoke_test script. Here is a list of each environment variable and its purpose:

PULUMI_STACK="k8s" is used by Pulumi to specify the Pulumi app stack.
PULUMI_CWD="pulumi/gcp/gke/" is the path to the Pulumi infrastructure code.
SMOKE_IP=$(pulumi stack --stack $PULUMI_STACK --cwd $PULUMI_CWD output app_endpoint_ip) is the Pulumi command used to retrieve the public IP address of the application on the GKE cluster. This variable is referenced throughout the script.
SMOKE_URL="http://$SMOKE_IP" specifies the url endpoint of the application on the GKE cluster.

while true
do
  STATUS=$(curl -s -o /dev/null -w '%{http_code}' $SMOKE_URL)
  if [ $STATUS -eq 200 ]; then
    smoke_url_ok $SMOKE_URL
    smoke_assert_body "Welcome to CI/CD"
    smoke_assert_body "Version Number:"
    smoke_report
    echo "\n\n"
    echo 'Smoke Tests Successfully Completed.'
    echo 'Terminating the Kubernetes Cluster in 300 second...'
    sleep 300
    pulumi destroy --stack $PULUMI_STACK --cwd $PULUMI_CWD --yes
    break
  elif [[ $TIME_OUT_COUNT -gt $TIME_OUT ]]; then
    echo "Process has Timed out! Elapsed Timeout Count.. $TIME_OUT_COUNT"
    pulumi destroy --stack $PULUMI_STACK --cwd $PULUMI_CWD --yes
    exit 1
  else
    echo "Checking Status on host $SMOKE... $TIME_OUT_COUNT seconds elapsed"
    TIME_OUT_COUNT=$((TIME_OUT_COUNT+10))
  fi
  sleep 10
done

This snippet is where all the magic happens. It’s a while loop that executes until a condition is true or the script exits. In this case, the loop uses a curl command to test if the application returns an OK 200 response code. Because this pipeline is creating a brand new GKE cluster from scratch, there are transactions in the Google Cloud Platform that need to be complete before we begin smoke testing.

The GKE cluster and application service must be up and running.
The $STATUS variable is populated with the results of the curl requests then tested for the value of 200. Otherwise, the loop increments the $TIME_OUT_COUNT variable by 10 seconds, then waits for 10 seconds to repeat the curl request until the application is responding.
Once the cluster and app are up, running, and responding, the STATUS variable will produce a 200 response code and the remainder of the tests will proceed.

The smoke_assert_body "Welcome to CI/CD" and smoke_assert_body "Version Number: " statements are where I test that the welcome and version number texts are being rendered on the webpage being called. If the result is false, the test will fail, which will cause the pipeline to fail. If the result is true, then the application will return a 200 response code and our text tests will result in TRUE. Our smoke test will pass and execute the pulumi destroy command that terminates all of the infrastructure created for this test case. Since there is no further need for this cluster, it will terminate all the infrastructure created in this test.

This loop also has an elif (else if) statement that checks to see if the application has exceeded the $TIME_OUT value. The elif statement is an example of exception handling which enables us to control what happens when unexpected results occur. If the $TIME_OUT_COUNT value exceeds the TIME_OUT value, then the pulumi destroy command is executed and terminates the newly created infrastructure. The exit 1 command then fails your pipeline build process. Regardless of test results, the GKE cluster will be terminated because there really isn’t a need for this infrastructure to exist outside of testing.

Adding smoke tests to pipelines

I’ve explained the smoke test example and my process for developing the test case. Now it’s time to integrate it into the CI/CD pipeline configuration above. We’ll add a new run step below the pulumi/update step of the deploy_to_gcp job:

      ...
      - run:
          name: Run Smoke Test against GKE
          command: |
            echo 'Initializing Smoke Tests on the GKE Cluster'
            ./tests/smoke_test
            echo "GKE Cluster Tested & Destroyed"
      ...

This snippet demonstrates how to integrate and execute the smoke_test script into an existing CI/CD pipeline. Adding this new run block ensures that every pipeline build will test the application on a live GKE cluster and provide a validation that the application passed all test cases. You can be confident that the specific release will perform nominally when deployed to the tested target environment which in this case, is a Google Kubernetes cluster.

Wrapping up

In summary, I’ve discussed and demonstrated the advantages of using smoke tests and Infrastructure as Code in CI/CD pipelines to test builds in their target deployment environments. Testing an application in its target environment provides valuable insight into how it will behave when it’s deployed. Integrating smoke testing into CI/CD pipelines adds another layer of confidence in application builds.

If you have any questions, comments, or feedback please feel free to ping me on Twitter @punkdata.

Thanks for reading!