Here’s a common situation that plagues many development teams. You run an application through your CI/CD pipeline and all of the tests pass, which is great. But when you deploy it to a live target environment, the application does not function as expected. You can’t always predict what will happen when your application is pushed live. The solution? Smoke tests are designed to reveal these types of failures early by running test cases that cover the critical components and functionality of the application. They also verify that the application works as expected once it is deployed. When implemented, smoke tests are often executed on every application build to verify that basic but critical functionality passes before jumping into more extensive and time-consuming testing. Smoke tests help create the fast feedback loops that are vital to the software development life cycle.
In this post I’ll demonstrate how to add smoke testing to the deployment stage of a CI/CD pipeline. The smoke tests will check simple aspects of the application after deployment.
Technologies used for smoke testing
This post will reference the following technologies:
- GitHub
- CircleCI
- Docker
- Kubernetes
- Google Kubernetes Engine (GKE)
- Bash
- smoke.sh - open source smoke testing framework by asm89
- Pulumi
Prerequisites
This post relies on configurations and code that are featured in my previous post, Automate releases from your pipelines using Infrastructure as Code. The full source code can be found in this repo.
Getting the most from smoke tests
Smoke tests are great for exposing unexpected build and connection errors, and for validating a server’s expected response after a new release is deployed to a target environment. For example, a quick, simple smoke test could validate that an application is accessible and responds with the expected HTTP status codes, such as `200` (OK), `300`, `301`, or `404`. The examples in this post test that the deployed app responds with a `200` (OK) status code and also validate that the default page renders the expected text.
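As a rough illustration of this kind of check, here is a minimal Bash sketch that fetches only the status code with `curl` and compares it against a list of acceptable codes. The function names `http_status` and `status_in` are hypothetical helpers for this example; they are not part of smoke.sh.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; not part of smoke.sh.

# Print the HTTP status code returned by a URL. -o /dev/null discards the
# body, and -w '%{http_code}' writes the code (000 if no response arrived).
http_status() {
  curl -s -o /dev/null -w '%{http_code}' "$1"
}

# Succeed (exit status 0) if CODE is one of the ALLOWED codes.
# Usage: status_in CODE ALLOWED...
status_in() {
  local code=$1 allowed
  shift
  for allowed in "$@"; do
    if [[ $code == "$allowed" ]]; then
      return 0
    fi
  done
  return 1
}
```

Against a live endpoint you might call it as `status_in "$(http_status "$SMOKE_URL")" 200 301 || echo "Unexpected status code"`. Keeping the comparison separate from the network call makes the logic easy to test without a deployed app.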
Running CI/CD pipelines without smoke tests
Let’s take a look at an example pipeline config that is designed to run unit tests, then build and push a Docker image to Docker Hub. The pipeline also uses infrastructure as code (Pulumi) to provision a new Google Kubernetes Engine (GKE) cluster and deploy this release to the cluster. This pipeline config example does not implement smoke tests. Be aware that if you run this specific pipeline example, a new GKE cluster will be created and will keep running until you manually run the `pulumi destroy` command to terminate all the infrastructure it created.
Caution: Not terminating the infrastructure will result in unexpected costs.
```yaml
version: 2.1
orbs:
  pulumi: pulumi/pulumi@2.0.0
jobs:
  build_test:
    docker:
      - image: cimg/python:3.8.1
        environment:
          PIPENV_VENV_IN_PROJECT: 'true'
    steps:
      - checkout
      - run:
          name: Install Python Dependencies
          command: |
            pipenv install --skip-lock
      - run:
          name: Run Tests
          command: |
            pipenv run pytest
  build_push_image:
    docker:
      - image: cimg/python:3.8.1
    steps:
      - checkout
      - setup_remote_docker:
          docker_layer_caching: false
      - run:
          name: Build and push Docker image
          command: |
            pipenv install --skip-lock
            pipenv run pip install --upgrade 'setuptools<45.0.0'
            pipenv run pyinstaller -F hello_world.py
            echo 'export TAG=${CIRCLE_SHA1}' >> $BASH_ENV
            echo 'export IMAGE_NAME=orb-pulumi-gcp' >> $BASH_ENV
            source $BASH_ENV
            docker build -t $DOCKER_LOGIN/$IMAGE_NAME -t $DOCKER_LOGIN/$IMAGE_NAME:$TAG .
            echo $DOCKER_PWD | docker login -u $DOCKER_LOGIN --password-stdin
            docker push $DOCKER_LOGIN/$IMAGE_NAME
  deploy_to_gcp:
    docker:
      - image: cimg/python:3.8.1
        environment:
          CLOUDSDK_PYTHON: '/usr/bin/python2.7'
          GOOGLE_SDK_PATH: '~/google-cloud-sdk/'
    steps:
      - checkout
      - pulumi/login:
          version: "2.0.0"
          access-token: ${PULUMI_ACCESS_TOKEN}
      - run:
          name: Install dependencies
          command: |
            cd ~/
            pip install --user -r project/requirements.txt
            curl -o gcp-cli.tar.gz https://dl.google.com/dl/cloudsdk/channels/rapid/google-cloud-sdk.tar.gz
            tar -xzvf gcp-cli.tar.gz
            echo ${GOOGLE_CLOUD_KEYS} | base64 --decode --ignore-garbage > ${HOME}/project/pulumi/gcp/gke/cicd_demo_gcp_creds.json
            ./google-cloud-sdk/install.sh --quiet
            echo 'export PATH=$PATH:~/google-cloud-sdk/bin' >> $BASH_ENV
            source $BASH_ENV
            gcloud auth activate-service-account --key-file ${HOME}/project/pulumi/gcp/gke/cicd_demo_gcp_creds.json
      - pulumi/update:
          stack: k8s
          working_directory: ${HOME}/project/pulumi/gcp/gke/
workflows:
  build_test_deploy:
    jobs:
      - build_test
      - build_push_image
      - deploy_to_gcp:
          requires:
            - build_test
            - build_push_image
```
This pipeline deploys the new app release to a new GKE cluster, but we do not know if the application is actually up and running after all of this automation completes. How do we find out whether the application has been deployed and is functioning properly in this new GKE cluster? Smoke tests are a great way to quickly and easily validate the application’s status after deployment.
How do I write a smoke test?
The first step is to develop test cases that define the steps required to validate an application’s functionality. Identify the functionality that you want to validate, and then create scenarios to test it. In this tutorial, I’m intentionally describing a very minimal scope for testing. For our sample project, my biggest concern is validating that the application is accessible after deployment and that the default page that is served renders the expected static text.
I prefer to outline and list the items I want to test because it suits my style of development. The outline shows the factors I considered when developing the smoke tests for this app. Here is an example of how I developed test cases for this smoke test:
- What language/test framework?
  - Bash
  - smoke.sh
- When should this test be executed?
  - After the GKE cluster has been created
- What will be tested?
  - Test: Is the application accessible after it is deployed?
    - Expected Result: Server responds with code `200`
  - Test: Does the default page render the text “Welcome to CI/CD”?
    - Expected Result: `TRUE`
  - Test: Does the default page render the text “Version Number:”?
    - Expected Result: `TRUE`
- Post test actions (must occur regardless of pass or fail)
  - Write test results to standard output
  - Destroy the GKE cluster and related infrastructure
    - Run `pulumi destroy`
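One way to keep code aligned with an outline like this is to express the expected results as data. The sketch below is a hypothetical, offline-testable stand-in, not smoke.sh: `run_outline_checks` takes a status code and a response body, checks them against the outline’s expectations, prints a PASS or FAIL line for each check, and returns the number of failures. The names `EXPECTED_STATUS`, `EXPECTED_TEXT`, and `run_outline_checks` are assumptions made for this example.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: the outline's expected results expressed as data.
# This is not smoke.sh; it only checks a status code and body you pass in.

EXPECTED_STATUS=200
EXPECTED_TEXT=("Welcome to CI/CD" "Version Number:")

# Usage: run_outline_checks STATUS BODY
# Prints one PASS/FAIL line per check and returns the number of failures.
run_outline_checks() {
  local status=$1 body=$2 failures=0 text
  if [[ $status == "$EXPECTED_STATUS" ]]; then
    echo "PASS: server responded with $EXPECTED_STATUS"
  else
    echo "FAIL: expected $EXPECTED_STATUS, got $status"
    failures=$((failures + 1))
  fi
  for text in "${EXPECTED_TEXT[@]}"; do
    # Substring match: does the body contain the expected text?
    if [[ $body == *"$text"* ]]; then
      echo "PASS: page renders \"$text\""
    else
      echo "FAIL: page does not render \"$text\""
      failures=$((failures + 1))
    fi
  done
  return "$failures"
}
```

In a live run, the status and body would come from `curl`, for example `run_outline_checks "$(curl -s -o /dev/null -w '%{http_code}' "$URL")" "$(curl -s "$URL")"`. Adding a new test case to the outline then means adding one entry to `EXPECTED_TEXT`.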
My test case outline (also called a test script) is complete for this tutorial and clearly shows what I’m interested in testing. For this post, I will write smoke tests using a Bash-based, open source smoke test framework called `smoke.sh` by asm89. For your own projects, you can write smoke tests in whatever language or framework you prefer. I picked `smoke.sh` because it’s an easy framework to implement and it’s open source. Now let’s explore how to express this test script using the `smoke.sh` framework.
Create smoke test using smoke.sh
The `smoke.sh` framework’s documentation describes how to use it. The next block of sample code shows how I used it in the `smoke_test` file found in the `tests/` directory of the example code’s repo.
```shell
#!/bin/bash
# Load the smoke.sh framework (path is relative to the repo root).
. tests/smoke.sh

TIME_OUT=300
TIME_OUT_COUNT=0
PULUMI_STACK="k8s"
PULUMI_CWD="pulumi/gcp/gke/"
SMOKE_IP=$(pulumi stack --stack $PULUMI_STACK --cwd $PULUMI_CWD output app_endpoint_ip)
SMOKE_URL="http://$SMOKE_IP"

while true
do
  # Fetch only the HTTP status code (000 if the app is not reachable yet).
  STATUS=$(curl -s -o /dev/null -w '%{http_code}' "$SMOKE_URL")
  if [ "$STATUS" -eq 200 ]; then
    # The app is up: run the smoke tests and print the report.
    smoke_url_ok "$SMOKE_URL"
    smoke_assert_body "Welcome to CI/CD"
    smoke_assert_body "Version Number:"
    smoke_report
    printf '\n\n'
    echo 'Smoke Tests Successfully Completed.'
    echo 'Terminating the Kubernetes Cluster in 300 seconds...'
    sleep 300
    pulumi destroy --stack $PULUMI_STACK --cwd $PULUMI_CWD --yes
    break
  elif [[ $TIME_OUT_COUNT -gt $TIME_OUT ]]; then
    # The app never came up: tear down the infrastructure and fail the build.
    echo "Process has Timed out! Elapsed Timeout Count.. $TIME_OUT_COUNT"
    pulumi destroy --stack $PULUMI_STACK --cwd $PULUMI_CWD --yes
    exit 1
  else
    # Not ready yet: record 10 more seconds and poll again.
    echo "Checking Status on host $SMOKE_URL... $TIME_OUT_COUNT seconds elapsed"
    TIME_OUT_COUNT=$((TIME_OUT_COUNT+10))
  fi
  sleep 10
done
```
Next, I’ll explain what’s going on in this smoke_test file.
Line-by-line description of the smoke_test file
Let’s start at the top of the file.
```shell
#!/bin/bash
. tests/smoke.sh
```
The first line (the shebang) specifies that the script runs with Bash, and the second line sources the core `smoke.sh` framework so that its functions are available in the `smoke_test` script.
```shell
TIME_OUT=300
TIME_OUT_COUNT=0
PULUMI_STACK="k8s"
PULUMI_CWD="pulumi/gcp/gke/"
SMOKE_IP=$(pulumi stack --stack $PULUMI_STACK --cwd $PULUMI_CWD output app_endpoint_ip)
SMOKE_URL="http://$SMOKE_IP"
```
This snippet defines variables that are used throughout the `smoke_test` script. Here is a list of each variable and its purpose:

- `TIME_OUT=300` is the maximum number of seconds the script waits for the application to respond.
- `TIME_OUT_COUNT=0` tracks how many seconds have elapsed while waiting.
- `PULUMI_STACK="k8s"` is used by Pulumi to specify the Pulumi app stack.
- `PULUMI_CWD="pulumi/gcp/gke/"` is the path to the Pulumi infrastructure code.
- `SMOKE_IP=$(pulumi stack --stack $PULUMI_STACK --cwd $PULUMI_CWD output app_endpoint_ip)` runs the Pulumi command that retrieves the public IP address of the application on the GKE cluster.
- `SMOKE_URL="http://$SMOKE_IP"` specifies the URL endpoint of the application on the GKE cluster. This is the URL the rest of the script polls and tests.
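If `pulumi stack output` fails or returns nothing, `SMOKE_URL` ends up as `http://` and the loop simply polls until it times out. A minimal guard, sketched below on the assumption that failing fast is preferable, checks that a variable is non-empty before the loop starts. `require_non_empty` is a hypothetical helper name, not part of smoke.sh or the original script.

```shell
#!/usr/bin/env bash
# Hypothetical guard (not part of smoke.sh or the original script).
# Usage: require_non_empty VAR_NAME
# Returns 0 if the named variable is set and non-empty; otherwise prints
# an error to stderr and returns 1.
require_non_empty() {
  local name=$1
  # ${!name} is Bash indirect expansion: the value of the variable whose
  # name is stored in $name. The :- default avoids errors under `set -u`.
  if [[ -z ${!name:-} ]]; then
    echo "Error: $name is empty; check the Pulumi stack output." >&2
    return 1
  fi
  return 0
}
```

In the `smoke_test` script this would go right after the `SMOKE_IP` assignment, as `require_non_empty SMOKE_IP || exit 1`. If the stack was partially created, you may also want to run `pulumi destroy` before exiting, as the timeout branch does.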
```shell
while true
do
  # Fetch only the HTTP status code (000 if the app is not reachable yet).
  STATUS=$(curl -s -o /dev/null -w '%{http_code}' "$SMOKE_URL")
  if [ "$STATUS" -eq 200 ]; then
    # The app is up: run the smoke tests and print the report.
    smoke_url_ok "$SMOKE_URL"
    smoke_assert_body "Welcome to CI/CD"
    smoke_assert_body "Version Number:"
    smoke_report
    printf '\n\n'
    echo 'Smoke Tests Successfully Completed.'
    echo 'Terminating the Kubernetes Cluster in 300 seconds...'
    sleep 300
    pulumi destroy --stack $PULUMI_STACK --cwd $PULUMI_CWD --yes
    break
  elif [[ $TIME_OUT_COUNT -gt $TIME_OUT ]]; then
    # The app never came up: tear down the infrastructure and fail the build.
    echo "Process has Timed out! Elapsed Timeout Count.. $TIME_OUT_COUNT"
    pulumi destroy --stack $PULUMI_STACK --cwd $PULUMI_CWD --yes
    exit 1
  else
    # Not ready yet: record 10 more seconds and poll again.
    echo "Checking Status on host $SMOKE_URL... $TIME_OUT_COUNT seconds elapsed"
    TIME_OUT_COUNT=$((TIME_OUT_COUNT+10))
  fi
  sleep 10
done
```
This snippet is where all the magic happens. It’s a `while` loop that keeps running until it reaches a `break` statement or the script exits. In this case, the loop uses a `curl` command to test whether the application returns an OK `200` response code. Because this pipeline creates a brand new GKE cluster from scratch, some operations in Google Cloud Platform need to complete before we begin smoke testing:

- The GKE cluster and application service must be up and running.
- The `$STATUS` variable is populated with the result of the `curl` request and then tested for the value `200`. Otherwise, the loop increments the `$TIME_OUT_COUNT` variable by 10 seconds, then waits 10 seconds before repeating the `curl` request, until the application responds.
- Once the cluster and app are up, running, and responding, the `$STATUS` variable holds a `200` response code and the remainder of the tests proceed.
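The polling pattern in this loop can be factored into a reusable function. The sketch below is a hypothetical `wait_for` helper (not part of smoke.sh) that re-runs any command every INTERVAL seconds until it succeeds or TIMEOUT seconds have elapsed, using bookkeeping similar to `TIME_OUT` and `TIME_OUT_COUNT` above. The `http_ok` check is likewise an assumption for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of smoke.sh).
# Usage: wait_for TIMEOUT INTERVAL COMMAND [ARGS...]
# Re-runs COMMAND every INTERVAL seconds until it succeeds (returns 0)
# or until TIMEOUT seconds have elapsed (returns 1).
wait_for() {
  local timeout=$1 interval=$2 elapsed=0
  shift 2
  if (( interval <= 0 )); then
    echo "wait_for: INTERVAL must be a positive number of seconds" >&2
    return 2
  fi
  until "$@"; do
    if (( elapsed >= timeout )); then
      echo "Timed out after ${elapsed} seconds" >&2
      return 1
    fi
    echo "Waiting... ${elapsed} seconds elapsed"
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  return 0
}

# Example check: succeed only when the URL answers with HTTP 200.
http_ok() {
  [[ $(curl -s -o /dev/null -w '%{http_code}' "$1") == 200 ]]
}
```

With these helpers, the waiting part of the script could be written as `wait_for 300 10 http_ok "$SMOKE_URL" || exit 1`, followed by the smoke.sh assertions. Passing the check as a command keeps the retry logic independent of what is being checked.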
The `smoke_assert_body "Welcome to CI/CD"` and `smoke_assert_body "Version Number:"` statements test that the welcome and version number texts are rendered on the requested webpage. If either text is missing, that assertion fails, which causes the pipeline to fail. If the application returns a `200` response code and both texts are found, the smoke test passes. The script then waits 300 seconds and executes the `pulumi destroy` command, which terminates all of the infrastructure created for this test case, since there is no further need for the cluster.
This loop also has an `elif` (else if) statement that checks whether the elapsed wait has exceeded the `$TIME_OUT` value. The `elif` statement is a simple form of error handling that lets us control what happens when unexpected results occur. If the `$TIME_OUT_COUNT` value exceeds the `$TIME_OUT` value, the `pulumi destroy` command is executed to terminate the newly created infrastructure, and the `exit 1` command then fails your pipeline build. The intent is that the GKE cluster is terminated regardless of test results, because there really isn’t a need for this infrastructure to exist outside of testing.
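One caveat: as far as I can tell from smoke.sh’s source, `smoke_report` calls `exit 1` when any assertion has failed. That would end the script before the `pulumi destroy` line in the success branch runs. If you want the cluster destroyed regardless of outcome, a Bash `trap` on `EXIT` is a common way to guarantee cleanup. The sketch below uses hypothetical `teardown`, `on_exit`, and `run_smoke_tests` functions. In the real script, `teardown` would run `pulumi destroy`; here it only echoes so the sketch can run anywhere.

```shell
#!/usr/bin/env bash
# Sketch of trap-based cleanup; all function names are hypothetical.

# Teardown step. In the smoke_test script this would instead run:
#   pulumi destroy --stack "$PULUMI_STACK" --cwd "$PULUMI_CWD" --yes
teardown() {
  echo "teardown: destroying test infrastructure"
}

# EXIT trap handler: runs on every exit path (normal completion, a
# failing command, or an explicit `exit 1` such as one from smoke_report),
# then exits with the original status so CI still sees the failure.
on_exit() {
  local status=$?
  teardown
  exit "$status"
}

# Run the given test command in a subshell with the EXIT trap installed,
# so cleanup happens even if the command calls `exit`.
run_smoke_tests() {
  (
    trap on_exit EXIT
    "$@"
  )
}
```

In the `smoke_test` script itself, the subshell wrapper is unnecessary. Installing `trap on_exit EXIT` near the top, with `teardown` running `pulumi destroy`, covers every exit path, and the explicit destroy calls in each branch could then be dropped.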
Adding smoke tests to pipelines
I’ve explained the smoke test example and my process for developing the test case. Now it’s time to integrate it into the CI/CD pipeline configuration above. We’ll add a new `run` step below the `pulumi/update` step of the `deploy_to_gcp` job:
```yaml
...
      - run:
          name: Run Smoke Test against GKE
          command: |
            echo 'Initializing Smoke Tests on the GKE Cluster'
            ./tests/smoke_test
            echo "GKE Cluster Tested & Destroyed"
...
```
This snippet demonstrates how to integrate and execute the `smoke_test` script in an existing CI/CD pipeline. Adding this new `run` step ensures that every pipeline build tests the application on a live GKE cluster and reports whether the application passed all test cases. A passing run gives you confidence that this specific release works as expected when deployed to the tested target environment, which in this case is a GKE cluster.
Wrapping up
In summary, I’ve discussed and demonstrated the advantages of using smoke tests and Infrastructure as Code in CI/CD pipelines to test builds in their target deployment environments. Testing an application in its target environment provides valuable insight into how it will behave when it’s deployed. Integrating smoke testing into CI/CD pipelines adds another layer of confidence in application builds.
If you have any questions, comments, or feedback please feel free to ping me on Twitter @punkdata.
Thanks for reading!