Have you ever run an application through your CI/CD pipeline and seen all of the tests pass, only to have the application fail to function as expected once it is deployed to a live target environment? This situation is common and plagues many teams: you can’t always anticipate what will happen when your application is pushed live. Smoke tests are designed to reveal these types of failures early by running test cases that cover the critical components and functionality of the application. They also help confirm that the application functions as expected in a deployed scenario. When implemented, smoke tests are often executed on every application build to verify that basic but critical functionality passes before jumping into more extensive and time-consuming testing. Smoke tests help create fast feedback loops and are very useful in the software development life cycle.
In this post, I’ll demonstrate how to add smoke tests to the deployment stage of a CI/CD pipeline so that simple aspects of the application are tested after deployment.
Technologies used
This post will reference the following technologies:
- GitHub
- CircleCI
- Docker
- Kubernetes
- Google Kubernetes Engine (GKE)
- Bash
- smoke.sh - open source smoke testing framework by asm89
- Pulumi
Prerequisites
This post relies on configurations and code that are featured in my previous post Automate releases from your pipelines using Infrastructure as Code. The full source code can be found in this repo.
Smoke tests
Smoke tests are great for exposing unexpected build errors and connection errors, and for validating a server’s expected response after a new release is deployed to a target environment. For example, a quick, simple smoke test could validate that an application is accessible and is responding with an expected response code such as 200 OK, 300, 301, or 404. The examples in this post will test that the deployed app responds with a 200 OK response code and will also validate that the default page content renders the expected text.
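As a rough illustration of the two checks just described, here is a minimal Bash sketch that uses only curl. The function name smoke_check is hypothetical and is not part of smoke.sh; the URL and expected text are whatever you pass in.

```bash
#!/bin/bash
# Minimal sketch: check the status code and the page text in one request.
# smoke_check is a hypothetical helper name, not part of smoke.sh.
smoke_check() {
  local url="$1" expected_text="$2"
  local body status

  # -s hides progress output; -w appends the status code on its own last line.
  body=$(curl -s -w '\n%{http_code}' "$url") || { echo "FAIL: request error"; return 1; }
  status="${body##*$'\n'}"   # the last line is the status code
  body="${body%$'\n'*}"      # everything before it is the page body

  if [ "$status" != "200" ]; then
    echo "FAIL: got status $status"
    return 1
  fi

  case "$body" in
    *"$expected_text"*) echo "PASS: $url" ;;
    *) echo "FAIL: text not found: $expected_text"; return 1 ;;
  esac
}
```

A call such as smoke_check "http://<app-ip>" "Welcome to CI/CD" would print a PASS or FAIL line and return a matching exit code, which is all a pipeline step needs to decide whether to continue.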
CI/CD pipelines without smoke tests
Let’s take a look at an example pipeline config that is designed to run unit tests, build, and push a Docker image to Docker Hub. It also uses infrastructure as code (Pulumi) to provision a new Google Kubernetes Engine (GKE) cluster and deploy this release to the cluster. This pipeline config example does not implement smoke tests. Please be aware that if you run this specific pipeline example, a new GKE cluster will be created and will live on until you manually run the pulumi destroy command, which will terminate all the infrastructure it created.
Caution: Not terminating the infrastructure will result in unexpected costs.
version: 2.1
orbs:
  pulumi: pulumi/pulumi@1.0.1
jobs:
  build_test:
    docker:
      - image: circleci/python:3.7.2
        environment:
          PIPENV_VENV_IN_PROJECT: 'true'
    steps:
      - checkout
      - run:
          name: Install Python Dependencies
          command: |
            pipenv install --skip-lock
      - run:
          name: Run Tests
          command: |
            pipenv run pytest
  build_push_image:
    docker:
      - image: circleci/python:3.7.2
    steps:
      - checkout
      - setup_remote_docker:
          docker_layer_caching: false
      - run:
          name: Build and push Docker image
          command: |
            pipenv install --skip-lock
            pipenv run pip install --upgrade 'setuptools<45.0.0'
            pipenv run pyinstaller -F hello_world.py
            echo 'export TAG=${CIRCLE_SHA1}' >> $BASH_ENV
            echo 'export IMAGE_NAME=orb-pulumi-gcp' >> $BASH_ENV
            source $BASH_ENV
            docker build -t $DOCKER_LOGIN/$IMAGE_NAME -t $DOCKER_LOGIN/$IMAGE_NAME:$TAG .
            echo $DOCKER_PWD | docker login -u $DOCKER_LOGIN --password-stdin
            docker push $DOCKER_LOGIN/$IMAGE_NAME
  deploy_to_gcp:
    docker:
      - image: circleci/python:3.7.2
        environment:
          CLOUDSDK_PYTHON: '/usr/bin/python2.7'
          GOOGLE_SDK_PATH: '~/google-cloud-sdk/'
    steps:
      - checkout
      - pulumi/login:
          access-token: ${PULUMI_ACCESS_TOKEN}
      - run:
          name: Install dependencies
          command: |
            cd ~/
            sudo pip install --upgrade pip==18.0 && pip install --user -r project/reqs.txt
            curl -o gcp-cli.tar.gz https://dl.google.com/dl/cloudsdk/channels/rapid/google-cloud-sdk.tar.gz
            tar -xzvf gcp-cli.tar.gz
            echo ${GOOGLE_CLOUD_KEYS} | base64 --decode --ignore-garbage > ${HOME}/project/pulumi/gcp/gke/cicd_demo_gcp_creds.json
            ./google-cloud-sdk/install.sh --quiet
            echo 'export PATH=$PATH:~/google-cloud-sdk/bin' >> $BASH_ENV
            source $BASH_ENV
            gcloud auth activate-service-account --key-file ${HOME}/project/pulumi/gcp/gke/cicd_demo_gcp_creds.json
      - pulumi/update:
          stack: k8s
          working_directory: ${HOME}/project/pulumi/gcp/gke/
workflows:
  build_test_deploy:
    jobs:
      - build_test
      - build_push_image:
          requires:
            - build_test
      - deploy_to_gcp:
          requires:
            - build_push_image
This pipeline deploys the new app release to a new GKE cluster, but we do not know if the application is actually up and running after all of this automation completes. How do we quickly validate that the application has been deployed and is functioning properly in this new GKE cluster? Implementing smoke tests into your CI/CD pipeline is a great way to quickly and easily validate the application’s status after deployment.
How do I write a smoke test?
The first step in writing smoke tests is to develop test cases which define the steps required to validate an application’s functionality. Developing test cases is an exercise in identifying functionality that you want to validate, and then creating scenarios to test it. In this tutorial, I’m intentionally describing a very minimal scope for testing. In this situation, my biggest concern is validating that the application is accessible after deployment and that the default page that is served renders the expected static text.
Below is an example of how I developed test cases for this smoke test. I prefer to outline and list the items I want to test because it suits my style of development. The outline shows the factors I considered when developing the smoke tests for this app:
- What language/test framework?
  - Bash
  - smoke.sh
- When should this test be executed?
  - After the GKE cluster has been created
- What will be tested?
  - Test: Is the application accessible after it is deployed?
    - Expected Result: Server responds with code 200
  - Test: Does the default page render the text “Welcome to CI/CD”?
    - Expected Result: TRUE
  - Test: Does the default page render the text “Version Number: ”?
    - Expected Result: TRUE
- Post test actions (must occur regardless of pass or fail)
  - Write test results to standard output
  - Destroy the GKE cluster and related infrastructure
    - Run pulumi destroy
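To make the shape of such an outline concrete, the sketch below expresses "run each test, record pass or fail, write the results to standard output" as a tiny Bash harness. All of the names here (run_case, body_contains, report) are hypothetical and only illustrate the idea; a framework such as smoke.sh provides its own equivalents.

```bash
#!/bin/bash
# Hypothetical mini-harness; names are illustrative and not part of smoke.sh.
TESTS_RUN=0
TESTS_FAILED=0

# run_case NAME COMMAND [ARGS...]: run one check and record the result.
run_case() {
  local name="$1"; shift
  TESTS_RUN=$((TESTS_RUN + 1))
  if "$@" >/dev/null 2>&1; then
    echo "PASS: $name"
  else
    TESTS_FAILED=$((TESTS_FAILED + 1))
    echo "FAIL: $name"
  fi
}

# body_contains TEXT BODY: succeed when BODY includes TEXT.
body_contains() {
  case "$2" in
    *"$1"*) return 0 ;;
    *) return 1 ;;
  esac
}

# report: print a summary and return non-zero if any case failed.
report() {
  echo "$((TESTS_RUN - TESTS_FAILED))/$TESTS_RUN passed"
  [ "$TESTS_FAILED" -eq 0 ]
}
```

Each "Test" line in the outline maps to one run_case call, and the "Expected Result: TRUE" lines map to the command's exit code. The report function covers the "write test results to standard output" action.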
My test case outline is complete for this tutorial and clearly shows what I’m interested in testing. It can also be referred to as a test script. For this post, I will write smoke tests using a Bash-based, open source smoke test framework called smoke.sh by asm89, but you can write smoke tests in whatever language or framework you prefer. I picked smoke.sh because it’s an easy framework to implement and it’s open source. Now let’s explore how to express this test script using the smoke.sh framework.
Create smoke test using smoke.sh
The smoke.sh framework’s documentation demonstrates how to use it. The code below shows the smoke_test file found in the tests/ directory of the example code’s repo.
#!/bin/bash
# Load the smoke.sh framework functions.
. tests/smoke.sh

TIME_OUT=300
TIME_OUT_COUNT=0
PULUMI_STACK="k8s"
PULUMI_CWD="pulumi/gcp/gke/"
SMOKE_IP=$(pulumi stack --stack $PULUMI_STACK --cwd $PULUMI_CWD output app_endpoint_ip)
SMOKE_URL="http://$SMOKE_IP"

while true
do
  # Poll the app until it answers with HTTP 200 or the timeout is exceeded.
  STATUS=$(curl -s -o /dev/null -w '%{http_code}' "$SMOKE_URL")
  if [ "$STATUS" -eq 200 ]; then
    smoke_url_ok "$SMOKE_URL"
    smoke_assert_body "Welcome to CI/CD"
    smoke_assert_body "Version Number:"
    smoke_report
    printf '\n\n'
    echo 'Smoke Tests Successfully Completed.'
    echo 'Terminating the Kubernetes Cluster in 300 seconds...'
    sleep 300
    pulumi destroy --stack $PULUMI_STACK --cwd $PULUMI_CWD --yes
    break
  elif [[ $TIME_OUT_COUNT -gt $TIME_OUT ]]; then
    echo "Process has Timed out! Elapsed Timeout Count.. $TIME_OUT_COUNT"
    pulumi destroy --stack $PULUMI_STACK --cwd $PULUMI_CWD --yes
    exit 1
  else
    echo "Checking Status on host $SMOKE_IP... $TIME_OUT_COUNT seconds elapsed"
    TIME_OUT_COUNT=$((TIME_OUT_COUNT+10))
  fi
  sleep 10
done
Next, I’ll explain what’s going on in this smoke_test file.
smoke_test file breakdown
Let’s start at the top of the file.
#!/bin/bash
. tests/smoke.sh
The snippet above specifies the Bash binary to use and sources the core smoke.sh framework file so that its functions are available in the smoke_test script.
TIME_OUT=300
TIME_OUT_COUNT=0
PULUMI_STACK="k8s"
PULUMI_CWD="pulumi/gcp/gke/"
SMOKE_IP=$(pulumi stack --stack $PULUMI_STACK --cwd $PULUMI_CWD output app_endpoint_ip)
SMOKE_URL="http://$SMOKE_IP"
The snippet above defines the variables that are used throughout the smoke_test script. The list below explains their purpose:
- TIME_OUT=300: The maximum number of seconds to wait for the application to respond.
- TIME_OUT_COUNT=0: Tracks the number of seconds that have elapsed while waiting.
- PULUMI_STACK="k8s": Used by Pulumi to specify the Pulumi app stack.
- PULUMI_CWD="pulumi/gcp/gke/": The path to the Pulumi infrastructure code.
- SMOKE_IP=$(pulumi stack --stack $PULUMI_STACK --cwd $PULUMI_CWD output app_endpoint_ip): The Pulumi command used to retrieve the public IP address of the application on the GKE cluster. This variable is referenced throughout the script.
- SMOKE_URL="http://$SMOKE_IP": Specifies the URL endpoint of the application on the GKE cluster.
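Because the URL is built from the output of a command, it can be worth guarding against an empty value before the tests start. The sketch below is one possible way to do that; build_smoke_url is a hypothetical helper and is not part of smoke.sh or Pulumi.

```bash
#!/bin/bash
# Sketch: fail early when no endpoint IP was returned.
# build_smoke_url is a hypothetical helper name.
build_smoke_url() {
  local ip="$1"
  if [ -z "$ip" ]; then
    echo "No endpoint IP found in the stack outputs" >&2
    return 1
  fi
  echo "http://$ip"
}
```

In the script above, this could be used as SMOKE_URL=$(build_smoke_url "$SMOKE_IP") || exit 1, so that a missing IP fails the job immediately with a clear message.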
while true
do
  # Poll the app until it answers with HTTP 200 or the timeout is exceeded.
  STATUS=$(curl -s -o /dev/null -w '%{http_code}' "$SMOKE_URL")
  if [ "$STATUS" -eq 200 ]; then
    smoke_url_ok "$SMOKE_URL"
    smoke_assert_body "Welcome to CI/CD"
    smoke_assert_body "Version Number:"
    smoke_report
    printf '\n\n'
    echo 'Smoke Tests Successfully Completed.'
    echo 'Terminating the Kubernetes Cluster in 300 seconds...'
    sleep 300
    pulumi destroy --stack $PULUMI_STACK --cwd $PULUMI_CWD --yes
    break
  elif [[ $TIME_OUT_COUNT -gt $TIME_OUT ]]; then
    echo "Process has Timed out! Elapsed Timeout Count.. $TIME_OUT_COUNT"
    pulumi destroy --stack $PULUMI_STACK --cwd $PULUMI_CWD --yes
    exit 1
  else
    echo "Checking Status on host $SMOKE_IP... $TIME_OUT_COUNT seconds elapsed"
    TIME_OUT_COUNT=$((TIME_OUT_COUNT+10))
  fi
  sleep 10
done
The snippet above is where all the magic happens. It’s a while loop that runs until the tests complete or the script exits. The loop uses a curl command to test whether the application returns a 200 OK response code. Because this pipeline creates a brand new GKE cluster from scratch, there are transactions occurring in the Google Cloud Platform that take time to complete, and the GKE cluster and application service must be up and running before smoke testing can begin. On each pass, the $STATUS variable is populated with the result of the curl request and compared against 200. If the value is not 200, the loop increments the $TIME_OUT_COUNT variable by 10, waits 10 seconds, and repeats the curl request. Once the cluster and app are up, running, and responding, the $STATUS variable will hold a 200 response code and the remainder of the tests will proceed.
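The polling pattern described above can also be isolated into a reusable function. The following is a minimal sketch, assuming the same curl invocation as the script; wait_for_status and its four parameters are hypothetical names chosen for illustration.

```bash
#!/bin/bash
# Sketch: poll a URL until it returns the wanted status code or time runs out.
# wait_for_status URL WANTED_CODE TIMEOUT_SECONDS INTERVAL_SECONDS
# (hypothetical helper; INTERVAL_SECONDS should be at least 1)
wait_for_status() {
  local url="$1" want="$2" timeout="$3" interval="$4"
  local elapsed=0 status

  while [ "$elapsed" -le "$timeout" ]; do
    status=$(curl -s -o /dev/null -w '%{http_code}' "$url")
    if [ "$status" = "$want" ]; then
      echo "Ready after ${elapsed}s"
      return 0
    fi
    echo "Got $status, retrying... ${elapsed}s elapsed"
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done

  echo "Timed out after ${elapsed}s" >&2
  return 1
}
```

With the values used in this post, the call would be wait_for_status "$SMOKE_URL" 200 300 10. Separating the wait from the assertions keeps the "is it up yet?" logic apart from the "is it correct?" logic.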
The smoke_assert_body "Welcome to CI/CD" and smoke_assert_body "Version Number:" statements are where I test that the welcome and version number texts are rendered on the requested webpage. If either text is missing, the test fails, which fails the pipeline. If both are present, the text tests result in TRUE and the smoke test passes. The script then executes the pulumi destroy command, which terminates all of the infrastructure created for this test case, since there is no further need for the cluster.
This loop also has an elif (else if) statement that checks whether the elapsed wait time has exceeded the $TIME_OUT value. The elif statement is an example of error handling, which lets us control what happens when unexpected results occur. If the $TIME_OUT_COUNT value exceeds the $TIME_OUT value, the pulumi destroy command is executed to terminate the newly created infrastructure, and the exit 1 command fails your pipeline build process. Both the success path and the timeout path end by destroying the GKE cluster because there really isn’t a need for this infrastructure to exist outside of testing.
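The script calls pulumi destroy explicitly on each path it handles. As a hedged alternative, Bash’s trap builtin can register teardown once so that it also runs on exit paths that are not listed explicitly, for example if a command ends the script early. The sketch below shows the idea with placeholder names: teardown and run_with_teardown are hypothetical, and the echo stands in for the real destroy command.

```bash
#!/bin/bash
# Sketch: run teardown on every exit path by using an EXIT trap.
# teardown and run_with_teardown are hypothetical names.
teardown() {
  # Placeholder; the script above would run its pulumi destroy command here.
  echo "teardown"
}

# run_with_teardown COMMAND [ARGS...]: run the command in a subshell whose
# EXIT trap calls teardown, and pass the command's exit code through.
run_with_teardown() {
  (
    trap teardown EXIT
    "$@"
  )
}
```

Wrapping the test run this way means the teardown step fires whether the wrapped command succeeds, fails, or calls exit itself, and the pipeline still sees the original exit code.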
Adding smoke tests to pipelines
I’ve explained the smoke test example and my process for developing the test case. Now it’s time to integrate it into the CI/CD pipeline configuration above. We’ll add a new run step below the pulumi/update step of the deploy_to_gcp job:
      ...
      - run:
          name: Run Smoke Test against GKE
          command: |
            echo 'Initializing Smoke Tests on the GKE Cluster'
            ./tests/smoke_test
            echo "GKE Cluster Tested & Destroyed"
      ...
The snippet above demonstrates how to integrate the smoke_test script into an existing CI/CD pipeline and execute it. With this new run block added, every pipeline build will test the application on a live GKE cluster and validate that the application passed all test cases. You can be more confident that the specific release will perform as expected when deployed to the tested target environment, which in this case is a Google Kubernetes Engine cluster.
Wrapping up
In summary, I’ve discussed and demonstrated the advantages of leveraging smoke tests and infrastructure as code within CI/CD pipelines to test builds in their target deployment environments. Testing an application in its target environment provides valuable insight into how it will behave when it’s deployed to that same target environment. Smoke testing implemented in CI/CD pipelines adds another layer of confidence in application builds.
If you have any questions, comments, or feedback please feel free to ping me on Twitter @punkdata.
Thanks for reading!