Automatically scale self-hosted runners in AWS to meet demand
Senior Product Manager
Self-hosted runners allow you to host your own scalable execution environments in your private cloud or on-premises, giving you more flexibility to customize and control your CI/CD infrastructure. Teams with unique security or compute requirements can set up and start using self-hosted runners in under five minutes. After setup, your team has access to a range of popular features available on the CircleCI Cloud platform, including parallelism and test splitting, debugging with SSH, and managing self-hosted runners directly in the CircleCI UI.
Most teams experience fluctuations in their resource demands throughout the workday, and maintaining unused compute capacity can result in unnecessary costs. To prevent this, you can implement a scaling solution that automatically spins up and tears down self-hosted runners in response to the number of jobs in the queue, giving you on-demand access to the compute power you need without the risk of wasting money on idle resources. In this tutorial, you will learn how to set up a basic auto-scaling solution for CircleCI’s self-hosted runners using AWS Auto Scaling groups (ASG). If you are interested in auto-scaling runners within your Kubernetes cluster, you can also check out our container runner option.
Self-hosted runners — execution environments under your full control
Self-hosted runners provide a fully customizable execution environment. When using self-hosted runners, CircleCI sends jobs to your compute to execute the required CI/CD steps. If your application needs access to internal databases or sensitive resources for proper testing, you can deploy it to self-hosted runners behind your firewall.
Runners are designed to be as easy to configure, manage, and deploy as possible. The following points provide some detail on runner behavior that is relevant to the solution you will implement in this tutorial. More information on self-hosted runners is available on the FAQ and runner concepts pages.
- Resource classes: Self-hosted runners are grouped into uniquely named resource classes that are used for identification and to assign jobs. All self-hosted runner resource classes must have a unique name to ensure runners can be properly identified and managed from within the CircleCI web interface.
- Compute consistency: It is a best practice to keep the underlying compute for self-hosted runners consistent within resource classes: each machine should be identically configured with the same architecture and environment. This prevents unexpected behavior and makes troubleshooting easier. In this tutorial, we will use an EC2 launch template to ensure runner agents are configured uniformly.
- Job queueing: If no runner is available, the CircleCI job will wait until a self-hosted runner of the required resource class is available. This gives your self-hosted runners time to launch if you have implemented an auto-scaling solution.
- Connectivity requirements: Runners poll CircleCI for new jobs, so no incoming connections from the internet are required. This means you do not need to configure any incoming firewall rules or port-forwarding, and the runners do not require a static public IP address.
Now that you are familiar with the basics of using CircleCI’s self-hosted runners, let’s get started with the tutorial.
Auto-scaling self-hosted runners with AWS Auto Scaling groups
This example implementation demonstrates how you can scale self-hosted runners using AWS Auto Scaling functionality to increase and decrease the number of available self-hosted runners based on demand.
The example repository includes a basic Node.js app and a CircleCI pipeline configuration to test it. To execute this CircleCI pipeline, you will set up a self-hosted runner as an AWS EC2 launch template based on Ubuntu. The launch template and Auto Scaling group will be used to launch instances based on the queue depth (the number of jobs in the queue) value provided by the runner API for a given runner resource class — all triggered by a Lambda function that checks the API periodically.
Setting up a runner resource class in CircleCI
The first step is to create a resource class. You can do this with one click in the CircleCI UI.
Once you have created the resource class, take note of the authentication token generated for it. It will not be shown again. You will need this token in the next step to authenticate the self-hosted runner with CircleCI.
Preparing the self-hosted runner installation script
Next, you will need to create an installation script to automatically install and configure self-hosted runners in AWS. When an AWS instance is launched from the template created in the next step, this Bash script will be invoked as the root user after it has finished booting. You can find the template for this script in the example repository in the file aws_config/install_runner_ubuntu.sh.
Using the resource class and token from the previous step, update the following variables in the script template:
- RUNNER_NAME: Can be whatever alpha-numeric name you want to give to your runner as it will appear in the CircleCI UI.
- AUTH_TOKEN: Should be replaced with the resource class token that was presented in the UI during the creation of your resource class.
You will then need to add steps to install any dependencies or packages you’d like as part of the execution environment when a job is running. This must be done in the script before the runner service is enabled and started.
For example, if you are developing and testing a Node.js app, you will want to add steps to the script to install Node.js.
# aws_config/install_runner_ubuntu.sh
#------------------------------------------------------------------------------
# Configure your runner environment
# This script must be able to run unattended - without user input
#------------------------------------------------------------------------------
apt install -y nodejs npm
Since the script will be executed at boot for each instance that is created in the Auto Scaling group, it must be able to be run unattended (without user input).
In the installation script, take note of the short (1m) idle_timeout for the CircleCI runner. This helps with scaling down self-hosted runners and instances that are no longer needed.
# aws_config/install_runner_ubuntu.sh
#------------------------------------------------------------------------------
# Install the CircleCI runner configuration
# CircleCI Runner will be executing as the configured $USERNAME
# Note the short idle timeout - this script is designed for auto-scaling scenarios - if a runner is unclaimed, it will quit and the system will shut down as defined in the below service definition
#------------------------------------------------------------------------------
cat << EOF >$CONFIG_PATH
api:
  auth_token: $AUTH_TOKEN
runner:
  name: $UNIQUE_RUNNER_NAME
  command_prefix: ["sudo", "-niHu", "$USERNAME", "--"]
  working_directory: /opt/circleci/workdir/%s
  cleanup_working_directory: true
  idle_timeout: 1m
  max_run_time: 5h
  mode: single-task
EOF
Also take note of the shutdown command in the associated service's ExecStopPost setting.
# aws_config/install_runner_ubuntu.sh
#------------------------------------------------------------------------------
# Create the service
# The service will shut down the instance when it exits - that is, the runner has completed with a success or error
#------------------------------------------------------------------------------
cat << EOF >$SERVICE_PATH
[Unit]
Description=CircleCI Runner
After=network.target
[Service]
ExecStart=$prefix/circleci-launch-agent --config $CONFIG_PATH
ExecStopPost=shutdown now -h
Restart=no
User=root
NotifyAccess=exec
TimeoutStopSec=18300
[Install]
WantedBy = multi-user.target
EOF
This ensures that any idle runners that have not claimed a job and any runners that have completed their tasks will be terminated quickly to avoid wasting resources.
When configuring your service, refer to the systemd documentation if you need to make changes. The previous example sets TimeoutStopSec to 18300 seconds (5 hours and 5 minutes), slightly longer than the 5-hour max_run_time of the runner, so systemd waits up to that long for the service to stop before forcibly terminating it.
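As a quick check of the arithmetic, the following minimal sketch converts the runner's duration strings to seconds and compares them with the TimeoutStopSec value. The to_seconds helper is hypothetical and only handles the simple single-unit values used in this tutorial (such as 1m and 5h); it is not part of the runner or of systemd.

```python
# Hypothetical helper: convert simple duration strings like "1m" or "5h" to seconds.
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def to_seconds(duration: str) -> int:
    """Convert a single-unit duration such as '5h' into seconds."""
    value, unit = duration[:-1], duration[-1:]
    if unit not in UNIT_SECONDS or not value.isdigit():
        raise ValueError(f"unsupported duration: {duration!r}")
    return int(value) * UNIT_SECONDS[unit]


max_run_time = to_seconds("5h")   # max_run_time from the runner config
timeout_stop_sec = 18300          # TimeoutStopSec from the service definition
margin = timeout_stop_sec - max_run_time

print(f"max_run_time={max_run_time}s, margin={margin}s")
```

If you change max_run_time in the runner configuration, it is worth re-running a check like this so the service timeout stays at or above it.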
Creating the launch template
Log into the AWS Management Console and navigate to the services page for managing EC2. You will need to create a launch template and fill out the fields like this:
- Name your new template something sensible like cci-runner-template.
- Select the checkbox that says “Provide guidance to help me set up a template that I can use with EC2 Auto Scaling.”
- For the Launch template contents AMI, select Quick Start then Ubuntu 22.04 LTS.
- Select an Instance type — you will need to pick one based on your requirements.
- Select a Key pair for logging in. This is helpful when you need to log in via SSH to troubleshoot an instance.
- For Network settings and Security groups, select an existing security group or create one. It’s wise to allow SSH only from a trusted IP address, and to block all other incoming traffic.
- The self-hosted runner polls CircleCI for new jobs, and does not require any incoming connections.
- For the Advanced network configuration click Add network interface and enable Auto-assign public IP for that interface.
- Configure storage — increase the size of the hard disk for each instance if you think you’ll need it.
- For Advanced details, copy and paste the contents of install_runner_ubuntu.sh in its entirety into the User data field. The contents of this field will be executed as a shell script when an instance launches.
- Everything else can be left at the default values.
Be aware that the resource class authentication token is stored in the launch template (as part of the runner install script), so don’t share it!
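If you would rather script this step than click through the console, the sketch below shows one way to assemble the launch template parameters in Python. It assumes you would pass them to the boto3 EC2 client's create_launch_template call, which expects the user data to be base64-encoded. The AMI ID, instance type, and key pair name are placeholders, and build_launch_template_params is a hypothetical helper that is not part of the example repository.

```python
import base64


def build_launch_template_params(name, image_id, instance_type, key_name, user_data_script):
    """Build keyword arguments for ec2.create_launch_template (assumed usage)."""
    # Launch template user data must be supplied as base64-encoded text.
    encoded = base64.b64encode(user_data_script.encode("utf-8")).decode("ascii")
    return {
        "LaunchTemplateName": name,
        "LaunchTemplateData": {
            "ImageId": image_id,
            "InstanceType": instance_type,
            "KeyName": key_name,
            "UserData": encoded,
        },
    }


# Placeholder values for illustration only; substitute your own.
script = "#!/bin/bash\necho 'contents of install_runner_ubuntu.sh go here'\n"
params = build_launch_template_params(
    name="cci-runner-template",
    image_id="ami-PLACEHOLDER",
    instance_type="t3.medium",
    key_name="my-key-pair",
    user_data_script=script,
)
print(sorted(params["LaunchTemplateData"]))
```

With AWS credentials configured, the dictionary could then be passed as boto3.client("ec2").create_launch_template(**params). Because the encoded user data still contains the resource class token, treat the parameters as sensitive.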
Creating the Auto Scaling group
Next, you will need to create an Auto Scaling group. Enter these values for each section:
- Step 1: Choose launch template or configuration.
- Name your group something sensible like cci-runner-auto-scaling-group.
- Ensure the template you created is set as the Launch template.
- Leave everything else as it is.
- Step 2: Choose instance launch options.
- For Instance launch options, select an availability zone and subnet. If your instances will need to communicate with other AWS assets, assign them to the appropriate zone/subnet.
- Leave everything else as it is.
- Step 3: Configure advanced options.
- Leave everything as it is.
- Step 4: Configure group size and scaling policies.
- Set Desired capacity, Minimum capacity, and Maximum capacity to 0. The Lambda function created in a subsequent step will update these values to match your scaling requirements.
- Select the Enable scale-in protection checkbox to protect instances from being terminated prematurely. Jobs may complete out of the order in which they were submitted, reducing the queue depth and causing the Auto Scaling group to terminate older instances, even though those may not be the self-hosted runners that have finished their tasks.
- Leave everything else as it is.
- Skip steps 5 and 6.
- Step 7: Review.
- Review your configuration and save it.
Once launched, instances will be in charge of their own lifecycle. The self-hosted runner will terminate after a short idle time based on the idle_timeout setting. Because the runner is in single-task mode, the self-hosted runner will also terminate gracefully upon completion of a job (success or failure). We’ve also configured the service to shut down the instance when it exits.
Creating the IAM policy and role
The Lambda function needs permissions to monitor the queue and alter the auto-scaling parameters. You will need to set up an Identity and Access Management (IAM) policy and associated role to grant these permissions.
Create a policy with the required permissions. You can copy and paste the policy from the file aws_config/lambda_iam.json in our example repository or from the code block below. This policy will give permissions to update Auto Scaling groups and read secrets from the AWS secrets manager — the two permissions required by the Lambda function.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": "autoscaling:UpdateAutoScalingGroup",
      "Resource": "*"
    },
    {
      "Sid": "VisualEditor1",
      "Effect": "Allow",
      "Action": "secretsmanager:GetSecretValue",
      "Resource": "*"
    }
  ]
}
Once the policy has been set up, create a new role for your Lambda function and assign the new policy to the role.
Again, keep naming consistent so that you can easily find and identify the AWS components later on. Give the IAM components sensible names like cci-runner-lambda-iam-policy and cci-runner-lambda-iam-role.
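The policy in this section grants both actions on every resource ("*"). If you prefer to narrow it, a sketch like the following generates the same two statements with the Resource values passed in as arguments. build_lambda_policy and the Sid values are hypothetical, and the ARNs you would supply depend on your own account, so none are shown here.

```python
import json


def build_lambda_policy(asg_resource="*", secret_resource="*"):
    """Return the two-statement policy from this tutorial as a dictionary."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "UpdateRunnerAutoScalingGroup",
                "Effect": "Allow",
                "Action": "autoscaling:UpdateAutoScalingGroup",
                "Resource": asg_resource,
            },
            {
                "Sid": "ReadRunnerSecret",
                "Effect": "Allow",
                "Action": "secretsmanager:GetSecretValue",
                "Resource": secret_resource,
            },
        ],
    }


# With no arguments this is equivalent in effect to the wildcard policy.
policy = build_lambda_policy()
print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into the IAM policy editor in place of the wildcard version.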
Creating (and keeping) secrets
The AWS secrets manager provides a secure way to store API keys and other sensitive information for use in Lambda functions. Secrets are stored as key/value pairs. Create a secret with these options:
- Step 1: Choose secret type.
- Select Other type of secret.
- Add the following key/value pairs:
- resource_class: The resource class for the runner in CircleCI, in the format username/class-name.
- circle_token: A CircleCI personal API token for polling the runner API. It is not the runner token used in the installation script above.
- Leave the Encryption key set to aws/secretsmanager.
- Step 2: Configure your secret.
- Name your secret something sensible like cci-runner-lambda-secrets.
- Leave the rest as it is.
- Step 3: Configure rotation — optional.
- Leave as-is.
- Step 4: Review.
- Review and save your secrets.
- There’s no need to copy and paste the generated code at this point — it’s already included in the example Lambda function in the repository. Make sure that you take note of the secret name and region.
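Because the Lambda function reads the secret as a JSON string and looks up the resource_class and circle_token keys, a typo in either key name only surfaces at run time. The following minimal sketch shows one way to validate the secret before using it. parse_runner_secret is a hypothetical helper and the sample values are dummies.

```python
import json

REQUIRED_KEYS = ("resource_class", "circle_token")


def parse_runner_secret(secret_string):
    """Parse the secret JSON and check the keys the Lambda function relies on."""
    secrets = json.loads(secret_string)
    missing = [key for key in REQUIRED_KEYS if not secrets.get(key)]
    if missing:
        raise KeyError("secret is missing keys: " + ", ".join(missing))
    if "/" not in secrets["resource_class"]:
        raise ValueError("resource_class should look like username/class-name")
    return secrets


# Dummy values for illustration only.
sample = json.dumps({"resource_class": "my-org/my-runner", "circle_token": "dummy-token"})
secrets = parse_runner_secret(sample)
print(secrets["resource_class"])
```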
Creating the Lambda function
AWS Lambda functions are serverless functions that execute code. This example uses a Lambda function triggered on a schedule to run a Python script that checks the CircleCI runner API and alters the Auto Scaling group to raise or lower the number of running instances. Create a Lambda function from scratch with the following configuration:
- Name your function something sensible like cci-runner-lambda-function.
- Set the Runtime to Python 3.8.
- Set the Architecture to x86_64.
- Click Execution role, then Use an existing role. Select the IAM role you created earlier.
Copy and paste the contents of the aws_config/lambda_function.py file into the function source in the AWS console.
# aws_config/lambda_function.py
import json, urllib3, boto3, base64, os

# This script polls the number of unclaimed tasks for a CircleCI runner class, and sets the parameters for an AWS Auto Scaling group
# It uses the CircleCI runner API https://circleci.com/docs/2.0/runner-api/
# It requires the included IAM role and should be triggered every minute using an EventBridge Cron event

# Retrieve environment variables
secret_name = os.environ['SECRET_NAME']
secret_region = os.environ['SECRET_REGION']
auto_scaling_max = os.environ['AUTO_SCALING_MAX']
auto_scaling_group_name = os.environ['AUTO_SCALING_GROUP_NAME']
auto_scaling_group_region = os.environ['AUTO_SCALING_GROUP_REGION']

# Function to retrieve secrets from AWS Secrets manager
# https://aws.amazon.com/secrets-manager/
def get_secret(secret_name, region_name):
    session = boto3.session.Session()
    client = session.client(
        service_name='secretsmanager',
        region_name=region_name
    )
    get_secret_value_response = client.get_secret_value(
        SecretId=secret_name
    )
    if 'SecretString' in get_secret_value_response:
        secret = get_secret_value_response['SecretString']
        return(secret)
    else:
        decoded_binary_secret = base64.b64decode(get_secret_value_response['SecretBinary'])
        return(decoded_binary_secret)

# Make a HTTP GET request with the given URL and headers
def get_request(url, headers):
    http = urllib3.PoolManager()
    r = http.request('GET', url, headers=headers)
    r_json = json.loads(r.data.decode("utf-8"))
    return(r_json)

# The handler function - is executed every time the Lambda function is triggered
def lambda_handler(event, context):
    # Get secrets
    secrets = json.loads(get_secret(secret_name, secret_region))

    # Configure Runner API endpoint https://circleci.com/docs/2.0/runner-api/#endpoints
    endpoint_url = 'https://runner.circleci.com/api/v2/tasks?resource-class=' + secrets['resource_class']
    headers = {'Circle-Token': secrets['circle_token']}

    # Get result from API endpoint
    result = get_request(endpoint_url, headers)

    # Configure Runner API endpoint https://circleci.com/docs/2.0/runner-api/#endpoints
    endpoint_url = 'https://runner.circleci.com/api/v2/tasks/running?resource-class=' + secrets['resource_class']
    headers = {'Circle-Token': secrets['circle_token']}

    # Get result from API endpoint
    result_running = get_request(endpoint_url, headers)
    total_desired = int(result["unclaimed_task_count"]) + int(result_running["running_runner_tasks"])

    # Update the auto scaling group with a desired number of instances set to the number of jobs in the queue, or the maximum, whichever is smaller
    instances_min = 0
    instances_max = int(auto_scaling_max)
    instances_desired = min(total_desired, int(auto_scaling_max))

    # Set the Auto Scaling group configuration
    client = boto3.client('autoscaling', region_name=auto_scaling_group_region)
    client.update_auto_scaling_group(
        AutoScalingGroupName=auto_scaling_group_name,
        MinSize=instances_min,
        MaxSize=instances_max,
        DesiredCapacity=instances_desired
    )

    # Lambda functions should return a result, even if it isn't used
    return result["unclaimed_task_count"]
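The scaling decision in the handler can be isolated into a small pure function, which makes it easy to check without calling AWS or CircleCI. The following is a sketch of the same calculation (unclaimed tasks plus running tasks, capped at the maximum), not a replacement for the repository code. desired_instances is a hypothetical name, and the floor at zero is an added safeguard that the original handler does not include.

```python
def desired_instances(unclaimed, running, maximum):
    """Return the instance count to request from the Auto Scaling group.

    unclaimed: tasks waiting in the queue (unclaimed_task_count)
    running:   tasks currently running on runners (running_runner_tasks)
    maximum:   upper bound on instances (AUTO_SCALING_MAX)
    """
    total = int(unclaimed) + int(running)
    # Cap at the configured maximum and never go below zero.
    return max(0, min(total, int(maximum)))


# Three jobs queued and two running need five instances.
print(desired_instances(3, 2, 10))
# Demand above the cap is limited to the maximum.
print(desired_instances(8, 4, 10))
# An empty queue lets the group scale back to zero.
print(desired_instances(0, 0, 10))
```

Counting running tasks as well as queued ones matters: if only the queue depth were used, the desired capacity would drop as soon as jobs were claimed, while their instances were still busy.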
Under the Configuration tab for your Lambda function, navigate to Environment variables, then edit and add the following key/value pairs:
- SECRET_NAME: The name of the secret created above.
- SECRET_REGION: The region of the secret above.
- AUTO_SCALING_MAX: The maximum number of instances to spin up, as an integer. We recommend setting the maximum to the self-hosted runner concurrency limit of your CircleCI plan.
- AUTO_SCALING_GROUP_NAME: The name of the Auto Scaling group.
- AUTO_SCALING_GROUP_REGION: The region of the Auto Scaling group.
Leave everything else at the default values.
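A missing or non-numeric environment variable will cause the function to fail when it runs. As a minimal sketch, the settings could be validated in one place like this. load_settings is a hypothetical helper and the sample values are placeholders; the variable names are the five listed above.

```python
import os

REQUIRED_VARS = (
    "SECRET_NAME",
    "SECRET_REGION",
    "AUTO_SCALING_MAX",
    "AUTO_SCALING_GROUP_NAME",
    "AUTO_SCALING_GROUP_REGION",
)


def load_settings(env=None):
    """Read the required variables and convert AUTO_SCALING_MAX to an integer."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    settings = {name: env[name] for name in REQUIRED_VARS}
    settings["AUTO_SCALING_MAX"] = int(settings["AUTO_SCALING_MAX"])
    if settings["AUTO_SCALING_MAX"] < 0:
        raise ValueError("AUTO_SCALING_MAX must not be negative")
    return settings


# Placeholder values for illustration only.
sample_env = {
    "SECRET_NAME": "cci-runner-lambda-secrets",
    "SECRET_REGION": "us-east-1",
    "AUTO_SCALING_MAX": "5",
    "AUTO_SCALING_GROUP_NAME": "cci-runner-auto-scaling-group",
    "AUTO_SCALING_GROUP_REGION": "us-east-1",
}
settings = load_settings(sample_env)
print(settings["AUTO_SCALING_MAX"])
```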
Triggering the Lambda function on a schedule
Lambda functions can be triggered in a number of ways. In this case, it will be executed on a schedule. We recommend calling the function every minute to check the queue depth and make the appropriate adjustments in a timely manner.
To set this up, go to the Lambda function editing screen and click Add trigger. Search for and select EventBridge (CloudWatch Events), then select Create a new rule. Fill out the following details:
- Name your rule something sensible like cci-runner-scheduled-trigger.
- Set Rule type to Schedule expression.
- Enter the value cron(0/1 * * * ? *) to trigger the function every minute.
Click Add to finish setting up the scheduled trigger.
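EventBridge schedules also accept rate expressions, so rate(1 minute) is an alternative to the cron expression used here. Rate expressions use a singular unit when the value is 1 and a plural unit otherwise. The sketch below builds such a string; rate_expression is a hypothetical helper for illustration only.

```python
def rate_expression(value, unit="minute"):
    """Build an EventBridge rate expression such as 'rate(1 minute)'."""
    if unit not in ("minute", "hour", "day"):
        raise ValueError(f"unsupported unit: {unit!r}")
    if not isinstance(value, int) or value < 1:
        raise ValueError("value must be a positive integer")
    # The unit is singular for a value of 1 and plural otherwise.
    suffix = "" if value == 1 else "s"
    return f"rate({value} {unit}{suffix})"


print(rate_expression(1))
print(rate_expression(5))
```

A longer interval reduces the number of Lambda invocations, at the cost of jobs waiting longer in the queue before an instance is launched.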
Testing and deploying
With that, all of the moving parts for your auto-scaled runner solution are in place. Now you can add it to your CircleCI configuration and start using it!
In the Lambda function editing screen, return to the Code tab and click Deploy. That’s it! Everything is now running and ready to use.
To test, go to the Test tab. Leave everything as-is (to prevent the test being saved). Click Test.
The result will be success or failure, and you will be able to debug your function code if necessary. If everything comes back green, you’re up and running.
If you wish to monitor your function, you can use the Monitor tab in Lambda to make sure your function is running to the schedule you set in the previous section.
Running CircleCI jobs on the auto-scaled self-hosted runners
To run CircleCI jobs on your new auto-scaling resource class, you first need to add the resource class to your CircleCI configuration file.
The .circleci/config.yml file in the sample repository uses the machine and resource_class options.
# .circleci/config.yml
version: 2.1
workflows:
  testing:
    jobs:
      - runner-test
jobs:
  runner-test:
    machine: true
    resource_class: NAMESPACE/CLASS-NAME # Update this to reflect your self-hosted runner resource class details
    steps:
      - run: echo "Hi I'm on Runners!"
      - run: node --version
      - checkout
      - run:
          command: npm install
          name: Install Node.js app dependencies
      - run:
          command: npm run test
          name: Test app
Once you’ve run a job, the self-hosted runners will appear in the CircleCI web interface when they are launched. When a job appears in the queue, the AWS Lambda function will trigger the Auto Scaling group to increase its capacity. When the instance is ready, the job will be sent from CircleCI to the runner to execute.
You can monitor the status of self-hosted runners through the CircleCI web interface. When a job is done, the runner will terminate the instance it is running on, and the Auto Scaling group will reduce the desired number of instances to match the new queue length.
The results of your pipeline will be sent back to the CircleCI UI, alongside your CircleCI Cloud jobs.
On the AWS side, you will be able to see the Auto Scaling group being adjusted by the Lambda function as the queue depth changes and jobs complete.
CircleCI is a flexible CI/CD platform that works the way you work
With self-hosted runners, you have full control of your CI/CD pipeline, including the execution environment and where your data is stored and processed.
CircleCI encourages DevOps best practices — but it doesn’t dictate how you should do things. You need to be able to work to your team’s strengths, with the full flexibility allowed by your toolchain, while remaining compliant and secure. You can start out using pre-built execution environments, and as your requirements become more specialized, deploy your own customized, scalable self-hosted runners and start using them with a simple configuration change — without having to overhaul your entire CI/CD toolchain.
You can get started with CircleCI today by signing up for a free plan. Your free plan provides everything you need to start building your own automated CI/CD pipelines to test and deploy your code. Self-hosted runners are included in all plans, and are available for you to use when you’re ready to start experimenting with the advanced features CircleCI has to offer.
You can learn more about using self-hosted runners in your CircleCI pipelines with the following resources:
- Install self-hosted runners in 5 minutes or less
- Benefits of running continuous integration jobs on self-hosted infrastructure
- Run private cloud and on-premises jobs with CircleCI runner
- Testing locally with CircleCI runners
- Managing code signing on CircleCI using the runner
- Self-hosted runner course on CircleCI Academy