Scaling self-hosted runners
Introduction
Maintaining a fixed compute fleet of self-hosted runners can incur unnecessary costs, because the workload fluctuates with the rate at which jobs are queued. To help reduce this cost, the compute fleet can be scaled according to demand.
You can view an example tutorial for scaling machine runners with AWS Auto Scaling groups on the CircleCI blog.
Container runner
If you’re using the container runner on Kubernetes, it will automatically spin up more pods as more work enters the queue. Those pods are ephemeral and are torn down once their job finishes executing. While pods scale with the work automatically, CircleCI will not scale the underlying compute for your Kubernetes cluster.
Scaling data
There are several API endpoints to help you set up a solution to scale machine runners or your Kubernetes cluster for container runners. The task data endpoints report, for a given resource class, how many tasks are waiting to be claimed and how many are currently running. A scaling solution can use these endpoints to calculate the total number of waiting tasks which can be run. The task data endpoints are scoped to a single resource class, so it’s important to query every available resource class to get the complete picture of waiting and running tasks.
If you’re using machine runners, you can devise a scaling solution that adds more machine runners to the resource classes with the most pending work.
You will also need to be aware of your plan’s concurrency limit to avoid starting compute which cannot be used. This can be found on the CircleCI Plans page.
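As a concrete starting point, here is a minimal Python sketch that polls task counts for a set of resource classes and sums the waiting work. The endpoint paths (/runner/tasks and /runner/tasks/running on runner.circleci.com), the Circle-Token header, the unclaimed_task_count field name, and the resource class names are assumptions, not taken from this page; verify them against the CircleCI Runner API reference before building on this.

```python
"""Sketch: sum waiting tasks across resource classes via the Runner API.

Endpoint paths, header name, and response fields below are assumptions;
check them against the CircleCI Runner API reference.
"""
import json
import os
import urllib.parse
import urllib.request

BASE = "https://runner.circleci.com/api/v3"  # assumed Runner API host
TOKEN = os.environ["CIRCLECI_TOKEN"]         # personal API token

# Hypothetical resource classes managed by the scaling solution.
RESOURCE_CLASSES = ["my-namespace/linux-medium", "my-namespace/linux-large"]


def get(path: str, resource_class: str) -> dict:
    """GET a Runner API endpoint scoped to one resource class."""
    query = urllib.parse.urlencode({"resource-class": resource_class})
    req = urllib.request.Request(
        f"{BASE}{path}?{query}", headers={"Circle-Token": TOKEN}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


total_waiting = 0
for rc in RESOURCE_CLASSES:
    waiting = get("/runner/tasks", rc)          # unclaimed (waiting) tasks
    running = get("/runner/tasks/running", rc)  # tasks currently running
    print(rc, waiting, running)
    # Field name is an assumption about the response shape.
    total_waiting += int(waiting.get("unclaimed_task_count", 0))

print("Total waiting tasks across all resource classes:", total_waiting)
```

A scaling solution would compare this total, together with the running counts, against the plan’s concurrency limit before deciding how many machines to start.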
Agent configuration for machine runner
There are some machine runner configuration settings that can be used by your scaling solution, particularly to assist with resource cleanup when demand drops:

- Runner mode
  - Choosing single-task mode will cause the machine runner to shut down after completing a single task. This is useful if you are using completely ephemeral compute, as the resources can be automatically recycled when the machine runner exits.
  - Choosing continuous mode will cause the machine runner to poll for new tasks after completing a task. Your scaling solution will need to monitor the task workload and actively shut down unused machine runners (a sketch follows this list).
- Idle timeout
  - Setting a reasonable timeout can be used for automatic resource recycling during periods of lower demand.
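For continuous mode in particular, the scaling solution itself has to notice idle capacity and release it. The outline below is a hedged Python sketch of such a scale-down loop; all three helper functions are placeholders you would implement yourself (the task-count query from the earlier sketch and your cloud provider’s terminate call) and are not CircleCI-provided interfaces.

```python
"""Sketch of a scale-down loop for machine runners in continuous mode.

Every helper here is a placeholder for logic you supply; none of these
names come from CircleCI tooling.
"""
import time


def waiting_tasks(resource_class: str) -> int:
    """Placeholder: query the Runner API for unclaimed tasks (see earlier sketch)."""
    raise NotImplementedError


def idle_runner_count(resource_class: str) -> int:
    """Placeholder: compare registered runners with running tasks to estimate idle machines."""
    raise NotImplementedError


def terminate_one_idle_machine(resource_class: str) -> None:
    """Placeholder: stop the runner agent gracefully, then release the instance via your cloud provider."""
    raise NotImplementedError


def scale_down_loop(resource_class: str, poll_seconds: int = 60) -> None:
    """Shut down one idle machine at a time while no work is waiting."""
    while True:
        if waiting_tasks(resource_class) == 0 and idle_runner_count(resource_class) > 0:
            terminate_one_idle_machine(resource_class)
        time.sleep(poll_seconds)
```

With single-task mode or an idle timeout, much of this loop disappears, since the runner exits on its own and the machine only needs to be reclaimed afterwards.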