Container runner reference
This document is a comprehensive guide to operating and configuring jobs with the CircleCI container runner.
Running your first job with container runner
Follow the instructions outlined on the Container runner installation page to download the container runner and run your first job. You can also use the CircleCI web app to get started with self-hosted runners.
Container runner sample configuration
version: 2.1
jobs:
  build:
    docker:
      - image: cimg/base:2021.11
    resource_class: <namespace>/<resource-class>
    steps:
      - checkout
      - ...
workflows:
  build-workflow:
    jobs:
      - build
Resource class configuration and custom task pod configuration
Container runner supports claiming and running tasks from multiple resource classes concurrently, as well as customization of the Kubernetes resources created to run tasks for a particular resource class. Configuration is provided by a map object in the Helm chart values.yaml.
Each resource class supports the following parameters:
- token: The runner resource class token used to claim tasks (required).
- Custom Kubernetes pod configuration for pods used to run CircleCI jobs.
The pod configuration takes all fields that a normal Kubernetes pod does. If service containers are used in a CircleCI job, the first container spec is used for all containers within the task pod. Customizable service containers can be used to provide different container configuration between service containers and the main task container.
The following fields will be overwritten by container runner to ensure correct task function, and expected CircleCI configuration behavior:
- spec.containers[0].name
- spec.containers[0].container.image
- spec.containers[0].container.args
- spec.containers[0].container.command
- spec.containers[0].container.workingDir
- spec.restartPolicy
- metadata.name
- metadata.namespace
Below is a full configuration example, containing two resource classes:
agent:
  resourceClasses:
    circleci-runner/resourceClass:
      token: TOKEN1
      metadata:
        annotations:
          custom.io: my-annotation
      spec:
        containers:
          - resources:
              limits:
                cpu: 500m
            volumeMounts:
              - name: xyz
                mountPath: /path/to/mount
        securityContext:
          runAsNonRoot: true
        imagePullSecrets:
          - name: my_cred
        volumes:
          - name: xyz
            emptyDir: {}
    circleci-runner/resourceClass2:
      token: TOKEN2
      spec:
        imagePullSecrets:
          - name: "other"
Customizable service containers
By default, service (or secondary) containers inherit the same container configuration as defined by the primary container. However, this behavior can be overridden using customizable service containers. Using the available overrides allows fine-tuned control over a service’s resource usage on a per-image basis.
Example
Consider the following container runner Helm values:
agent:
  serviceContainers:
    exact:
      "cimg/redis:6":
        resources:
          requests:
            cpu: "0.5"
            memory: "200Mi"
  resourceClasses:
    your-namespace/your-resource-class:
      serviceContainers:
        exact:
          "cimg/postgres:16":
            resources:
              requests:
                cpu: "1"
                memory: "500Mi"
        prefix:
          "cimg/postgres":
            resources:
              requests:
                cpu: "0.7"
                memory: "250Mi"
        pattern:
          "cimg/mysql:.*":
            resources:
              requests:
                cpu: "0.6"
                memory: "300Mi"
        default:
          resources:
            requests:
              cpu: "0.4"
              memory: "150Mi"
And the following CircleCI config.yml snippet:
jobs:
  build:
    resource_class: your-namespace/your-resource-class
    docker:
      - image: cimg/base:current
      - image: cimg/redis:6
      - image: cimg/postgres:16
      - image: cimg/mysql:8
      - image: cimg/mongo:5
In this configuration:
- cimg/redis:6 matches the exact rule at the global scope (within agent.serviceContainers) and is allocated 0.5 CPU units and 200Mi of memory.
- cimg/postgres:16 matches the exact rule at the resource class scope (your-namespace/your-resource-class) and is allocated 1 CPU unit and 500Mi of memory.
- cimg/mysql:8 matches the pattern rule at the resource class scope and is allocated 0.6 CPU units and 300Mi of memory.
- cimg/mongo:5 doesn’t match any rule from the service container options, hence defaults to the default rule at the resource class scope and is allocated 0.4 CPU units and 150Mi of memory.
The rendered Pod specification would then appear as follows:
spec:
  containers:
    - name: cimg/redis:6
      resources:
        requests:
          cpu: "0.5"
          memory: "200Mi"
    - name: cimg/postgres:16
      resources:
        requests:
          cpu: "1"
          memory: "500Mi"
    - name: cimg/mysql:8
      resources:
        requests:
          cpu: "0.6"
          memory: "300Mi"
    - name: cimg/mongo:5
      resources:
        requests:
          cpu: "0.4"
          memory: "150Mi"
In the following sections, we will discuss these customization options in greater detail.
Image match types
Image match types govern how images are matched for container customization. The types include:
- Exact: For exact matching, the image string must be an exact match. For example, cimg/redis:6.2.6 only matches the cimg/redis:6.2.6 image.
- Prefix: For prefix matching, the image string matches all images with a common prefix. For example, cimg/redis: will match any cimg/redis image regardless of the tag.
- Pattern: For pattern matching, a Go-based regex pattern is used to match images. For example, cimg/(redis|postgres):.* matches any redis or postgres image from the cimg repository regardless of the tag. Refer to the Golang regex syntax and regex101.com to test your regular expressions.
- Default: The Default match type applies when an image did not match any of the other image match types. It sets a single specification for all such service containers.
Order of precedence
Selectors follow the hierarchy: Exact → Prefix → Pattern → Default. If a given image name does not match any Exact, Prefix, or Pattern rule, it falls back to the Default rule.
Match types defined at the resource class scope take precedence over those defined at the global scope for the same match type.
Selection scope
Selection scopes determine the context in which the customization is applied. This comprises:
- Resource class: This scope specifies a custom configuration for all containers running within a particular resource class. For example, setting specific resources under your-namespace/your-resource-class impacts only the containers running within this specific class. This scope takes precedence over the Global scope.
  resourceClasses:
    your-namespace/your-resource-class:
      serviceContainers:
        exact:
          "cimg/postgres:16":
            resources:
              requests:
                cpu: "1"
                memory: "500Mi"
- Global: This scope applies a custom configuration globally to all containers across all resource classes. It is considered when no matching scope is found at the resource class level.
  agent:
    serviceContainers:
      exact:
        "cimg/redis:6":
          resources:
            requests:
              cpu: "0.5"
              memory: "200Mi"
Order of precedence
The Resource class scope overrides any Global scope selection for a given match type. If a match is available in both scopes, the Resource class scope prevails.
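The two precedence rules can be combined into a single lookup. The following Python sketch is illustrative only and is not container runner's implementation: the helper name select_container_spec, the dictionary layout, and the use of an unanchored re.search for pattern rules (container runner itself uses Go regular expressions) are all assumptions. It checks match types in the order Exact, Prefix, Pattern, Default, and within each match type it consults the resource class scope before the global scope, which is consistent with the example above.

```python
import re

# Match types in the documented order of precedence.
MATCH_ORDER = ("exact", "prefix", "pattern", "default")


def _matches(match_type, rule_key, image):
    """Return True if `image` satisfies a single rule of the given match type."""
    if match_type == "exact":
        return image == rule_key
    if match_type == "prefix":
        return image.startswith(rule_key)
    # Assumption: pattern rules are unanchored, so re.search is used here.
    return re.search(rule_key, image) is not None


def select_container_spec(image, resource_class_rules, global_rules):
    """Return (selection_scope, match_type, spec) for `image`, or None.

    Hypothetical helper: each rules argument mirrors one `serviceContainers` map.
    """
    scopes = (("resource-class", resource_class_rules), ("global", global_rules))
    for match_type in MATCH_ORDER:
        # Within one match type, the resource class scope is consulted first.
        for scope_name, rules in scopes:
            if match_type not in rules:
                continue
            if match_type == "default":
                return scope_name, match_type, rules["default"]
            for rule_key, spec in rules[match_type].items():
                if _matches(match_type, rule_key, image):
                    return scope_name, match_type, spec
    return None


# Rules taken from the Helm values example above (resource requests only).
GLOBAL_RULES = {"exact": {"cimg/redis:6": {"cpu": "0.5", "memory": "200Mi"}}}
RESOURCE_CLASS_RULES = {
    "exact": {"cimg/postgres:16": {"cpu": "1", "memory": "500Mi"}},
    "prefix": {"cimg/postgres": {"cpu": "0.7", "memory": "250Mi"}},
    "pattern": {"cimg/mysql:.*": {"cpu": "0.6", "memory": "300Mi"}},
    "default": {"cpu": "0.4", "memory": "150Mi"},
}

for image in ("cimg/redis:6", "cimg/postgres:16", "cimg/mysql:8", "cimg/mongo:5"):
    scope, match_type, spec = select_container_spec(
        image, RESOURCE_CLASS_RULES, GLOBAL_RULES
    )
    print(f"{image}: {scope}/{match_type} -> {spec}")
```

Note that cimg/redis:6 resolves to the global exact rule even though the resource class defines a default rule. This is why the sketch iterates over match types in the outer loop and scopes in the inner loop. How ties between several matching prefix or pattern rules are broken is not stated in this document, so the sketch simply takes the first match.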
Troubleshooting
Container runner sets a Kubernetes annotation on the pod for each service container. Each annotation includes metadata about the selection scope and image match type used for that container's specification.
These values take the following form: app.circleci.com/container-spec-secondary-<ordinal-number>: {"selectionScope":"<global|resource-class>","imageMatchType":"<exact|prefix|pattern|default>"}.
For instance, consider again the configurations from the example above. These would lead to the following annotations being added to the pod, which you can also find on the pod description in the job’s Task lifecycle step:
Annotations:
app.circleci.com/container-spec-secondary-1: {"selectionScope":"global","imageMatchType":"exact"} <- Corresponds to "cimg/redis:6"
app.circleci.com/container-spec-secondary-2: {"selectionScope":"resource-class","imageMatchType":"exact"} <- Corresponds to "cimg/postgres:16"
app.circleci.com/container-spec-secondary-3: {"selectionScope":"resource-class","imageMatchType":"pattern"} <- Corresponds to "cimg/mysql:8"
app.circleci.com/container-spec-secondary-4: {"selectionScope":"resource-class","imageMatchType":"default"} <- Corresponds to "cimg/mongo:5"
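If you want to inspect these annotations programmatically, the values are plain JSON strings. The sketch below is a minimal example, assuming you have already loaded the pod's metadata.annotations map into a Python dictionary (for instance from the JSON output of kubectl). The function name parse_container_spec_annotations is hypothetical.

```python
import json

# Annotation key prefix as documented above; the suffix is the ordinal number.
ANNOTATION_PREFIX = "app.circleci.com/container-spec-secondary-"


def parse_container_spec_annotations(annotations):
    """Map each service container ordinal to (selectionScope, imageMatchType)."""
    result = {}
    for key, value in annotations.items():
        if not key.startswith(ANNOTATION_PREFIX):
            continue  # Skip annotations that do not describe a service container.
        ordinal = int(key[len(ANNOTATION_PREFIX):])
        data = json.loads(value)
        result[ordinal] = (data["selectionScope"], data["imageMatchType"])
    return result


# Annotations from the example above, plus one unrelated (made-up) annotation.
pod_annotations = {
    "app.circleci.com/container-spec-secondary-1": '{"selectionScope":"global","imageMatchType":"exact"}',
    "app.circleci.com/container-spec-secondary-2": '{"selectionScope":"resource-class","imageMatchType":"exact"}',
    "app.circleci.com/container-spec-secondary-3": '{"selectionScope":"resource-class","imageMatchType":"pattern"}',
    "app.circleci.com/container-spec-secondary-4": '{"selectionScope":"resource-class","imageMatchType":"default"}',
    "example.com/unrelated": "ignored",
}

for ordinal, (scope, match_type) in sorted(
    parse_container_spec_annotations(pod_annotations).items()
):
    print(f"secondary-{ordinal}: scope={scope}, match={match_type}")
```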
Unsafe retries
Unsafe retries enable container runner to automatically rerun tasks that are unexpectedly interrupted during their execution. These disruptions could be due to network connectivity issues, the underlying node shutting down, or other unpredictable causes. Any job failure that would be displayed in the CircleCI web app as an infrastructure fail should be expected to trigger an unsafe retry when enabled.
Unsafe retries are useful when scheduling workloads on spot instances, which often come with cost-saving benefits at the risk of pod preemptions with many Kubernetes providers.
This feature is called “unsafe retries” for a reason. Unlike automatic retries on startup, retrying tasks during runtime can be risky. This is because tasks can have arbitrary steps that produce external side effects which are not idempotent or stateless. This includes steps that could impact production environments or databases. Use this feature with care, knowing the risks of rerunning jobs and workflows that may or may not be idempotent.
The following sequence shows how unsafe retries work:
- If a pod fails or gets evicted during runtime, container runner will release the task.
- All resources managed by container runner for the task, such as the Kubernetes pod and secret, are cleaned up and deleted.
- The released task then becomes available for reclaim by any container runner instance configured for the same resource class.
- Once reclaimed, the task is restarted completely from scratch, including previously run steps.
- A task can be retried up to 3 times before it is deemed to have permanently failed.
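The retry behavior described above can be modeled with a short loop. This Python sketch is a simplified model, not container runner's code: the names InfrastructureFailure and run_with_unsafe_retries are hypothetical, and it assumes that "retried up to 3 times" means three reruns after the first attempt. Each attempt calls the whole task again, which mirrors the fact that previously run steps are repeated.

```python
MAX_RETRIES = 3  # From the text; read here as 3 reruns after the first attempt (assumption).


class InfrastructureFailure(Exception):
    """Hypothetical stand-in for a pod failure or eviction during runtime."""


def run_with_unsafe_retries(run_task, enable_unsafe_retries, max_retries=MAX_RETRIES):
    """Run `run_task` from scratch until it succeeds or retries are exhausted.

    Returns (succeeded, attempts).
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            run_task()  # Every attempt reruns all steps, including completed ones.
            return True, attempts
        except InfrastructureFailure:
            retries_used = attempts - 1
            if not enable_unsafe_retries or retries_used >= max_retries:
                return False, attempts


def make_flaky_task(failures_before_success):
    """Build a task that is interrupted a fixed number of times, then succeeds."""
    state = {"remaining": failures_before_success}

    def task():
        if state["remaining"] > 0:
            state["remaining"] -= 1
            raise InfrastructureFailure("pod evicted")

    return task


print(run_with_unsafe_retries(make_flaky_task(2), enable_unsafe_retries=True))
print(run_with_unsafe_retries(make_flaky_task(2), enable_unsafe_retries=False))
```

Because every attempt starts from the first step, any step with external side effects runs once per attempt, which is the risk the warning above describes.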
To enable unsafe retries, set the enableUnsafeRetries flag in the resource class configuration for each resource class. The following example shows two resource class definitions. Unsafe retries are enabled for the first, which targets spot instances, but not for the second:
agent:
  resourceClasses:
    your-namespace/your-resource-class-1:
      enableUnsafeRetries: true
      token: your-resource-class-1-token
      # The following spec isn't required, but serves as an example of how you could schedule tasks on spot instances using tolerations for the node's taint
      spec:
        tolerations:
          - key: "lifecycle"
            operator: "Equal"
            value: "Ec2Spot"
            effect: "NoExecute"
    your-namespace/your-resource-class-2:
      # Unsafe retries are disabled by default
      token: your-resource-class-2-token
      # This resource class can only schedule tasks on nodes without taints specific to spot instances
Monitoring
Container runner logs an event whenever a task encounters a runtime failure. The specific error message is provided under the error field within the service-work span. To check whether the task is set to be rerun or not (either because it cannot be retried or all retries have been exhausted), you can inspect the app.to_retry field. This boolean indicates the retry status of the task.
You can utilize these fields with your preferred Kubernetes logging integrations to monitor when and how frequently tasks are retried.
Custom token secret
Using the configuration described above provisions a Kubernetes secret containing your resource class tokens. In some circumstances, you may wish to provision your own secret, or you simply might not want to specify the tokens via Helm. Instead, you can provision your own Kubernetes secret containing your tokens and specify its name in the agent.customSecret field.
The secret should contain a field for each resource class, using the resource class name as the key and the token as the value. Consider the following resourceClasses configuration:
agent:
  resourceClasses:
    circleci-runner/resourceClass:
      metadata:
        annotations:
          custom.io: <my-annotation>
    circleci-runner/resourceClass2:
  customSecret: <name_of_secret>
The corresponding custom secret would have 2 fields:
circleci-runner.resourceClass: <my-token>
circleci-runner.resourceClass2: <my-token-2>
Due to Kubernetes secret key character constraints, the / separating the namespace and resource class name is replaced with a . character. Other than this, the name must exactly match the resourceClasses config to match the token with the correct configuration.
Even if there is no further pod configuration, the resource class must be present in resourceClasses as an empty map, as shown by circleci-runner/resourceClass2 in the above config example.
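The key naming rule is easy to get wrong when preparing a secret by hand. As a minimal sketch, the following Python helpers derive the expected secret keys from resource class names. The function names secret_key_for and build_secret_data are hypothetical, and the token values are placeholders.

```python
def secret_key_for(resource_class):
    """Convert '<namespace>/<name>' into the '<namespace>.<name>' secret key."""
    namespace, separator, name = resource_class.partition("/")
    if not separator or not namespace or not name or "/" in name:
        raise ValueError(f"expected '<namespace>/<name>', got {resource_class!r}")
    return f"{namespace}.{name}"


def build_secret_data(tokens_by_resource_class):
    """Build the key/value pairs that the custom secret should contain."""
    return {
        secret_key_for(resource_class): token
        for resource_class, token in tokens_by_resource_class.items()
    }


# Resource class names from the example above, with placeholder tokens.
secret_data = build_secret_data(
    {
        "circleci-runner/resourceClass": "<my-token>",
        "circleci-runner/resourceClass2": "<my-token-2>",
    }
)
for key, token in secret_data.items():
    print(f"{key}: {token}")
```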
Additional instructions can be found in our Support Center.
Helm chart parameters
The container runner Helm chart is hosted here. You can find a full chart values reference section in the readme.
Kubernetes permissions
Container runner needs the following Kubernetes permissions:
- Pods, Pods/Exec
  - Get
  - Watch
  - List
  - Create
  - Delete
- Secrets
  - Get
  - List
  - Create
  - Delete
- Events
  - List
  - Watch
- Nodes
  - Get
  - List
If Rerun job with SSH is enabled, the following permissions are also required:
- Gateways, Services
  - Get
In addition, Logging containers require the following minimal permissions to get service container logs and stream them to the CircleCI web app:
- Pods, Pods/Logs
  - Watch
By default a Role, RoleBinding and service account are created and attached to the container runner pod, but if you customize these, the above are the minimum required permissions.
It is assumed that the container runner is running in a Kubernetes namespace without any other workloads. It is possible that the agent or garbage collection (GC) could delete pods in the same namespace.
Cluster-wide permissions are used by container runner to autodetect the OS and CPU architecture of the node that the task pod is running on. If you do not want to grant these permissions to container runner, you can set agent.autodetectPlatform to false, in which case container runner assumes that the node OS and architecture match those of the node that the container runner pod is on.
Garbage collection
Each container runner has a garbage collector. The garbage collector ensures the removal of any pods and secrets with the label app.kubernetes.io/managed-by=circleci-container-agent that are left dangling in the cluster. By default, the garbage collector removes all jobs older than five hours and five minutes. This time limit can be shortened or lengthened via the agent.gc.threshold parameter. However, if you do shorten the garbage collection threshold, you must also shorten the maximum task run time via the agent.maxRunTime parameter to a value smaller than the new threshold.
If you change the garbage collection threshold but do not keep the maximum task run time lower than that threshold, a running task pod could be removed by the garbage collector.
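The relationship between agent.gc.threshold and agent.maxRunTime can be checked before deploying a values change. The following Python sketch is a hypothetical pre-deployment check, not part of container runner or the Helm chart. It represents both settings as timedelta values rather than parsing the chart's own duration strings, and the function name is_gc_config_safe is an assumption.

```python
from datetime import timedelta

# Default garbage collection threshold stated above: five hours and five minutes.
DEFAULT_GC_THRESHOLD = timedelta(hours=5, minutes=5)


def is_gc_config_safe(max_run_time, gc_threshold=DEFAULT_GC_THRESHOLD):
    """Return True if the maximum task run time is strictly below the GC threshold."""
    return max_run_time < gc_threshold


# A 5 hour run time is below the default threshold.
print(is_gc_config_safe(timedelta(hours=5)))
# Lowering the threshold to 1 hour without lowering the run time is unsafe:
# a task pod that is still running could be garbage collected.
print(is_gc_config_safe(timedelta(hours=5), gc_threshold=timedelta(hours=1)))
```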
The garbage collector may remove some objects sooner than the threshold. Task pods have a liveness probe that checks for a running task-agent process. Once a task completes or fails, the task-agent process will stop running and the liveness probe will fail, which will trigger GC.
Container runner will drain and restart cleanly when sent a termination signal. Container runner will not automatically attempt to launch a task that fails to start. This can be done in the CircleCI web app.
If the container runner crashes, there is no expectation that in-process or queued tasks are handled gracefully.
Logging containers
Container runner schedules a logging container if there are secondary (service) containers in the task pod. This container will get the secondary container logs and stream them to the steps UI in the CircleCI web app. Task agent, which runs in the primary container, is responsible for streaming all other step output to the CircleCI web app. The only exception is the Task lifecycle step, which is streamed by container runner itself.
Logging containers require a service account token with the minimal privileges to get container logs.
Container runner currently sets the following default resource limits and requests on the logging container:
requests:
  cpu: 50m
  memory: 64Mi
limits:
  cpu: 100m
  memory: 128Mi
Constraint validation
Container runner allows you to configure task pods with the full range of Kubernetes settings. This means pods can potentially be configured with constraints that prevent them from being scheduled. To help with this, container runner has a constraint checker which periodically validates each resource class configuration against the current state of the cluster, to ensure pods can be scheduled. This prevents container runner from claiming jobs that it cannot schedule and that would then fail.
If the constraint checker fails too many checks, it will disable claiming for that resource class until the checks start to pass again.
Currently the following constraints are checked against the cluster state:
- Node Affinity - Only MatchExpressions are checked
As an example of how this works, consider the following resource class configuration:
agent:
  resourceClasses:
    circleci-runner/resourceClass:
      token: TOKEN1
      spec:
        nodeSelector:
          disktype: ssd
    circleci-runner/resourceClass2:
      token: TOKEN2
The first resource class has a node selector to ensure it is scheduled to nodes with an SSD. Suppose that, during operations, the cluster no longer has any nodes with that label. The constraint checker will now fail checks for circleci-runner/resourceClass and will disable claiming jobs until it finds nodes with the correct label again. Claiming for circleci-runner/resourceClass2 is not affected, because the checks for different resource classes are independent of each other.
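The scenario above can be illustrated with a simplified check. This Python sketch is not the constraint checker itself: it only models a nodeSelector comparison against node labels, it ignores the "too many failed checks" tolerance mentioned earlier, and the function names are hypothetical.

```python
def any_node_matches(node_selector, nodes):
    """Return True if at least one node carries every label in the selector.

    `nodes` is a list of label dictionaries, one per node.
    """
    return any(
        all(labels.get(key) == value for key, value in node_selector.items())
        for labels in nodes
    )


def schedulable_resource_classes(resource_classes, nodes):
    """Evaluate each resource class independently of the others."""
    return {
        name: any_node_matches(config.get("spec", {}).get("nodeSelector", {}), nodes)
        for name, config in resource_classes.items()
    }


# Resource classes from the example above (tokens omitted).
resource_classes = {
    "circleci-runner/resourceClass": {"spec": {"nodeSelector": {"disktype": "ssd"}}},
    "circleci-runner/resourceClass2": {},
}

# Hypothetical cluster state: the only remaining node has no SSD label.
nodes = [{"disktype": "hdd"}]

for name, ok in schedulable_resource_classes(resource_classes, nodes).items():
    print(f"{name}: {'claiming' if ok else 'claiming disabled'}")
```

Once a node labeled disktype: ssd rejoins the cluster, the same check passes again for the first resource class.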
Cost and availability
Container runner jobs are eligible for Runner Network Egress. This is in line with the existing pricing model for self-hosted runners, and will happen with close adherence to the rest of CircleCI’s network and storage billing roll-out. If there are questions, reach out to your point of contact at CircleCI.
The same plan-based offerings for self-hosted runner concurrency limits apply to the container runner. Final pricing and plan availability will be announced closer to the general availability of the offering.
Building container images
Docker in Docker is not recommended due to the security risk it can pose to your cluster.
To build container images in a container-agent job, a user may use:
- A third-party tool like Buildah or kaniko
- Machine runner with Docker installed on it
- CircleCI-hosted compute
Note: Third-party tools should be used at your own discretion.
While jobs that run with container-agent cannot use CircleCI’s setup_remote_docker feature, it is possible to use a third-party tool to build Docker images in your container-agent job without using the Docker daemon.
You can see an example on our community forum of how some users have successfully used kaniko to build a container image.
Another option is to use a tool called Buildah. Buildah can be used in your .circleci/config.yml with the following syntax:
docker:
  - image: quay.io/buildah/stable:v1.27.0
Using the Buildah image
Buildah relies on the fuse-overlayfs program inside of the container, which means that a fuse device plugin must be configured in order to use it. /dev/fuse is required to use fuse-overlayfs inside of the container, so the device must be made available to the container for Buildah’s use. Kubernetes has a device plugin system to enable secure sharing of host devices with pods.
To install the /dev/fuse configuration, clone this repository to where you are running Helm commands for your container-agent deployment. Then run:
kubectl create -f fuse-device-plugin-k8s-1.16.yml
You can confirm that this has been configured correctly by running kubectl get daemonset -n kube-system and confirming that fuse-device-plugin-daemonset is present and ready.
Once this device has been added, update the container-agent resource class configuration:
resourceClasses:
  <namespace>/<resourceClass>:
    token: <token>
    spec:
      containers:
        - resources:
            limits:
              github.com/fuse: 1
This will now let you run Buildah commands with container agent jobs and build containers:
docker-image:
  docker:
    - image: quay.io/buildah/stable
  resource_class: <namespace>/<resourceClass>
  steps:
    - checkout
    - run:
        name: sanity-test
        command: |
          buildah version
    - run:
        name: Building-a-container
        command: |
          buildah bud -f ./Dockerfile -t myimage:0.1
          # Push the same tag that was just built
          buildah push myimage:0.1
Using Buildah with custom images
You can also build your own custom image and include the installation of Buildah in your Dockerfile:
sudo yum install buildah
If you plan to use a CircleCI convenience image, ensure you add the repository for installation to your job’s steps:
sudo apt-get update
sudo apt-get install -y wget ca-certificates gnupg2
VERSION_ID=$(lsb_release -r | cut -f2)
echo "deb http://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable/xUbuntu_${VERSION_ID}/ /" | sudo tee /etc/apt/sources.list.d/devel-kubic-libcontainers-stable.list
curl -Ls https://download.opensuse.org/repositories/devel:kubic:libcontainers:stable/xUbuntu_$VERSION_ID/Release.key | sudo apt-key add -
sudo apt-get update
sudo apt install buildah -y
Additionally, set the isolation variable to default to chroot:
# Default to isolate the filesystem with chroot.
ENV BUILDAH_ISOLATION=chroot
You can then follow the same instructions as Using the Buildah image above to add the fuse device plugin to the container-agent deployment and update your .circleci/config.yml file to use your custom images and build container images in those jobs.
Limitations
- Any known limitation for the existing self-hosted runner will continue to be a limitation of container agent.
- Only Kubernetes container environments are supported at this time.
- setup_remote_docker as a command is not supported with container runner. See Building Container Images.
- aws_auth.oidc_role_arn is not supported on the container runner. You can set up AWS authentication using the aws_auth field. More information can be found in the Configuration Reference.
FAQs
Visit the runner FAQ page to see commonly asked questions about container runner.