
Troubleshoot self-hosted runner


This page describes errors and troubleshooting methods for container and machine runners.

Troubleshoot container runner

The following are errors you could encounter using container runner.

Container fails to start due to disk space

The task remains in the Preparing Environment step while the pod has a warning attached, noting that volume mounting fails due to a lack of disk space.

Events:
  Type     Reason       Age   From               Message
  ----     ------       ----  ----               -------
  Normal   Scheduled    67s   default-scheduler  Successfully assigned default/ccita-62e94fd3faccc34751f72803-0-7hrpk8xv to node3
  Warning  FailedMount  68s   kubelet            MountVolume.SetUp failed for volume "kube-api-access-52lfn" : write /var/snap/microk8s/common/var/lib/kubelet/pods/4cd5057f-df97-41c4-b5ef-b632ce74bf45/volumes/kubernetes.io~projected/kube-api-access-52lfn/..2022_08_02_16_24_55.1533247998/ca.crt: no space left on device

To resolve this, free up or provision additional disk space on the node.
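
To confirm the node is out of space, you can inspect its conditions and free space directly. A minimal check, assuming kubectl access to the cluster (node3 is the node from the events above):

kubectl describe node node3 | grep -A 5 'Conditions:'
# On the node itself, check overall filesystem usage (the kubelet data path varies by distribution):
df -h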

Pod host node runs out of memory

If the node a pod is hosted on runs out of memory, the task will fail with a failure step named Runner Instance Failure, and a message:

could not run task: launch circleci-agent on "container-0" failed: command terminated with exit code 137.

The pod will have a status of OOMKilled when viewed in Kubernetes with kubectl. You can use task pod configuration to control memory allocation for the job itself.
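
One way to do this is through the resource class pod spec in the container runner Helm values. A minimal sketch, assuming a Helm-based container runner installation; the resource class name, token placeholder, and memory sizes here are illustrative, not defaults:

agent:
  resourceClasses:
    your-namespace/your-resource-class:
      token: YOUR_RESOURCE_CLASS_TOKEN
      spec:
        containers:
          - resources:
              requests:
                memory: 2Gi
              limits:
                memory: 4Gi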

Pod host node is out of disk space

If the node is full it will have a node.kubernetes.io/disk-pressure taint, which prevents new task pods from being scheduled. If all valid nodes for the pod have this taint, or other conditions that prevent scheduling, the task pod will remain in a pending state until an untainted valid node becomes available. In the UI the job appears stuck in the Preparing Environment step.

You need to scale your cluster more effectively to avoid this state.
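
To see which nodes are tainted, a quick check (assuming kubectl access):

kubectl describe nodes | grep -i taint
# Or list taint keys per node:
kubectl get nodes -o custom-columns='NAME:.metadata.name,TAINTS:.spec.taints[*].key'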

The node a task is running on abruptly dies

When container runner is hosted on a node separate from the one running the task, the task will continue to show as running in the CircleCI UI until it times out. kubectl will likewise show the pod as running until the cluster’s liveness probe timeout is reached, after which the pod enters a terminating state that it becomes wedged in. At this point the pod must be forcefully removed. If force is not used, kubectl may hang:

kubectl delete pod $POD_NAME --force
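
Pods wedged in this state show as Terminating when listing pods, which can help you identify which pod to remove (a simple filter, not specific to container runner):

kubectl get pods | grep Terminating

On some kubectl versions you may also need to pass --grace-period=0 alongside --force.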

Image has a bad entrypoint

If the entrypoint specified for the image is invalid, the task will fail with an error:

could not run task: launch circleci-agent on "container-0" failed: command terminated with exit code 139.

There is a difference between how container runner and CircleCI cloud set the entrypoint of the primary container. On cloud, the entrypoint of the primary container is ignored unless it is preserved using the com.circleci.preserve-entrypoint=true LABEL instruction (see: Adding an entrypoint). In contrast, container runner will always default to a shell (/bin/sh), or the entrypoint specified in the job configuration, if set.

Note: Entrypoints should be commands that run forever without failing. If the entrypoint fails or terminates in the middle of a build, the build will also terminate. If you need to access logs or build status, consider using a background step instead of an entrypoint.

To mitigate this error, specify a valid entrypoint as described in the Adding an entrypoint documentation, or set the entrypoint explicitly as described in Using custom built Docker images.
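
For example, a minimal sketch of overriding the entrypoint in the job configuration; the image, resource class, and entrypoint values here are placeholders:

jobs:
  build:
    docker:
      - image: your-registry/your-image:tag
        entrypoint: /bin/sh
    resource_class: your-namespace/your-resource-class
    steps:
      - checkout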

Image is for a different architecture

If an image for a job uses a different architecture than the node it is deployed on, container runner will give an error:

19:30:12 eb1a4 11412.984ms service-work error=1 error occurred:
        * could not start task containers: pod failed to start: :

The task pod will also show an error status. This will show as a failed job in the CircleCI UI with the error:

could not start task containers: pod failed to start: :

Ensure the architecture of the nodes that run your jobs matches the architecture of the images those jobs use.
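
To compare the two, check the architecture label on your nodes and inspect the image manifest (the image name here is a placeholder):

kubectl get nodes -L kubernetes.io/arch
docker manifest inspect your-registry/your-image:tag | grep architecture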

Bad task pod configuration

If the task pod for a resource class is misconfigured, the task will fail once claimed. In the UI the error will be in a Runner Instance Failure step with a message resembling:

could not start task containers: error creating task pod: Pod "ccita-62ea7dff36e977580a329a9d-0-uzz1y8xi" is invalid: [spec.containers[0].resources.limits[eppemeral-storage]: Invalid value: "eppemeral-storage": must be a standard resource type or fully qualified, spec.containers[0].resources.limits[eppemeral-storage]: Invalid value: "eppemeral-storage": must be a standard resource for containers, spec.containers[0].resources.requests[eppemeral-storage]: Invalid value: "eppemeral-storage": must be a standard resource type or fully qualified, spec.containers[0].resources.requests[eppemeral-storage]: Invalid value: "eppemeral-storage": must be a standard resource for containers]

No pod has been created in the Kubernetes cluster. You will need to correct the task pod configuration as described on the Container runner page.
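
In the message above, the misspelled resource name eppemeral-storage is the cause. A corrected sketch of the corresponding resource class pod spec in the Helm values; the resource class name and sizes are placeholders:

agent:
  resourceClasses:
    your-namespace/your-resource-class:
      spec:
        containers:
          - resources:
              requests:
                ephemeral-storage: 1Gi
              limits:
                ephemeral-storage: 2Gi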

Bash missing

"could not start task containers: exec into build container "container-0" failed: Internal error occurred: error executing command in container: failed to exec in container: failed to start exec "bb04485b9ef2386dee5e44a92bfe512ed786675611b6a518c3d94c1176f9a8aa": OCI runtime exec failed: exec failed: container_linux.go:380: starting container process caused: exec: "/bin/bash": stat /bin/bash: no such file or directory: unknown"

Bash is required for custom images used in jobs executed with a container runner.
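
If your image is based on a minimal distribution that ships without bash, installing it in the image is usually sufficient. A hypothetical Alpine-based example:

FROM alpine:3.19
RUN apk add --no-cache bash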

Troubleshoot machine runner

The following are errors you could encounter using machine runner.

I installed my first self-hosted runner on macOS and the job is stuck in "Preparing Environment", but there are no errors. What should I do?

Try running the following two commands:

# Make the launch agent binary executable
sudo chmod +x /var/opt/circleci/circleci-launch-agent
# Run the launch agent with the runner configuration file
sudo /var/opt/circleci/circleci-launch-agent --config=/Library/Preferences/com.circleci.runner/launch-agent-config.yaml

Debugging with SSH

CircleCI’s machine runners support rerunning a job with SSH for debugging purposes. Instructions on using this feature can be found at Debugging with SSH.

