Troubleshoot self-hosted runner
On This Page
This page describes errors and troubleshooting methods for container and machine runners.
The following are errors you could encounter using container runner.
Container fails to start due to disk space
The task remains in the Preparing Environment step while the pod has a warning attached, noting that volume mounting fails due to a lack of disk space.
Events: Type Reason Age From Message ---- ------ ---- ---- ------- Normal Scheduled 67s default-scheduler Successfully assigned default/ccita-62e94fd3faccc34751f72803-0-7hrpk8xv to node3 Warning FailedMount 68s kubelet MountVolume.SetUp failed for volume "kube-api-access-52lfn" : write /var/snap/microk8s/common/var/lib/kubelet/pods/4cd5057f-df97-41c4-b5ef-b632ce74bf45/volumes/kubernetes.io~projected/kube-api-access-52lfn/..2022_08_02_16_24_55.1533247998/ca.crt: no space left on device
You should ensure there is sufficient disk space.
Pod host node runs out of memory
If the node a pod is hosted on runs out of memory, the task will fail with a failure step named
Runner Instance Failure, and a message:
could not run task: launch circleci-agent on "container-0" failed: command terminated with exit code 137.
The pod will have a status of
OOMKilled when viewed in Kubernetes with kubectl. You can use task pod configuration to control memory allocation for the job itself.
Pod host node is out of disk space
If the node is full it will have a
node.kuberenetes.io/disk-pressure taint, which will prevent new task pods from being scheduled. If all valid nodes for the pod have the same taint, or other conditions that prevent scheduling, the task pod will sit in a pending state until an untainted valid node becomes available. This will show the job as stuck in the Preparing Environment step in the UI.
You need to scale your cluster more effectively to avoid this state.
The node a task is running on abruptly dies
When container runner is hosted on a separate node, the task will still look like it is running in the CircleCI UI until there is a timeout for it. Kubectl will also still show the pod as running until the cluster’s liveness probe timeout is hit. The pod will then enter a terminating state that it will become wedged in. At this point the pod will need to be forcefully removed. If force is not used it may cause kubectl to hang:
kubectl delete pod $POD_NAME --force
Image has a bad entrypoint
If the entrypoint specified for the image is invalid, the task will fail with an error:
could not run task: launch circleci-agent on "container-0" failed: command terminated with exit code 139.
There is a difference between how container runner and CircleCI cloud set the entrypoint of the primary container. On cloud, the entrypoint of the primary container is ignored unless it is preserved using the
com.circleci.preserve-entrypoint=true LABEL instruction (see: Adding an entrypoint). In contrast, container runner will always default to a shell (
/bin/sh), or the entrypoint specified in the job configuration, if set.
Note: Entrypoints should be commands that run forever without failing. If the entrypoint fails or terminates in the middle of a build, the build will also terminate. If you need to access logs or build status, consider using a background step instead of an entrypoint.
Image is for a different architecture
If an image for a job uses a different architecture than the node it is deployed on, container runner will give an error:
19:30:12 eb1a4 11412.984ms service-work error=1 error occurred: * could not start task containers: pod failed to start: :
The task pod will also show an error status. This will show as a failed job in the CircleCI UI with the error:
could not start task containers: pod failed to start: :
You should correct the underlying architecture used for nodes with jobs to match the architecture for images being used by jobs.
Bad task pod configuration
If the task pod for a resource class is misconfigured, the task will fail once claimed. In the UI the error will be in a
Runner Instance Failure step with a message resembling:
could not start task containers: error creating task pod: Pod "ccita-62ea7dff36e977580a329a9d-0-uzz1y8xi" is invalid: [spec.containers.resources.limits[eppemeral-storage]: Invalid value: "eppemeral-storage": must be a standard resource type or fully qualified, spec.containers.resources.limits[eppemeral-storage]: Invalid value: "eppemeral-storage": must be a standard resource for containers, spec.containers.resources.requests[eppemeral-storage]: Invalid value: "eppemeral-storage": must be a standard resource type or fully qualified, spec.containers.resources.requests[eppemeral-storage]: Invalid value: "eppemeral-storage": must be a standard resource for containers]
No pod has been created in the Kuberenetes cluster. You will need to correct the task pod configuration as described on the Conatiner runner page.
"could not start task containers: exec into build container "container-0" failed: Internal error occurred: error executing command in container: failed to exec in container: failed to start exec "bb04485b9ef2386dee5e44a92bfe512ed786675611b6a518c3d94c1176f9a8aa": OCI runtime exec failed: exec failed: container_linux.go:380: starting container process caused: exec: "/bin/bash": stat /bin/bash: no such file or directory: unknown"
Bash is required for custom images used in jobs executed with a container runner.
Oops, there was an issue with your infrastructure
If you see the message "Oops, there was an issue with your infrastructure. Verify your self-hosted runner infrastructure is operating and try re-running the job. If the issue persist, see our troubleshooting guide" on the job’s page, or if there is no content in the task lifecycle step (as shown), you should consider the potential causes described below:
Pod restart: Check if there were any container agent pod restarts around the time the workflow ran. If the pod was restarted around that time it would have resulted in the job not being processed. In such a case, we recommend rerunning the job once again.
You can check the logs for any of the previous runs using the command
kubectl logs -n circleci <full pod name> --previous.
Network connectivity issue: Check the network connectivity of the container agent, especially if the issue is intermittent. The issue can be seen when the container agent has lost network connectivity after claiming the tasks.
We suggest connecting to the pod using the command
kubectl exec --stdin --tty -n circleci < full pod name > — /bin/sh and then running a ping test for an extended period of time. We also recommend checking the connection to the links on our FAQ includes a section about the connectivity required for CircleCI’s self-hosted runners.
There are also external tools for monitoring resource usage on the Kubernetes documentation.
The following are errors you could encounter using machine runner.
I installed my first self-hosted runner on macOS and the job is stuck in "Preparing Environment", but there are no errors, what should I do?
In some cases, you may need to update the execution permission for the launch-agent so it is executable by root. Try running the following two commands:
sudo chmod +x /opt/circleci/circleci-launch-agent sudo /opt/circleci/circleci-launch-agent --config=/Library/Preferences/com.circleci.runner/launch-agent-config.yaml
Cancel the job and rerun it. If your job is still not running, file a support ticket.
Debugging with SSH
CircleCI’s machine runners support rerunning a job with SSH for debugging purposes. Instructions on using this feature can be found at Debugging with SSH.
Help make this document better
This guide, as well as the rest of our docs, are open source and available on GitHub. We welcome your contributions.
- Suggest an edit to this page (please read the contributing guide first).
- To report a problem in the documentation, or to submit feedback and comments, please open an issue on GitHub.
- CircleCI is always seeking ways to improve your experience with our platform. If you would like to share feedback, please join our research community.
Our support engineers are available to help with service issues, billing, or account related questions, and can help troubleshoot build configurations. Contact our support engineers by opening a ticket.
You can also visit our support site to find support articles, community forums, and training resources.
CircleCI Documentation by CircleCI is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.