Troubleshoot
On This Page
- Orbs
- Why do I receive an error message when trying to use an uncertified orb?
- Why do I get the following error when testing locally?
- I receive an error when attempting to claim a namespace or publish a production orb.
- Pipelines
- Config could not be located error
- Why is my scheduled pipeline not running?
- Why are my jobs not running when I push commits?
- Why is my job queued?
- Why are my jobs queuing even though I am on the Performance Plan?
- Why can I not find my project on the Projects dashboard?
- How do Docker image names work? Where do they come from?
- What is the best practice for specifying image versions?
- How can I set the timezone in Docker images?
- Container runner
- Container fails to start due to disk space
- Pod host node runs out of memory
- Pod host node is out of disk space
- The node a task is running on abruptly dies
- Image has a bad entrypoint
- Image is for a different architecture
- Bad task pod configuration
- Bash missing
- Oops, there was an issue with your infrastructure
- Task agent appears not running
- View the logs of a failed task pod
- Machine runner
- I installed my first self-hosted runner on macOS and the job is stuck in "Preparing Environment", but there are no errors, what should I do?
- Debugging with SSH
- OIDC tokens missing from jobs
- Releases
- Why is my Deployment/Rollout not showing up in the components tab or releases timeline view?
- Why is my release stuck in the Running state?
- Why are no new releases showing up, and/or why are component versions not being updated?
- Why is restore version using Helm timing out?
- Why is the restore version button not available for a component version?
- Why are all buttons disabled for a release?
- Why are all commands for my component failing?
This page offers troubleshooting suggestions for the following aspects of CircleCI:
Orbs
Why do I receive an error message when trying to use an uncertified orb?
To enable usage of uncertified orbs, go to your organization’s settings page, and click the Security tab. Then, click yes to enable Allow Uncertified Orbs.
Uncertified orbs are not tested or verified by CircleCI. Currently, only orbs created by CircleCI are considered certified. Any other orbs, including partner orbs, are not certified.
Why do I get the following error when testing locally?
Command:
circleci build -c .circleci/jobs.yml --job test
Error:
Error: You attempted to run a local build with version 2.1 of configuration.
To resolve this error, run circleci config process on your configuration and save the processed configuration to disk. Then run circleci local execute against the processed configuration.
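For example, assuming the same configuration file and job name as in the command above, you might process the configuration and then run the job locally as follows (the processed file name is illustrative):
# Generate a processed configuration from the 2.1 config
circleci config process .circleci/jobs.yml > process.yml
# Run the job against the processed configuration
circleci local execute -c process.yml --job test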
I receive an error when attempting to claim a namespace or publish a production orb.
You may not be an organization owner/admin.
Organizations can only claim a single namespace. In order to claim a namespace for an organization, the authenticating user must have owner/admin privileges within the organization.
If you do not have the required permission level, you might see an error similar to the following:
Error: Unable to find organization YOUR_ORG_NAME of vcs-type GITHUB: Must have member permission.: the organization 'YOUR_ORG_NAME' under 'GITHUB' VCS-type does not exist. Did you misspell the organization or VCS?
Read more in the Orb CLI permissions matrix.
Pipelines
Config could not be located error
If you see the following error message, check the steps below to remediate the issue.
config file .circleci/sample-filename.yml could not be located on branch sample-branch-name in repository sample-repo-name
-
Ensure that there is a CircleCI configuration file in the repository on the branch that uses the filename specified in the error message. If there is not one present, add a CircleCI configuration file.
-
If there is a config file present:
-
Navigate to Project Settings > Pipelines in the CircleCI web app for the project where you are seeing this error message.
-
Select the pencil icon for each pipeline listed and ensure that the "Config File Path" field matches the filepath of the config file that is in your repository. If you changed the name of the config file in your repository, the reference to that filepath must also be changed in the Project Settings > Pipelines section for any pipeline that uses that configuration file.
Why is my scheduled pipeline not running?
If your scheduled pipeline is not running, verify the following things:
-
Is the actor who is set for the scheduled pipelines still part of the organization? You can find this setting under Attribution in the Triggers section of the web app.
-
Is the branch set for the schedule deleted?
-
Is your VCS organization using SAML protection? SAML tokens expire often, which can cause requests to fail.
Why are my jobs not running when I push commits?
In the CircleCI application, check the individual job and workflow views for error messages. More often than not, the error is because of formatting errors in your .circleci/config.yml
file.
See the YAML Introduction page for more details.
After checking your .circleci/config.yml
for formatting errors, search for your issue in the CircleCI support center.
Why is my job queued?
A job might be queued because of a concurrency limit imposed by your organization’s plan. If your jobs queue often, consider upgrading your plan.
Why are my jobs queuing even though I am on the Performance Plan?
In order to keep the system stable for all CircleCI customers, we implement different soft concurrency limits on each of the resource classes. If you are experiencing queuing on your jobs, it is possible you are hitting these limits. Contact CircleCI support to request an increase to these limits.
Why can I not find my project on the Projects dashboard?
If you are not seeing a project you would like to build, and it is not currently building on CircleCI, check your org in the top left corner of the CircleCI application. For instance, if the top left shows your user my-user
, only projects belonging to my-user
will be available under Projects. If you want to build the project your-org/project
, you must switch your organization on the application’s organization switcher menu to your-org
.
How do Docker image names work? Where do they come from?
CircleCI currently supports pulling (and pushing with Docker Engine) Docker images from Docker Hub. For official images, you can pull by simply specifying the name of the image and a tag:
golang:1.7.1-jessie
redis:3.0.7-jessie
For public images on Docker Hub, you can pull the image by prefixing the account or team username:
my-user/couchdb:1.6.1
What is the best practice for specifying image versions?
It is best practice not to use the latest
tag for specifying image versions. It is also best practice to use a specific version and tag, for example cimg/ruby:3.0.4-browsers
, to pin down the image and prevent upstream changes to your containers when the underlying base distribution changes. For example, specifying only cimg/ruby:3.0.4
could result in unexpected changes from browsers
to node
. For more context, refer to Docker image best practices, and CircleCI image best practices.
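As a sketch, a pinned image in a job’s Docker executor might look like the following, using the example tag above:
jobs:
  build:
    docker:
      # Pin both the version and the variant rather than relying on a bare `latest` tag
      - image: cimg/ruby:3.0.4-browsers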
How can I set the timezone in Docker images?
You can set the timezone in Docker images with the TZ
environment variable. A sample .circleci/config.yml
with a defined TZ
variable would look like the following:
version: 2.1
jobs:
build:
docker:
- image: your/primary-image:version-tag
auth:
username: mydockerhub-user
password: $DOCKERHUB_PASSWORD # context / project UI env-var reference
- image: mysql:5.7
auth:
username: mydockerhub-user
password: $DOCKERHUB_PASSWORD # context / project UI env-var reference
environment:
TZ: "America/Los_Angeles"
working_directory: ~/your-dir
environment:
TZ: "America/Los_Angeles"
In this example, the timezone is set for both the primary image and an additional MySQL image.
A full list of available timezone options is available on Wikipedia.
Container runner
The following are errors you could encounter using container runner.
Container fails to start due to disk space
The task remains in the Preparing Environment step while the pod has a warning attached, noting that volume mounting fails due to a lack of disk space.
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 67s default-scheduler Successfully assigned default/ccita-62e94fd3faccc34751f72803-0-7hrpk8xv to node3
Warning FailedMount 68s kubelet MountVolume.SetUp failed for volume "kube-api-access-52lfn" : write /var/snap/microk8s/common/var/lib/kubelet/pods/4cd5057f-df97-41c4-b5ef-b632ce74bf45/volumes/kubernetes.io~projected/kube-api-access-52lfn/..2022_08_02_16_24_55.1533247998/ca.crt: no space left on device
You should ensure there is sufficient disk space.
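One way to confirm the node is low on space, assuming you have kubectl access to the cluster and shell access to the node (the node name and kubelet path are illustrative; the path below matches the error output above):
# Check whether the node is reporting disk pressure
kubectl describe node <node-name> | grep -i diskpressure
# On the node itself, check free space on the kubelet volume
df -h /var/snap/microk8s/common/var/lib/kubelet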
Pod host node runs out of memory
If the node a pod is hosted on runs out of memory, the task will fail with a failure step named Runner Instance Failure
, and a message:
could not run task: launch circleci-agent on "container-0" failed: command terminated with exit code 137.
The pod will have a status of OOMKilled
when viewed in Kubernetes with kubectl. You can use task pod configuration to control memory allocation for the job itself.
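As a sketch, a memory limit for task pods can be set under the resource class spec in the container runner’s values.yaml, mirroring the agent.resourceClasses structure shown later on this page; the resource class key and sizes are placeholders:
agent:
  resourceClasses:
    <namespace>/<resource-class>:
      spec:
        containers:
          - resources:
              requests:
                memory: 2Gi
              limits:
                memory: 4Gi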
Pod host node is out of disk space
If the node is full it will have a node.kubernetes.io/disk-pressure
taint, which will prevent new task pods from being scheduled. If all valid nodes for the pod have the same taint, or other conditions that prevent scheduling, the task pod will sit in a pending state until an untainted valid node becomes available. This will show the job as stuck in the Preparing Environment step in the UI.
You need to scale your cluster more effectively to avoid this state.
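To check whether nodes are tainted and whether task pods are stuck pending, assuming kubectl access:
# List taints on each node; look for node.kubernetes.io/disk-pressure
kubectl describe nodes | grep -i taints
# Find task pods stuck in the Pending state
kubectl get pods --field-selector=status.phase=Pending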
The node a task is running on abruptly dies
When container runner is hosted on a separate node, the task will still look like it is running in the CircleCI UI until it times out. kubectl will also still show the pod as running until the cluster’s liveness probe timeout is hit. The pod will then enter a terminating state in which it becomes wedged. At this point the pod will need to be forcefully removed. If force is not used, kubectl may hang:
kubectl delete pod $POD_NAME --force
Image has a bad entrypoint
If the entrypoint specified for the image is invalid, the task will fail with an error:
could not run task: launch circleci-agent on "container-0" failed: command terminated with exit code 139.
Container runner and CircleCI cloud set the entrypoint of the primary container in different ways. On cloud, the entrypoint of the primary container is ignored unless it is preserved using the com.circleci.preserve-entrypoint=true LABEL
instruction (see: Adding an entrypoint). In contrast, container runner will always default to a shell (/bin/sh
), or the entrypoint specified in the job configuration, if set.
Entrypoints should be commands that run forever without failing. If the entrypoint fails or terminates in the middle of a build, the build will also terminate. If you need to access logs or build status, consider using a background step instead of an entrypoint.
To mitigate this error, specify an entrypoint as described in the Adding an entrypoint documentation. You can also set the entrypoint explicitly in your job configuration, as described in Using custom built Docker images.
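As a minimal sketch, an entrypoint can be set directly on the primary container in the job configuration; the image name is a placeholder and /bin/sh mirrors the container runner default mentioned above:
jobs:
  build:
    docker:
      - image: your/primary-image:version-tag
        # Override the image's entrypoint; container runner otherwise defaults to /bin/sh
        entrypoint: /bin/sh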
Image is for a different architecture
If an image for a job uses a different architecture than the node it is deployed on, container runner will give an error:
19:30:12 eb1a4 11412.984ms service-work error=1 error occurred:
* could not start task containers: pod failed to start: :
The task pod will also show an error status. This will show as a failed job in the CircleCI UI with the error:
could not start task containers: pod failed to start: :
You should correct the underlying architecture used for nodes with jobs to match the architecture for images being used by jobs.
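To compare the two, assuming kubectl access and a locally pulled copy of the image:
# Show the CPU architecture of each node in the cluster
kubectl get nodes -L kubernetes.io/arch
# Show the architecture a locally pulled image was built for
docker image inspect --format '{{.Architecture}}' <your-image>:<tag>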
Bad task pod configuration
If the task pod for a resource class is misconfigured, the task will fail once claimed. In the UI the error will be in a Runner Instance Failure
step with a message resembling:
could not start task containers: error creating task pod: Pod "ccita-62ea7dff36e977580a329a9d-0-uzz1y8xi" is invalid: [spec.containers[0].resources.limits[eppemeral-storage]: Invalid value: "eppemeral-storage": must be a standard resource type or fully qualified, spec.containers[0].resources.limits[eppemeral-storage]: Invalid value: "eppemeral-storage": must be a standard resource for containers, spec.containers[0].resources.requests[eppemeral-storage]: Invalid value: "eppemeral-storage": must be a standard resource type or fully qualified, spec.containers[0].resources.requests[eppemeral-storage]: Invalid value: "eppemeral-storage": must be a standard resource for containers]
No pod has been created in the Kubernetes cluster. You will need to correct the task pod configuration as described on the Container runner page.
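In the example error above, the resource name eppemeral-storage is misspelled. The corrected resources block, placed under the resource class container spec in values.yaml as in the memory example earlier on this page, might look like the following (sizes are illustrative):
resources:
  requests:
    ephemeral-storage: 10Gi
  limits:
    ephemeral-storage: 10Gi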
Bash missing
"could not start task containers: exec into build container "container-0" failed: Internal error occurred: error executing command in container: failed to exec in container: failed to start exec "bb04485b9ef2386dee5e44a92bfe512ed786675611b6a518c3d94c1176f9a8aa": OCI runtime exec failed: exec failed: container_linux.go:380: starting container process caused: exec: "/bin/bash": stat /bin/bash: no such file or directory: unknown"
Bash is required for custom images used in jobs executed with a container runner.
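If your custom image does not include bash, it can be added in the image’s Dockerfile; which command applies depends on the base distribution (both lines below are illustrative):
# Debian/Ubuntu-based base images
RUN apt-get update && apt-get install -y bash
# Alpine-based base images
RUN apk add --no-cache bash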
Oops, there was an issue with your infrastructure
If you see the message "Oops, there was an issue with your infrastructure. Verify your self-hosted runner infrastructure is operating and try re-running the job. If the issue persists, see our troubleshooting guide" on the job’s page, or if there is no content in the task lifecycle step, you should consider the potential causes described below:
-
Pod restart: Check if there were any container agent pod restarts around the time the workflow ran. If the pod was restarted around that time it would have resulted in the job not being processed. In such a case, we recommend rerunning the job once again.
You can check the logs for any of the previous runs using the command kubectl logs -n <namespace> <full pod name> --previous.
-
Network connectivity issue: Check the network connectivity of the container agent, especially if the issue is intermittent. The issue can be seen when the container agent has lost network connectivity after claiming the tasks.
We suggest connecting to the pod using the command kubectl exec --stdin --tty -n circleci <full pod name> -- /bin/sh and then running a ping test for an extended period of time. We also recommend checking connectivity to the endpoints listed in our FAQ, which includes a section about the connectivity required for CircleCI’s self-hosted runners.
-
Resources exhaustion: Check if your pods are reaching their resource limits in the cluster, as the pod could end the job to free up resources. We recommend setting resource limits either within your values.yaml or within your config.yaml; one way to check current usage is sketched after this list.
Refer to the Kubernetes documentation for details of external resource monitoring tools.
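To check current usage against the configured limits, assuming kubectl access and that the metrics-server is installed in the cluster (namespace and pod names are placeholders):
# Show current CPU and memory usage per pod
kubectl top pods -n <namespace>
# Show the configured requests and limits on a specific pod
kubectl describe pod <pod-name> -n <namespace> | grep -A5 -i limits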
Task agent appears not running
The task pod may fail with a Task agent appears not running: /bin/sh: 1: kill: No such process error due to a failing liveness probe.
The liveness probe on the task pod checks to see if the PID provided by task agent is running using kill -0 $PID
. Task agent will output its PID to a file used by the liveness probe to confirm task agent is running. This probe may fail if the task agent process fails to start, no longer exists or takes longer to initiate than the liveness probe’s default timeouts. You may adjust the liveness probe defaults in the values.yaml
for container runners’s Helm chart.
In the example below we have set the liveness probe to allow a 2.5 minute startup time before probing (initialDelaySeconds) and 2.5 minutes of failures before the liveness probe fails (the probe waits 30 seconds between each probe and allows 5 failure responses before failing), after which the task pod is terminated.
agent:
resourceClasses:
<namespace>/<resource-class>:
spec:
containers:
- livenessProbe:
initialDelaySeconds: 150
periodSeconds: 30
timeoutSeconds: 15
successThreshold: 1
failureThreshold: 5
In the event that the liveness probe fails or the task pod terminates, there is a prestop hook, which attempts to kill any existing task agent by its provided PID. This is to ensure there are no orphaned task agent processes. However, if the PID on file does not map to an existing process, this will not throw an error and will instead log PreStop hook: task agent appears never started or already stopped
.
View the logs of a failed task pod
When a task pod fails, it is cleaned up by the container-agent almost immediately. However, sometimes you may want the pod to stick around longer so that you may review the logs and diagnose the failure. You can disable task pod deletion by adding the environment variable KUBE_DISABLE_POD_DELETION_ON_TASK_CLEANUP=true
to your container-agent values.yaml
.
example:
agent:
environment:
KUBE_DISABLE_POD_DELETION_ON_TASK_CLEANUP: true
When KUBE_DISABLE_POD_DELETION_ON_TASK_CLEANUP is set to true, task pods may linger until they are manually cleaned up or until garbage collection deletes them. By default, garbage collection removes resources after they have lived for 5 hours and 5 minutes. You may tweak these settings in your container-agent values.yaml.
Machine runner
The following are errors you could encounter using machine runner.
I installed my first self-hosted runner on macOS and the job is stuck in "Preparing Environment", but there are no errors, what should I do?
In some cases, you may need to update the execution permission for the launch-agent so it is executable by root. Try running the following two commands:
sudo chmod +x /opt/circleci/circleci-launch-agent
sudo /opt/circleci/circleci-launch-agent --config=/Library/Preferences/com.circleci.runner/launch-agent-config.yaml
Cancel the job and rerun it. If your job is still not running, file a support ticket.
Debugging with SSH
CircleCI’s machine runners support rerunning a job with SSH for debugging purposes. Instructions on using this feature can be found at Debugging with SSH.
The Rerun job with SSH feature is disabled by default. To enable this feature, see the machine runner configuration reference or the container runner installation guide.
OIDC tokens missing from jobs
If you experience that OIDC token environment variables ($CIRCLE_OIDC_TOKEN
, $CIRCLE_OIDC_TOKEN_V2
) are missing from jobs, a common cause can be that the default temporary directory (for example, /tmp
) is not writable or is mounted as noexec
. The system’s temporary directory needs to be writable and allow execution for the OIDC plugin to be downloaded and executed from it.
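One way to check this from within a job or on the runner host, assuming a POSIX shell (the path is an example):
# Check whether /tmp is mounted with the noexec option
mount | grep ' /tmp '
# Confirm the directory is writable
touch /tmp/oidc-write-test && rm /tmp/oidc-write-test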
Releases
Why is my Deployment/Rollout not showing up in the components tab or releases timeline view?
-
Check that the Deployment/Rollout is annotated with the required labels. More information is available in the Set up guides. If the required labels were not present, then adding them should solve the problem.
-
If you are using a Deployment, check that the desired replicas is not set to 0. Deployments with 0 replicas are not reported as releases, even if they are scaled up subsequently. The configured value can be seen on the release agent deployment in the circleci-release-agent-system namespace. Here is an example in which the number of desired replicas is 2:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sample-deployment
  namespace: sample-namespace
spec:
  replicas: 2
-
Check that the Deployment/Rollout is in a namespace managed by the release agent. This can be verified by checking the MANAGED_NAMESPACES environment variable on the release agent deployment in the circleci-release-agent-system namespace. Here is an example in which only the default namespace is being managed:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: circleci-release-agent
  namespace: circleci-release-agent-system
spec:
  template:
    spec:
      containers:
        - env:
            - name: MANAGED_NAMESPACES
              value: default
Why is my release stuck in the Running state?
-
If you are using a Deployment, check whether it was deleted before the release could complete. In this scenario, this is expected behavior. This experience will be improved upon in future release agent updates.
-
If you are using a Deployment, check whether the release agent restarted before all pods for the deployment could become ready. This is a known limitation that will be addressed in future updates of the release agent. Restarting a release agent while a release is ongoing will cause the release agent to lose track of the release status and fail to update the CircleCI services accordingly.
Why are no new releases showing up, and/or why are component versions not being updated?
-
Check whether the token used by the release agent has been revoked:
-
Select Releases in the CircleCI web app sidebar
-
Select Configure Environments to enter the release environments view
-
Select your environment to view valid token details, including when the token was last used.
If the token was last used more than a minute ago, then this is likely to be the problem.
-
-
Check whether tokens are being shared between multiple release environments. This is not supported. Check this by following these steps:
-
Retrieve the token value from the token field in the
circleci-release-agent
secret in thecircleci-release-agent-system
namespace -
Compare the value with the partially obscured value for the available Tokens in the CircleCI web app
If the token does not show up in the list, then it has been revoked or the value configured on the release agent is incorrect. In either case, creating a new token and reinstalling the Release Agent with the new value should solve the issue.
-
Why is restore version using Helm timing out?
The time required for a Helm-based restore version to complete successfully is dependent on the specific configuration of the target component. For example, a large number of replicas will lead to a longer duration, which could cause a timeout. It is possible to specify a different timeout by adding the circleci.com/operation-timeout
annotation to the Rollout or Deployment. The default value for this is 10 minutes. For steps see the Configure your Kubernetes components page.
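As a sketch, the annotation is added to the metadata of the Deployment or Rollout; the 15m duration here is an illustrative value, not a recommendation:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sample-deployment
  annotations:
    circleci.com/operation-timeout: 15m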
Why is the restore version button not available for a component version?
Check whether the component has been undeployed. If there are currently no live versions for a component, the Restore Version button will not be visible for that component until at least one version has been deployed.
Why are all buttons disabled for a release?
Check whether the release is a Rollback. If this is the case, then this is a known issue that will be solved in a future update to the CircleCI release agent.
Why are all commands for my component failing?
Check if the error message is "invalid or missing project ID". In this case, the component is missing a valid Project ID.