
Troubleshoot

Cloud
Server v4.x
Server v3.x

This page offers troubleshooting suggestions for the following aspects of CircleCI:

  • Orbs

  • Pipelines

  • Container runner

  • Machine runner

  • Releases

Orbs

Why do I receive an error message when trying to use an uncertified orb?

To enable the use of uncertified orbs, go to your organization’s settings page and click the Security tab. Then, select yes to enable Allow Uncertified Orbs.

Why do I get the following error when testing locally?

Command:

circleci build -c .circleci/jobs.yml --job test

Error:

Error:
You attempted to run a local build with version 2.1 of configuration.

To resolve this error, run circleci config process on your configuration and save the processed configuration to disk. You should then run circleci local execute against the processed configuration.
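For example, assuming your 2.1 configuration lives at .circleci/jobs.yml as in the command above, the workflow looks roughly like this (flag names may vary slightly between CLI versions):

# Compile the 2.1 configuration into a fully processed configuration
circleci config process .circleci/jobs.yml > process.yml

# Run the job locally against the processed configuration
circleci local execute -c process.yml --job test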

I receive an error when attempting to claim a namespace or publish a production orb.

You may not be an organization owner/admin.

Organizations can only claim a single namespace. In order to claim a namespace for an organization, the authenticating user must have owner/admin privileges within the organization.

If you do not have the required permission level, you might see an error similar to the one below:

Error: Unable to find organization YOUR_ORG_NAME of vcs-type GITHUB: Must have member permission.: the organization 'YOUR_ORG_NAME' under 'GITHUB' VCS-type does not exist. Did you misspell the organization or VCS?

Read more in the Orb CLI permissions matrix.
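As a sketch, claiming a namespace and promoting an orb with the CLI looks something like the following; your-namespace and your-orb are placeholders, and the exact arguments depend on your CLI version and VCS:

# Claim a namespace for your organization (requires org owner/admin permissions)
circleci namespace create your-namespace github YOUR_ORG_NAME

# Promote a dev orb to a production (semantic) version
circleci orb publish promote your-namespace/your-orb@dev:first patch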

Pipelines

Config could not be located error

If you see the following error message, check the steps below to remediate the issue.

config file .circleci/sample-filename.yml could not be located on branch sample-branch-name in repository sample-repo-name
  • Ensure that there is a CircleCI configuration file in the repository on the branch that uses the filename specified in the error message. If there is not one present, add a CircleCI configuration file.

  • If there is a config file present:

    1. Navigate to Project Settings > Pipelines in the CircleCI web app for the project where you are seeing this error message.

    2. Select the pencil icon for each pipeline listed and ensure that the "Config File Path" field matches the filepath of the config file that is in your repository. If you changed the name of the config file in your repository, the reference to that filepath must also be changed in the Project Settings > Pipelines section for any pipeline that uses that configuration file.

Why is my scheduled pipeline not running?

If your scheduled pipeline is not running, verify the following things:

  • Is the actor who is set for the scheduled pipelines still part of the organization? You can find this setting under Attribution in the Triggers section of the web app.

  • Is the branch set for the schedule deleted?

  • Is your VCS organization using SAML protection? SAML tokens expire often, which can cause requests to fail.

Why are my jobs not running when I push commits?

In the CircleCI application, check the individual job and workflow views for error messages. More often than not, the error is because of formatting errors in your .circleci/config.yml file.

See the YAML Introduction page for more details.

After checking your .circleci/config.yml for formatting errors, search for your issue in the CircleCI support center.
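If you have the CircleCI CLI installed, you can also catch most formatting errors before pushing by validating the file locally:

# Check .circleci/config.yml for syntax and schema errors
circleci config validate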

Why is my job queued?

A job might be queued because of a concurrency limit imposed by your organization’s plan. If your jobs are queuing often, consider upgrading your plan.

Why are my jobs queuing even though I am on the Performance Plan?

In order to keep the system stable for all CircleCI customers, we implement different soft concurrency limits on each of the resource classes. If you are experiencing queuing on your jobs, it is possible you are hitting these limits. Contact CircleCI support to request an increase to these limits.

Why can I not find my project on the Projects dashboard?

If you are not seeing a project you would like to build, and it is not currently building on CircleCI, check your org in the top left corner of the CircleCI application. For instance, if the top left shows your user my-user, only projects belonging to my-user will be available under Projects. If you want to build the project your-org/project, you must switch your organization on the application’s organization switcher menu to your-org.

How do Docker image names work? Where do they come from?

CircleCI currently supports pulling (and pushing with Docker Engine) Docker images from Docker Hub. For official images, you can pull by simply specifying the name of the image and a tag:

golang:1.7.1-jessie
redis:3.0.7-jessie

For public images on Docker Hub, you can pull the image by prefixing the account or team username:

my-user/couchdb:1.6.1

What is the best practice for specifying image versions?

It is best practice to avoid the latest tag when specifying image versions. Instead, use a specific version and tag, for example cimg/ruby:3.0.4-browsers, to pin down the image and prevent upstream changes to your containers when the underlying base distribution changes. Specifying only cimg/ruby:3.0.4, for example, could result in unexpected changes from browsers to node. For more context, refer to Docker image best practices and CircleCI image best practices.
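For example, in your .circleci/config.yml, prefer a fully pinned tag over latest:

jobs:
  build:
    docker:
      # Pinned to an exact version and variant, rather than cimg/ruby:latest
      - image: cimg/ruby:3.0.4-browsers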

How can I set the timezone in Docker images?

You can set the timezone in Docker images with the TZ environment variable. A sample .circleci/config.yml with a defined TZ variable would look like the following:

version: 2.1
jobs:
  build:
    docker:
      - image: your/primary-image:version-tag
        auth:
          username: mydockerhub-user
          password: $DOCKERHUB_PASSWORD  # context / project UI env-var reference
      - image: mysql:5.7
        auth:
          username: mydockerhub-user
          password: $DOCKERHUB_PASSWORD  # context / project UI env-var reference
        environment:
          TZ: "America/Los_Angeles"
    working_directory: ~/your-dir
    environment:
      TZ: "America/Los_Angeles"

In this example, the timezone is set for both the primary image and an additional MySQL image.

A full list of available timezone options is available on Wikipedia.

Container runner

The following are errors you could encounter using container runner.

Container fails to start due to disk space

The task remains in the Preparing Environment step while the pod has a warning attached, noting that volume mounting fails due to a lack of disk space.

Events:
  Type     Reason       Age   From               Message
  ----     ------       ----  ----               -------
  Normal   Scheduled    67s   default-scheduler  Successfully assigned default/ccita-62e94fd3faccc34751f72803-0-7hrpk8xv to node3
  Warning  FailedMount  68s   kubelet            MountVolume.SetUp failed for volume "kube-api-access-52lfn" : write /var/snap/microk8s/common/var/lib/kubelet/pods/4cd5057f-df97-41c4-b5ef-b632ce74bf45/volumes/kubernetes.io~projected/kube-api-access-52lfn/..2022_08_02_16_24_55.1533247998/ca.crt: no space left on device

To resolve this, ensure the node has sufficient disk space.

Pod host node runs out of memory

If the node a pod is hosted on runs out of memory, the task will fail with a failure step named Runner Instance Failure, and a message:

could not run task: launch circleci-agent on "container-0" failed: command terminated with exit code 137.

The pod will have a status of OOMKilled when viewed in Kubernetes with kubectl. You can use task pod configuration to control memory allocation for the job itself.
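As a sketch, assuming the Helm values format used for resource classes elsewhere on this page, memory requests and limits for the task pod might look like the following (the 2Gi and 4Gi values are placeholders):

agent:
  resourceClasses:
    <namespace>/<resource-class>:
      spec:
        containers:
          - resources:
              requests:
                memory: 2Gi   # placeholder request
              limits:
                memory: 4Gi   # placeholder limit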

Pod host node is out of disk space

If the node is full it will have a node.kubernetes.io/disk-pressure taint, which will prevent new task pods from being scheduled. If all valid nodes for the pod have the same taint, or other conditions that prevent scheduling, the task pod will sit in a pending state until an untainted valid node becomes available. This shows the job as stuck in the Preparing Environment step in the UI.

You need to scale your cluster more effectively to avoid this state.
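You can check whether a node currently carries this taint with kubectl, for example (the node name is a placeholder):

# Inspect the node's taints; look for node.kubernetes.io/disk-pressure
kubectl describe node <node-name> | grep -i taints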

The node a task is running on abruptly dies

When container runner is hosted on a separate node, the task will still look like it is running in the CircleCI UI until the task times out. kubectl will also still show the pod as running until the cluster’s liveness probe timeout is hit. The pod will then enter a terminating state in which it becomes wedged. At this point the pod needs to be forcefully removed. If force is not used, kubectl may hang:

kubectl delete pod $POD_NAME --force

Image has a bad entrypoint

If the entrypoint specified for the image is invalid, the task will fail with an error:

could not run task: launch circleci-agent on "container-0" failed: command terminated with exit code 139.

There is a difference between how container runner and CircleCI cloud set the entrypoint of the primary container. On cloud, the entrypoint of the primary container is ignored unless it is preserved using the com.circleci.preserve-entrypoint=true LABEL instruction (see: Adding an entrypoint). In contrast, container runner will always default to a shell (/bin/sh), or the entrypoint specified in the job configuration, if set.

Note: Entrypoints should be commands that run forever without failing. If the entrypoint fails or terminates in the middle of a build, the build will also terminate. If you need to access logs or build status, consider using a background step instead of an entrypoint.

Specify an entrypoint using the Adding an entrypoint documentation to mitigate this error. You can set the entrypoint explicitly as described in Using custom built Docker images.
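For example, a job configuration can set the entrypoint explicitly on the image; the image name below is a placeholder:

jobs:
  build:
    docker:
      - image: your-registry/your-image:tag  # placeholder image
        # Explicitly set an entrypoint; it should run without exiting during the build
        entrypoint: /bin/sh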

Image is for a different architecture

If an image for a job uses a different architecture than the node it is deployed on, container runner will give an error:

19:30:12 eb1a4 11412.984ms service-work error=1 error occurred:
        * could not start task containers: pod failed to start: :

The task pod will also show an error status. This will show as a failed job in the CircleCI UI with the error:

could not start task containers: pod failed to start: :

Correct the architecture of the nodes running your jobs so that it matches the architecture of the images those jobs use.
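You can compare node architectures against your image with kubectl, for example:

# Show each node's CPU architecture via the standard kubernetes.io/arch label
kubectl get nodes -L kubernetes.io/arch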

Bad task pod configuration

If the task pod for a resource class is misconfigured, the task will fail once claimed. In the UI the error will be in a Runner Instance Failure step with a message resembling:

could not start task containers: error creating task pod: Pod "ccita-62ea7dff36e977580a329a9d-0-uzz1y8xi" is invalid: [spec.containers[0].resources.limits[eppemeral-storage]: Invalid value: "eppemeral-storage": must be a standard resource type or fully qualified, spec.containers[0].resources.limits[eppemeral-storage]: Invalid value: "eppemeral-storage": must be a standard resource for containers, spec.containers[0].resources.requests[eppemeral-storage]: Invalid value: "eppemeral-storage": must be a standard resource type or fully qualified, spec.containers[0].resources.requests[eppemeral-storage]: Invalid value: "eppemeral-storage": must be a standard resource for containers]

No pod has been created in the Kubernetes cluster. You will need to correct the task pod configuration as described on the Container runner page.
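In the example error above, the resource name is misspelled. A corrected configuration would use the standard ephemeral-storage resource; as a sketch in the same resource class values format (sizes are placeholders):

agent:
  resourceClasses:
    <namespace>/<resource-class>:
      spec:
        containers:
          - resources:
              requests:
                ephemeral-storage: 1Gi   # placeholder request
              limits:
                ephemeral-storage: 2Gi   # placeholder limit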

Bash missing

"could not start task containers: exec into build container "container-0" failed: Internal error occurred: error executing command in container: failed to exec in container: failed to start exec "bb04485b9ef2386dee5e44a92bfe512ed786675611b6a518c3d94c1176f9a8aa": OCI runtime exec failed: exec failed: container_linux.go:380: starting container process caused: exec: "/bin/bash": stat /bin/bash: no such file or directory: unknown"

Bash is required for custom images used in jobs executed with a container runner.

Oops, there was an issue with your infrastructure

If you see the message "Oops, there was an issue with your infrastructure. Verify your self-hosted runner infrastructure is operating and try re-running the job. If the issue persist, see our troubleshooting guide" on the job’s page, or if there is no content in the task lifecycle step (as shown in Figure 1), consider the potential causes described below:

Figure 1. Task lifecycle with no content
  • Pod restart: Check if there were any container agent pod restarts around the time the workflow ran. If the pod was restarted around that time, the job would not have been processed. In such a case, we recommend rerunning the job.

You can check the logs for any of the previous runs using the command kubectl logs -n <namespace> <full pod name> --previous.

  • Network connectivity issue: Check the network connectivity of the container agent, especially if the issue is intermittent. The issue can be seen when the container agent has lost network connectivity after claiming the tasks.

We suggest connecting to the pod using the command kubectl exec --stdin --tty -n circleci <full pod name> -- /bin/sh and then running a ping test for an extended period of time. Our FAQ includes a section about the connectivity required for CircleCI’s self-hosted runners; we also recommend checking the connection to the links listed there.

  • Resource exhaustion: Check if your pods are reaching their resource limits in the cluster, as the pod could end the job to free up resources. We recommend setting resource limits either within your values.yaml or within your config.yaml.

The Kubernetes documentation also lists external tools for monitoring resource usage.
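If your cluster runs metrics-server, a quick way to spot resource exhaustion is:

# Show current CPU and memory usage for pods in the container runner namespace
kubectl top pods -n <namespace>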

Task agent appears not running

The task pod may fail with a Task agent appears not running: /bin/sh: 1: kill: No such process error due to a failing liveness probe.

The liveness probe on the task pod checks whether the PID provided by task agent is running using kill -0 $PID. Task agent will output its PID to a file used by the liveness probe to confirm task agent is running. This probe may fail if the task agent process fails to start, no longer exists, or takes longer to initiate than the liveness probe’s default timeouts. You may adjust the liveness probe defaults in the values.yaml for container runner’s Helm chart.

In the example below we have set the liveness probe to allow a 2.5 minute startup time before probing (initialDelaySeconds) and 2.5 minutes of failures before the liveness probe fails and the task pod is terminated (the probe waits 30 seconds between probes and allows 5 failed responses before failing).

agent:
  resourceClasses:
    <namespace>/<resource-class>:
      spec:
        containers:
          - livenessProbe:
              initialDelaySeconds: 150
              periodSeconds: 30
              timeoutSeconds: 15
              successThreshold: 1
              failureThreshold: 5

In the event that the liveness probe fails or the task pod terminates, there is a preStop hook that attempts to kill any existing task agent by its provided PID. This ensures there are no orphaned task agent processes. However, if the PID on file does not map to an existing process, this will not throw an error and will instead log PreStop hook: task agent appears never started or already stopped.

Machine runner

The following are errors you could encounter using machine runner.

I installed my first self-hosted runner on macOS and the job is stuck in "Preparing Environment", but there are no errors. What should I do?

In some cases, you may need to update the execution permission for the launch-agent so it is executable by root. Try running the following two commands:

sudo chmod +x /opt/circleci/circleci-launch-agent
sudo /opt/circleci/circleci-launch-agent --config=/Library/Preferences/com.circleci.runner/launch-agent-config.yaml

Cancel the job and rerun it. If your job is still not running, file a support ticket.

Debugging with SSH

CircleCI’s machine runners support rerunning a job with SSH for debugging purposes. Instructions on using this feature can be found at Debugging with SSH.

Releases

Why is my Deployment/Rollout not showing up in the components tab or releases timeline view?

  • Check that the Deployment/Rollout is annotated with the required labels. More information is available in the Set up guides. If the required labels were not present, then adding them should solve the problem.

  • If you are using a Deployment, check that the desired replica count is not set to 0. Deployments with 0 replicas are not reported as releases, even if they are scaled up subsequently. The configured value can be seen on the Deployment itself. Here is an example in which the number of desired replicas is 2:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: sample-deployment
      namespace: sample-namespace
    spec:
      replicas: 2
  • Check that the Deployment/Rollout is in a namespace managed by the release agent. This can be verified by checking the MANAGED_NAMESPACES environment variable on the release agent deployment in the circleci-release-agent-system namespace. Here is an example in which only the default namespace is being managed:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: circleci-release-agent
      namespace: circleci-release-agent-system
    spec:
      template:
        spec:
          containers:
          - env:
            - name: MANAGED_NAMESPACES
              value: default

Why is my release stuck in the Running state?

  • If you are using a Deployment, check whether it was deleted before the release could complete. In this scenario, this is expected behavior. This experience will be improved in future release agent updates.

  • If you are using a Deployment, check whether the release agent restarted before all pods for the deployment could become ready. This is a known limitation that will be addressed in future updates of the release agent. Restarting a release agent while a release is ongoing will cause the release agent to lose track of the release status and fail to update the CircleCI services accordingly.

Why are no new releases showing up, and/or why are component versions not being updated?

  • Check whether the token used by the release agent has been revoked:

    1. Select Releases in the CircleCI web app sidebar

    2. Select Configure Environments to enter the release environments view

    3. Select your environment to view valid token details, including when the token was last used.

      If the token was last used more than a minute ago, then this is likely to be the problem.

  • Check whether tokens are being shared between multiple release environments. This is not supported. Check this by following these steps:

    1. Retrieve the token value from the token field in the circleci-release-agent secret in the circleci-release-agent-system namespace (see the command example after this list)

    2. Compare the value with the partially obscured value for the available Tokens in the CircleCI web app

      If the token does not show up in the list, then it has been revoked or the value configured on the release agent is incorrect. In either case, creating a new token and reinstalling the Release Agent with the new value should solve the issue.
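For step 1 above, one way to read and decode the token from the secret (secret name and field taken from that step) is:

# Print the decoded token stored in the release agent secret
kubectl get secret circleci-release-agent -n circleci-release-agent-system \
  -o jsonpath='{.data.token}' | base64 -d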

Why is restore version using Helm timing out?

The time required for a Helm-based restore version to complete successfully is dependent on the specific configuration of the target component. For example, a large number of replicas will lead to a longer duration, which could cause a timeout. It is possible to specify a different timeout by adding the circleci.com/operation-timeout annotation to the Rollout or Deployment. The default value for this is 10 minutes. For steps see the Configure your Kubernetes components page.
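As a sketch, the annotation can be added to the Deployment or Rollout metadata; the "15m" value here is an assumed example of the duration format:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: sample-deployment
  namespace: sample-namespace
  annotations:
    # Assumed example value; the default operation timeout is 10 minutes
    circleci.com/operation-timeout: "15m"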

Why is the restore version button not available for a component version?

Check whether the component has been undeployed. If there are currently no live versions for a component, the Restore Version button will not be visible for that component until at least one version has been deployed.

Why are all buttons disabled for a release?

Check whether the release is a Rollback. If this is the case, then this is a known issue that will be solved in a future update to the CircleCI release agent.

Why are all commands for my component failing?

Check if the error message is “invalid or missing project ID”. In this case, the component is missing a valid Project ID.

