Caching Dependencies

Caching is one of the most effective ways to make jobs faster on CircleCI. Automatic dependency caching is not available in CircleCI 2.0, so it is important to plan and implement your caching strategy to get the best performance. Manual configuration in 2.0 enables more advanced strategies and finer control. The keys themselves are simple to configure. For example, to update a cache only when pom.xml changes, use a checksum of pom.xml with a cascading fallback:

      - restore_cache:
          keys:
            - m2-{{ checksum "pom.xml" }}
            - m2- # used if no cache matches the checksum key

This document describes the manual caching available, the costs and benefits of a chosen strategy, and tips for avoiding problems with caching. Note: The Docker images used for CircleCI 2.0 job runs are automatically cached on the server infrastructure where possible.

For information about enabling a premium feature to reuse the unchanged layers of your Docker image, see the Enabling Docker Layer Caching document.


A cache stores a hierarchy of files under a key. Use the cache to store data that makes your job faster but that the job does not depend on. In the case of a cache miss or zero cache restore, the job should still run successfully. Typical examples are npm, Gem, or Maven package directories.

Caching is a balance between reliability (not using an out-of-date or inappropriate cache) and getting maximum performance (using a full cache for every build).

In general, it is safer to preserve reliability than to risk a corrupted build or a fast build that uses stale dependencies. The ideal is to gain performance while maintaining high reliability.

Caching Libraries

The dependencies that are most important to cache during a job are the libraries on which your project depends, for example, the libraries installed with pip in Python or npm in Node.js. Each language's dependency manager installs dependencies to its own paths. See our Language guides and demo projects for the specifics of your stack.

Tools that are not explicitly required for your project are best stored on the Docker image. The Docker images pre-built by CircleCI come with generic tools preinstalled for building projects in the language each image focuses on. For example, the circleci/ruby:2.4.1 image has useful tools like git, openssh-client, and gzip preinstalled.
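The following rough sketch shows this split. The job takes its tools from the image and keeps only the project's gems in the cache. The key name gem-cache and the vendor/bundle path are illustrative choices, not required names.

      version: 2
      jobs:
        build:
          docker:
            # Generic tools (git, openssh-client, gzip) come with the image
            - image: circleci/ruby:2.4.1
          steps:
            - checkout
            # Project libraries (gems) come from the cache instead
            - restore_cache:
                keys:
                  - gem-cache-{{ checksum "Gemfile.lock" }}
            - run: bundle install --path vendor/bundle
            - save_cache:
                key: gem-cache-{{ checksum "Gemfile.lock" }}
                paths:
                  - vendor/bundle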

Writing to the Cache in Workflows

Jobs in one workflow can share caches. Note that this makes it possible to create race conditions in caching across different jobs in a workflow.

Cache is immutable on write: once a cache is written for a particular key like node-cache-master, it cannot be written to again. Consider a workflow of 3 jobs, where Job3 depends on Job1 and Job2: {Job1, Job2} -> Job3. They all read and write to the same cache key.

In a run of the workflow, Job3 may use the cache written by Job1 or Job2. Since caches are immutable, this would be whichever job saved its cache first. This is usually undesirable because the results aren’t deterministic: part of the result depends on chance. You could make this workflow deterministic in one of two ways. First, you could make Job1 and Job2 write to different caches and have Job3 load from only one. Second, you could change the job dependencies so that only one ordering is possible: Job1 -> Job2 -> Job3.
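Below is a minimal sketch of the first option. Only the cache-related steps of each job are shown. The workflow is assumed to keep job3 requiring both job1 and job2. The job names and the node_modules path are hypothetical.

      # In job1's steps (job names are hypothetical)
      - save_cache:
          key: node-cache-job1-{{ checksum "package.json" }}
          paths:
            - node_modules

      # In job2's steps: a different key, so the two saves never collide
      - save_cache:
          key: node-cache-job2-{{ checksum "package.json" }}
          paths:
            - node_modules

      # In job3's steps: restore only the cache written by job1
      - restore_cache:
          keys:
            - node-cache-job1-{{ checksum "package.json" }}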

There are more complex cases. Here, jobs save using a dynamic key, such as the prefix node-cache- followed by a template value. They restore using a partial key match on the prefix node-cache-. The possibility of a race condition still exists, but the details may change. For instance, the downstream job uses the cache from whichever upstream job ran last.

Another race condition is possible when sharing caches between jobs. Consider a workflow with no dependency link between Job1 and Job2, where Job2 uses the cache saved by Job1. Job2 could sometimes successfully restore a cache and sometimes report that no cache is found, even when Job1 reports saving it. Job2 could also load a cache from a previous workflow. If either happens, Job2 tried to load the cache before Job1 saved it. You can resolve this by creating a workflow dependency, Job1 -> Job2, which forces Job2 to wait until Job1 has finished running.
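Below is a minimal sketch of that dependency in the workflows section of .circleci/config.yml. It assumes two jobs named job1 and job2 are defined under jobs; the job and workflow names are hypothetical.

      workflows:
        version: 2
        build:
          jobs:
            - job1
            # job2 starts only after job1 has finished (and saved its cache)
            - job2:
                requires:
                  - job1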

Restoring Cache

To decide how to save your cache, it helps to first understand how caches are restored. CircleCI tries the keys listed in a restore_cache step in the order they appear and restores the first cache that matches. Each cache key is namespaced to the project, and keys are prefix-matched. If several caches match a key, the most recently generated one is used. The keys in the following example become less specific going down the list. The less specific the key, the more likely the restored dependencies differ from those the current job requires. When your dependency tool runs (for example, npm install), it discovers out-of-date dependencies and installs those the current job specifies. This is also referred to as a partial cache restore.

Here’s another example of a restore_cache step with two keys:

      - restore_cache:
          keys:
            # Find a cache corresponding to this specific package.json checksum;
            # when this file changes, this key will not match
            - v1-npm-deps-{{ checksum "package.json" }}
            # Find the most recent cache used from any branch
            - v1-npm-deps

Clearing Cache

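If you need clean caches when your language or dependency management tool versions change, use a versioned naming strategy like the v1- prefix in the previous example. When a change like the ones below occurs, change the prefix in every key (for example, from v1- to v2-) and commit the change. No cache saved under the old prefix can then match.

Situations that call for a clean cache include:

  • Dependency manager version change, for example, you change npm from 4 to 5
  • Language version change, for example, you change Ruby from 2.3 to 2.4
  • Dependencies are removed from your project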
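For example, after upgrading npm from 4 to 5, the keys from the previous example might be bumped as shown below. This sketch assumes the project runs npm install into node_modules. The save_cache key must change together with the restore_cache keys.

      # After moving from npm 4 to npm 5, every v1- key becomes v2-,
      # so no cache saved by npm 4 can be restored
      - restore_cache:
          keys:
            - v2-npm-deps-{{ checksum "package.json" }}
            - v2-npm-deps
      - run: npm install
      - save_cache:
          key: v2-npm-deps-{{ checksum "package.json" }}
          paths:
            - node_modules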

Basic Example of Dependency Caching

The extra control and power in CircleCI 2.0 manual dependency caching requires that you be explicit about what you cache and how you cache it. See the save cache section of the Writing Jobs and Steps document for additional examples.

To save a cache of a file or directory, add the save_cache step to a job in your .circleci/config.yml file:

      - save_cache:
          key: my-cache
          paths:
            - my-file.txt
            - my-project/my-dependencies-directory

The path for directories is relative to the working_directory of your job. You can specify an absolute path if you choose.
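The fragment below sketches both kinds of path. It assumes a job whose working_directory is ~/repo and a Maven project that keeps its local repository in the default ~/.m2. The name my-dependencies-directory is hypothetical.

      # Fragment of a job definition; names are illustrative
      working_directory: ~/repo
      steps:
        - checkout
        # Download Maven dependencies into the local repository (~/.m2)
        - run: mvn dependency:go-offline
        - save_cache:
            key: m2-{{ checksum "pom.xml" }}
            paths:
              # Relative path, resolved against working_directory:
              # ~/repo/my-dependencies-directory
              - my-dependencies-directory
              # Path outside working_directory; ~ is the home directory
              - ~/.m2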

Using Keys and Templates

While choosing suitable templates for your cache key, keep in mind that saving a cache is not a free operation: it takes some time to upload the cache to CircleCI storage. To avoid generating a new cache on every build, use a key that generates a new cache only if something actually changes.

The first step is to decide when a cache will be saved or restored by basing the key on an explicit aspect of your project. Examples include a build number incrementing, a revision being incremented, or the hash of a dependency manifest file changing.

Following are some examples of caching strategies for different goals:

  • myapp-{{ checksum "package.json" }} - Cache will be regenerated every time something is changed in the package.json file; different branches of this project will generate the same cache key.
  • myapp-{{ .Branch }}-{{ checksum "package.json" }} - Cache will be regenerated every time something is changed in the package.json file; different branches of this project will generate separate cache keys.
  • myapp-{{ epoch }} - Every build will generate separate cache keys.

During step execution, the templates above are replaced by runtime values, and the resulting string is used as the key. The following cache key templates are available:

  • {{ .Branch }} - The VCS branch currently being built.
  • {{ .BuildNum }} - The CircleCI build number for this build.
  • {{ .Revision }} - The VCS revision currently being built.
  • {{ .Environment.variableName }} - The environment variable variableName. This supports any environment variable exported by CircleCI or added to a specific Context, not any arbitrary environment variable.
  • {{ checksum "filename" }} - A base64-encoded SHA256 hash of the given file's contents, so that a new cache key is generated if the file changes. This should be a file committed to your repo. Consider using dependency manifests, such as package.json, pom.xml, or project.clj. The file must not change between restore_cache and save_cache. Otherwise, the cache is saved under a different key than the one computed at restore_cache time.
  • {{ epoch }} - The number of seconds that have elapsed since 00:00:00 Coordinated Universal Time (UTC), also known as POSIX or Unix epoch.
  • {{ arch }} - The OS and CPU information. Useful when caching compiled binaries that depend on OS and CPU architecture, for example, darwin amd64 versus linux i386/32-bit.
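A key can combine several templates, as sketched below. The sketch assumes a Node.js project whose node_modules contains compiled native addons, which is why {{ arch }} is included. CIRCLE_JOB is the environment variable CircleCI exports with the name of the current job.

      - save_cache:
          # {{ arch }} separates caches built on different OS/CPU combinations;
          # {{ .Environment.CIRCLE_JOB }} adds the name of the current job
          key: v1-deps-{{ arch }}-{{ .Environment.CIRCLE_JOB }}-{{ checksum "package.json" }}
          paths:
            - node_modules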

Note: When defining a unique identifier for the cache, be careful about overusing highly specific template keys such as {{ epoch }}. If you use less specific template keys such as {{ .Branch }} or {{ checksum "filename" }}, you’ll increase the odds of the cache being used. However, there are tradeoffs, as described in Caching Strategy Tradeoffs.

Full Example of Saving and Restoring Cache

The following example demonstrates how to use restore_cache and save_cache together with templates and keys in your .circleci/config.yml file.

      version: 2
      jobs:
        build:
          docker:
            - image: customimage/ruby:2.3-node-phantomjs-0.0.1
              environment:
                RAILS_ENV: test
                RACK_ENV: test
            - image: circleci/mysql:5.6

          steps:
            - checkout
            - run: cp config/{database_circleci,database}.yml

            # Run bundler
            # Load installed gems from cache if possible, bundle install then save cache
            # Multiple caches are used to increase the chance of a cache hit
            - restore_cache:
                keys:
                  - gem-cache-{{ .Branch }}-{{ checksum "Gemfile.lock" }}
                  - gem-cache-{{ .Branch }}
                  - gem-cache
            - run: bundle install --path vendor/bundle
            - save_cache:
                key: gem-cache-{{ .Branch }}-{{ checksum "Gemfile.lock" }}
                paths:
                  - vendor/bundle

            - run: bundle exec rubocop
            - run: bundle exec rake db:create db:schema:load --trace
            - run: bundle exec rake factory_girl:lint

            # Precompile assets
            # Load assets from cache if possible, precompile assets then save cache
            # Multiple caches are used to increase the chance of a cache hit
            - restore_cache:
                keys:
                  - asset-cache-{{ .Branch }}-{{ checksum "VERSION" }}
                  - asset-cache-{{ .Branch }}
                  - asset-cache
            - run: bundle exec rake assets:precompile
            - save_cache:
                key: asset-cache-{{ .Branch }}-{{ checksum "VERSION" }}
                paths:
                  - public/assets
                  - tmp/cache/assets/sprockets

            - run: bundle exec rspec
            - run: bundle exec cucumber

Caching Strategy Tradeoffs

In cases where the build tools for your language include elegant handling of dependencies, partial cache restores may be preferable to zero cache restores for performance reasons. If you get a zero cache restore, you have to reinstall all of your dependencies, which can result in reduced performance. One alternative is to get a large percentage of your dependencies from an older cache instead of starting from zero.

However, for other types of languages, partial caches carry the risk of creating code dependencies that are not aligned with your declared dependencies and do not break until you run a build without a cache. If the dependencies change infrequently, consider listing the zero cache restore key first. Then, track the costs over time. If the performance costs of zero cache restores (also referred to as a cache miss) prove to be significant over time, only then consider adding a partial cache restore key.

Listing multiple keys for restoring a cache increases the odds of a partial cache hit. However, broadening your restore_cache scope to a wider history increases the risk of confusing failures. For example, if you have dependencies for Node v6 on an upgrade branch, but your other branches are still on Node v5, a restore_cache step that searches other branches might restore incompatible dependencies.
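One way to limit that risk is to scope every fallback key to the current branch, so a restore does not fall back to other branches' caches. Below is a sketch, assuming an npm project.

      - restore_cache:
          keys:
            # Exact match: same branch and same package.json
            - v1-npm-deps-{{ .Branch }}-{{ checksum "package.json" }}
            # Most recent cache whose key starts with this branch's prefix;
            # no branch-independent fallback such as v1-npm-deps- is listed
            - v1-npm-deps-{{ .Branch }}-

Because keys are prefix-matched, a branch named feature could still match a cache from a branch named feature-x. Keep this in mind if your branch names share prefixes.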

Using a Lock File

Language dependency manager lockfiles (for example, Gemfile.lock or yarn.lock) may be a useful cache key.

An alternative is to run ls -laR your-deps-dir > deps_checksum and reference the result with {{ checksum "deps_checksum" }}. For example, in Python, you could get a more specific cache than the checksum of your requirements.txt file. To do so, install the dependencies within a virtualenv in the project root (venv) and then run ls -laR venv > python_deps_checksum.
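Below is a sketch of the simpler lock-file approach. It assumes a Yarn (v1) project that commits yarn.lock and installs into node_modules.

      - restore_cache:
          keys:
            - v1-yarn-deps-{{ checksum "yarn.lock" }}
            - v1-yarn-deps-
      # --frozen-lockfile fails the step instead of rewriting yarn.lock,
      # so the checksum at save_cache time matches the one at restore_cache time
      - run: yarn install --frozen-lockfile
      - save_cache:
          key: v1-yarn-deps-{{ checksum "yarn.lock" }}
          paths:
            - node_modules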

Using Multiple Caches For Different Languages

It is also possible to lower the cost of a cache miss by splitting your job across multiple caches. Specifying multiple restore_cache steps with different keys makes each cache smaller, which reduces the performance impact of a cache miss. Consider splitting caches by language type (npm, pip, or bundler) if you know how each dependency manager stores its files, how it upgrades, and how it checks dependencies.
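Below is a sketch of one cache per dependency manager for a project that uses both npm and pip. It assumes Python dependencies are installed into a virtualenv named venv in the project root. A change to package.json then invalidates only the npm cache, and a change to requirements.txt only the pip cache.

      # npm dependencies
      - restore_cache:
          keys:
            - v1-npm-{{ checksum "package.json" }}
      - run: npm install
      - save_cache:
          key: v1-npm-{{ checksum "package.json" }}
          paths:
            - node_modules

      # pip dependencies, installed into a virtualenv named venv (assumed)
      - restore_cache:
          keys:
            - v1-pip-{{ checksum "requirements.txt" }}
      - run:
          name: Install Python dependencies
          command: |
            python3 -m venv venv
            . venv/bin/activate
            pip install -r requirements.txt
      - save_cache:
          key: v1-pip-{{ checksum "requirements.txt" }}
          paths:
            - venv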

Caching Expensive Steps

Certain languages and frameworks have more expensive steps that can and should be cached. Scala and Elixir are two examples where caching the compilation steps will be especially effective. Rails developers, too, would notice a performance boost from caching frontend assets.

Do not cache everything, but do consider caching for costly steps like compilation.
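Below is a sketch for an Elixir project. It assumes Mix with its default deps and _build directories and a committed mix.lock. Caching both directories lets dependencies fetched and compiled in an earlier build be reused instead of compiled again.

      - restore_cache:
          keys:
            - v1-mix-{{ checksum "mix.lock" }}
            # Partial restore; mix deps.get and mix compile should update what changed
            - v1-mix-
      - run: mix deps.get
      - run: mix compile
      - save_cache:
          key: v1-mix-{{ checksum "mix.lock" }}
          paths:
            - deps
            - _build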