Caching strategies
Caching is one of the most effective ways to make jobs faster on CircleCI. By reusing the data from previous jobs, you also reduce the cost of fetch operations. Caching is project-specific, and using caching strategies helps to optimize caches for effectiveness and storage capacity.
Cache storage customization
When using self-hosted runners, there is a network and storage usage limit included in your plan. Some actions related to caches accrue network and storage usage. Once your usage exceeds your limit, charges apply.
Retaining caches for a long period of time will have storage cost implications. It is best to determine why you are retaining caches, and how long caches need to be retained for your use case. To lower costs, consider a lower storage retention for caches, if that suits your needs.
You can customize storage usage retention periods for caches on the CircleCI web app by navigating to . For information on managing network and storage usage, see the Persisting Data page.
Cache optimization
When setting up caches for your projects, the goal is a 10x - 20x ROI (return on investment). This means you are aiming for a situation where the amount of cache restored is 10x - 20x the cache saved. The following tips can help you achieve this.
Avoid strict cache keys
Using cache keys that are too strict can mean that you will get a minimal number of cache hits for a workflow. For example, if you used the key CIRCLE_SHA1 (SHA of the last commit of the current pipeline), this would get matched once for a workflow. Consider using cache keys that are less strict to ensure more cache hits.
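As a rough illustration, the sketch below contrasts a per-commit key with a key derived from a dependency file. The v1-deps prefix and the package-lock.json file name are assumptions chosen for the example, not values taken from any particular project.

```yaml
- restore_cache:
    keys:
      # Too strict: the commit SHA changes on every push, so this key
      # is matched only by the workflow that saved it.
      # - v1-deps-{{ .Environment.CIRCLE_SHA1 }}

      # Less strict: changes only when the (assumed) lock file changes.
      - v1-deps-{{ checksum "package-lock.json" }}
```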
Avoid unnecessary workflow reruns
If your project has "flaky tests," workflows may be rerun unnecessarily. This uses up your credits and increases your storage usage. To avoid this situation, address flaky tests. For help with identifying them, see Test Insights.
You can also consider configuring your projects to rerun failed jobs rather than entire workflows. To achieve this you can use the when step. For further information, see the Configuration Reference.
Split cache keys by directory
Having multiple directories under a single cache key increases the chance of there being a change to the cache. In the example below, there may be changes in the first two directories but no changes in the a or b directory. Saving all four directories under one cache key increases the potential storage usage. The cache restore step also takes longer than needed as all four sets of files are restored.
- save_cache:
    paths:
      - /mnt/ramdisk/node_modules
      - /mnt/ramdisk/.cache/yarn
      - /mnt/ramdisk/apps/a/node_modules
      - /mnt/ramdisk/apps/b/node_modules
    key: v1-node-{{ checksum "package.json" }}
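One possible way to split the example above is sketched here. It assumes that apps/a and apps/b each have their own package.json relative to the working directory, and the key names are hypothetical. With separate keys, a change in one directory only triggers an upload for that directory.

```yaml
# Root dependencies and the yarn download cache
- save_cache:
    paths:
      - /mnt/ramdisk/node_modules
      - /mnt/ramdisk/.cache/yarn
    key: v1-node-root-{{ checksum "package.json" }}
# App "a" dependencies (assumes apps/a/package.json exists)
- save_cache:
    paths:
      - /mnt/ramdisk/apps/a/node_modules
    key: v1-node-app-a-{{ checksum "apps/a/package.json" }}
# App "b" dependencies (assumes apps/b/package.json exists)
- save_cache:
    paths:
      - /mnt/ramdisk/apps/b/node_modules
    key: v1-node-app-b-{{ checksum "apps/b/package.json" }}
```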
Combine jobs when possible
As an example, a workflow including three jobs running in parallel:
- lint (20 seconds)
- code-cov (30 seconds)
- test (8 minutes)
All running a similar set of steps:
- checkout
- restore cache
- build
- save cache
- run command
The lint and code-cov jobs could be combined with no effect on workflow length, but saving on duplicate steps.
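A combined job might look like the following sketch. The job name, the Docker image, the cache key, and the npm scripts are all assumptions for illustration. The point is that checkout, cache restore, build, and cache save happen once instead of twice.

```yaml
jobs:
  lint-and-code-cov:               # hypothetical job name
    docker:
      - image: cimg/node:lts       # assumed image
    steps:
      - checkout
      - restore_cache:
          keys:
            - v1-deps-{{ checksum "package-lock.json" }}
      - run: npm ci                # build step, performed once
      - save_cache:
          key: v1-deps-{{ checksum "package-lock.json" }}
          paths:
            - ~/.npm
      - run: npm run lint          # hypothetical script (about 20 seconds)
      - run: npm run code-cov      # hypothetical script (about 30 seconds)
```

Because the test job takes 8 minutes, running lint and code-cov sequentially in one job (about 50 seconds plus setup) does not lengthen the workflow.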
Order jobs to create meaningful workflows
If no job ordering is used in a workflow, all jobs run concurrently. If all the jobs have a save_cache step, they could be uploading files multiple times. Consider reordering jobs in a workflow so subsequent jobs can make use of assets created in previous jobs.
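A minimal sketch of such an ordering, assuming three hypothetical jobs named build, lint, and test that are defined elsewhere in the config, where only build contains a save_cache step and the downstream jobs only restore:

```yaml
workflows:
  build-and-test:
    jobs:
      - build            # the only job that runs save_cache
      - lint:
          requires:
            - build      # starts after build, so the cache already exists
      - test:
          requires:
            - build
```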
Check for language-specific caching tips
Check partial dependency caching strategies to see if there are tips for the language you are using.
Check cache is being restored as well as saved
If you find that a cache is not restored, see this support article for tips.
Prune unused or superfluous dependencies
Depending on what language and package management system you are using, you may be able to leverage tools that clear or prune unnecessary dependencies.
For example, the node-prune package removes unnecessary files (Markdown, TypeScript files, etc.) from node_modules.
Check if jobs need pruning
If you notice your cache usage is high and would like to reduce it:
- Search for the save_cache and restore_cache commands in your .circleci/config.yml file to find all jobs utilizing caching and determine if their cache(s) need pruning.
- Narrow the scope of a cache from a large directory to a smaller subset of specific files.
- Ensure that your cache key is following Best Practices:
    - save_cache:
        key: brew-{{ epoch }}
        paths:
          - /Users/distiller/Library/Caches/Homebrew
          - /usr/local/Homebrew
  Notice in the above example that best practices are not followed. brew-{{ epoch }} changes every build, causing an upload every time even if the cached contents have not changed. This costs you money and never saves you any time. Instead, pick a cache key like the following:
    - save_cache:
        key: brew-{{ checksum "Brewfile" }}
        paths:
          - /Users/distiller/Library/Caches/Homebrew
          - /usr/local/Homebrew
  This key changes only when the list of requested dependencies changes. If you find that this is not uploading a new cache often enough, include the version numbers in your dependencies.
- Let your cache be slightly out of date. The suggestion above uploads a new cache any time a dependency is added to your lockfile or the version of a dependency changes. As an alternative, use a key that tracks dependencies less precisely.
- Prune your cache before you upload it, but make sure you prune whatever generates your cache key as well.
Partial dependency caching strategies
Some dependency managers do not properly handle installing on top of partially restored dependency trees.
- restore_cache:
    keys:
      - gem-cache-{{ arch }}-{{ .Branch }}-{{ checksum "Gemfile.lock" }}
      - gem-cache-{{ arch }}-{{ .Branch }}
      - gem-cache
In the above example, the second or third cache keys may produce a partial restore. Some dependency managers then incorrectly install on top of the outdated dependency tree.
Instead of a cascading fallback, a more stable option is a single version-prefixed cache key:
- restore_cache:
    keys:
      - v1-gem-cache-{{ arch }}-{{ .Branch }}-{{ checksum "Gemfile.lock" }}
Since caches are immutable, this strategy allows you to regenerate all of your caches by incrementing the version, which can be useful in the following scenarios:
- When you change the version of a dependency manager like npm.
- When you change the version of a language like Ruby.
- When you add or remove dependencies from your project.
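As a sketch of what incrementing the version looks like, the v1 prefix from the key above becomes v2 in both the restore and save steps. The ~/.bundle path is an assumption carried over from the Bundler convention. No v2 cache exists yet, so the first run is a cache miss and saves a fresh cache.

```yaml
- restore_cache:
    keys:
      # v1 -> v2: existing v1 caches no longer match and are ignored
      - v2-gem-cache-{{ arch }}-{{ .Branch }}-{{ checksum "Gemfile.lock" }}
- run: bundle install
- save_cache:
    key: v2-gem-cache-{{ arch }}-{{ .Branch }}-{{ checksum "Gemfile.lock" }}
    paths:
      - ~/.bundle    # assumed gem location
```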
The stability of partial dependency caching relies on your dependency manager. Below is a list of common dependency managers, recommended partial caching strategies, and associated justifications.
Bundler (Ruby)
Safe to Use Partial Cache Restoration? Yes (with caution).
Since Bundler uses system gems that are not explicitly specified, it is non-deterministic, and partial cache restoration can be unreliable.
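To prevent this behavior, add a step that cleans Bundler after installing dependencies on top of the restored cache, so that gems no longer listed in Gemfile.lock are removed before the cache is saved.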
steps:
  - restore_cache:
      keys:
        # when lock file changes, use increasingly general patterns to restore cache
        - v1-gem-cache-{{ arch }}-{{ .Branch }}-{{ checksum "Gemfile.lock" }}
        - v1-gem-cache-{{ arch }}-{{ .Branch }}-
        - v1-gem-cache-{{ arch }}-
  - run: bundle install
  - run: bundle clean --force
  - save_cache:
      paths:
        - ~/.bundle
      key: v1-gem-cache-{{ arch }}-{{ .Branch }}-{{ checksum "Gemfile.lock" }}
Gradle (Java)
Safe to Use Partial Cache Restoration? Yes.
Gradle repositories are intended to be centralized, shared, and massive. Partial caches can be restored without impacting which libraries are added to classpaths of generated artifacts.
steps:
  - restore_cache:
      keys:
        # when lock file changes, use increasingly general patterns to restore cache
        - gradle-repo-v1-{{ .Branch }}-{{ checksum "dependencies.lockfile" }}
        - gradle-repo-v1-{{ .Branch }}-
        - gradle-repo-v1-
  - save_cache:
      paths:
        - ~/.gradle/caches
        - ~/.gradle/wrapper
      key: gradle-repo-v1-{{ .Branch }}-{{ checksum "dependencies.lockfile" }}
Maven (Java) and Leiningen (Clojure)
Safe to Use Partial Cache Restoration? Yes.
Maven repositories are intended to be centralized, shared, and massive. Partial caches can be restored without impacting which libraries are added to classpaths of generated artifacts.
Since Leiningen uses Maven under the hood, it behaves in a similar way.
steps:
  - restore_cache:
      keys:
        # when lock file changes, use increasingly general patterns to restore cache
        - maven-repo-v1-{{ .Branch }}-{{ checksum "pom.xml" }}
        - maven-repo-v1-{{ .Branch }}-
        - maven-repo-v1-
  - save_cache:
      paths:
        - ~/.m2/repository
      key: maven-repo-v1-{{ .Branch }}-{{ checksum "pom.xml" }}
npm (Node)
Safe to Use Partial Cache Restoration? Yes (with NPM5+).
With NPM5+ and a lock file, you can use partial cache restoration. Cache the npm download cache at ~/.npm rather than node_modules directly. The ~/.npm path is the correct cache location regardless of npm version, and works with both npm install and npm ci.
steps:
  - restore_cache:
      keys:
        # when lock file changes, use increasingly general patterns to restore cache
        - node-v1-{{ .Branch }}-{{ checksum "package-lock.json" }}
        - node-v1-{{ .Branch }}-
        - node-v1-
  - run: npm install
  - save_cache:
      paths:
        - ~/.npm
      key: node-v1-{{ .Branch }}-{{ checksum "package-lock.json" }}
pip (Python)
Safe to Use Partial Cache Restoration? Depends on what you cache.
The safety of partial cache restoration for Python depends on whether you cache the download cache or the installed environment:
- Download cache (~/.cache/pip): pip stores downloaded wheels here before installing them. Partially restoring this cache is safe because pip always resolves and installs the correct versions from your dependency file regardless of what wheels are already present. A stale partial restore causes some wheels to be re-downloaded.
- Installed environment (a virtualenv or site-packages directory): partially restoring an installed environment is unsafe unless you use a lock file that pins every dependency to an exact version. Without pinned versions, pip may leave stale package versions in place rather than upgrading them, producing an environment that does not match a clean install.
The examples below follow this distinction. Bare pip and uv cache the download cache and use fallback keys without risk. Pipenv and Poetry cache the installed environment and rely on their lock files to keep partial restores deterministic.
Pip with requirements.txt
Cache the pip download cache at ~/.cache/pip. Partial restoration is safe here because pip resolves installs from requirements.txt independently of what is already cached.
steps:
  - restore_cache:
      keys:
        - pip-cache-v1-{{ .Branch }}-{{ checksum "requirements.txt" }}
        - pip-cache-v1-{{ .Branch }}-
        - pip-cache-v1-
  - run: pip install -r requirements.txt
  - save_cache:
      paths:
        - ~/.cache/pip
      key: pip-cache-v1-{{ .Branch }}-{{ checksum "requirements.txt" }}
Pipenv
Pipenv generates a Pipfile.lock that pins every dependency to an exact version. This makes partial restoration of the installed virtualenv safe, because the lock file ensures pip installs the correct versions even on top of a stale partial restore.
steps:
  - restore_cache:
      keys:
        # when lock file changes, use increasingly general patterns to restore cache
        - pip-packages-v1-{{ .Branch }}-{{ checksum "Pipfile.lock" }}
        - pip-packages-v1-{{ .Branch }}-
        - pip-packages-v1-
  - run: pipenv install
  - save_cache:
      paths:
        - ~/.local/share/virtualenvs/venv # this path depends on where pipenv creates a virtualenv
      key: pip-packages-v1-{{ .Branch }}-{{ checksum "Pipfile.lock" }}
Poetry
Poetry generates a poetry.lock file that pins all dependencies to specific versions, making it a reliable cache key. As with Pipenv, the lock file makes partial restoration of the installed virtualenvs directory safe.
steps:
  - restore_cache:
      keys:
        - poetry-v1-{{ .Branch }}-{{ checksum "poetry.lock" }}
        - poetry-v1-{{ .Branch }}-
        - poetry-v1-
  - run: poetry install --no-interaction
  - save_cache:
      paths:
        - ~/.cache/pypoetry/virtualenvs
      key: poetry-v1-{{ .Branch }}-{{ checksum "poetry.lock" }}
uv
uv is a fast Python package installer that maintains its own download cache at ~/.cache/uv. Like bare pip, this is a download cache rather than an installed environment, so partial restoration is safe.
steps:
  - restore_cache:
      keys:
        - uv-cache-v1-{{ .Branch }}-{{ checksum "uv.lock" }}
        - uv-cache-v1-{{ .Branch }}-
        - uv-cache-v1-
  - run: uv sync --frozen
  - save_cache:
      paths:
        - ~/.cache/uv
      key: uv-cache-v1-{{ .Branch }}-{{ checksum "uv.lock" }}
Yarn (Node)
Safe to Use Partial Cache Restoration? Yes.
Yarn has always used a lock file for the reasons explained above.
steps:
  - restore_cache:
      keys:
        # when lock file changes, use increasingly general patterns to restore cache
        - yarn-packages-v1-{{ .Branch }}-{{ checksum "yarn.lock" }}
        - yarn-packages-v1-{{ .Branch }}-
        - yarn-packages-v1-
  - run: yarn --frozen-lockfile --cache-folder ~/.cache/yarn
  - save_cache:
      paths:
        - ~/.cache/yarn
      key: yarn-packages-v1-{{ .Branch }}-{{ checksum "yarn.lock" }}
We recommend using yarn --frozen-lockfile --cache-folder ~/.cache/yarn for two reasons:
- --frozen-lockfile ensures that no new lockfile is generated and that your existing lockfile is not altered. This allows the checksum to stay relevant, and your dependencies will identically match what you use in development.
- The default cache location depends on the OS. --cache-folder ~/.cache/yarn ensures you are explicitly matching your cache save location.
Caching strategy tradeoffs
In cases where the build tools for your language include elegant handling of dependencies, partial cache restores may be preferable to zero cache restores for performance reasons. If you get a zero cache restore, you have to reinstall all your dependencies, which can cause reduced performance. One alternative is to get a large percentage of your dependencies from an older cache, instead of starting from zero.
However, for other language types, partial caches carry the risk of creating code dependencies that are not aligned with your declared dependencies. These mismatches do not always surface until you run a build without a cache. If the dependencies change infrequently, consider listing the zero cache restore key first, and then track the costs over time.
If the performance costs of zero cache restores (also referred to as a cache miss) prove significant over time, only then consider adding a partial cache restore key.
Listing multiple keys for restoring a cache increases the chances of a partial cache hit. However, broadening your restore_cache scope to a wider history increases the risk of confusing failures. For example, consider a project with Node v6 on an upgrade branch and Node v5 on all other branches. A restore_cache step that searches other branches may restore incompatible dependencies.
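The approach described above can be sketched as follows: start with only the exact-match key, and keep a fallback key commented out until cache-miss costs justify enabling it. The .nvmrc file is an assumption here, used as a hypothetical file that records the Node version so that caches built under different Node versions never match each other. The key deliberately omits the branch name for the same reason.

```yaml
- restore_cache:
    keys:
      # Exact match only: a miss means a full reinstall, but never a
      # restore of dependencies built for another Node version.
      - node-v1-{{ checksum ".nvmrc" }}-{{ checksum "package-lock.json" }}
      # Partial restore, to be enabled only if misses prove costly.
      # Still scoped to the Node version recorded in the assumed .nvmrc.
      # - node-v1-{{ checksum ".nvmrc" }}-
```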
Using a lock file
Checksums of language dependency manager lock files (for example, Gemfile.lock or yarn.lock) can be useful cache keys.
An alternative is to run the command ls -laR your-deps-dir > deps_checksum and reference it with {{ checksum "deps_checksum" }}. For example, in Python, install the dependencies within a virtualenv in the project root venv. Then run ls -laR venv > python_deps_checksum to produce a more specific cache key than the checksum of your requirements.txt file.
Using multiple caches for different languages
It is also possible to lower the cost of a cache miss by splitting your job across multiple caches. By specifying multiple restore_cache steps with different keys, each cache is reduced in size, thereby reducing the performance impact of a cache miss.
Consider splitting caches by language type (npm, pip, or bundler), if you know the following:
- How each dependency manager stores its files.
- How it upgrades.
- How it checks dependencies.
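A minimal sketch of a job that splits its caches by language, assuming a project that contains package-lock.json, requirements.txt, and Gemfile.lock at its root. The key prefixes are hypothetical. A change to one lock file invalidates only that language's cache, and the other two still restore.

```yaml
steps:
  - checkout
  # One restore_cache step per dependency manager
  - restore_cache:
      keys:
        - npm-v1-{{ checksum "package-lock.json" }}
  - restore_cache:
      keys:
        - pip-v1-{{ checksum "requirements.txt" }}
  - restore_cache:
      keys:
        - gem-v1-{{ arch }}-{{ checksum "Gemfile.lock" }}
```

Each restore_cache step would be paired with a matching save_cache step that uses the same key and the directory appropriate to that dependency manager.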
Caching expensive steps
Certain languages and frameworks include more expensive steps that are worth caching. Scala and Elixir are two examples where caching the compilation steps is effective. Rails developers also see a performance boost from caching frontend assets.
Do not cache everything, but do consider caching for costly steps like compilation.
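As an illustration for Elixir, the sketch below caches compiled output between runs. It assumes a standard Mix project layout, where fetched dependencies live in deps and compiled artifacts in _build, and it uses a hypothetical key prefix. Whether the fallback keys are appropriate depends on how your compiler handles stale artifacts.

```yaml
steps:
  - checkout
  - restore_cache:
      keys:
        - v1-build-cache-{{ .Branch }}-{{ checksum "mix.lock" }}
        - v1-build-cache-{{ .Branch }}-
        - v1-build-cache-
  - run: mix deps.get
  - run: mix compile       # the expensive step; reuses restored artifacts
  - save_cache:
      key: v1-build-cache-{{ .Branch }}-{{ checksum "mix.lock" }}
      paths:
        - deps
        - _build
```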