Engineering Productivity | Oct 27, 2021 | 5 min read

Config best practices: dependency caching

Jacob Schmitt

Senior Technical Content Marketing Manager


Let’s face it: Creating the optimal CI/CD workflow is not always a simple task. In fact, writing effective and efficient configuration code is the biggest hurdle that many developers face in their DevOps journey. But you don’t need to be an expert to set up a fast, reliable testing and deployment infrastructure. With a few straightforward techniques, you can optimize your config.yml file and unleash the full potential of your CI/CD pipelines.

In this series, we go into depth on some common recommendations that our solutions engineers make during one-on-one config reviews with enterprise-level customers. Today, we are focused on dependency caching, a powerful technique for saving build data between jobs and speeding up your workflows.

What is dependency caching?

In general computing terms, caching is a process in which frequently used data is stored in memory so that it can be quickly retrieved for future use. In a CI/CD pipeline, you often need to install the same dependencies—libraries or packages that are required for your application to run—during the build stage. To avoid repeatedly downloading the same dependencies each time you run your workflows, you can use dependency caching to save those packages and make them available in future jobs.

Caching your dependencies between jobs can trim seconds or even minutes from your builds. Improving build speed leads to fewer wasted minutes and faster feedback from your tests, allowing you and your team to ship changes to your users quickly and efficiently.

When should you use dependency caching?

Caching dependencies is one of the most common recommendations we make to users looking to speed up their workflows. This technique is particularly useful for projects that rely on package managers, such as npm and Yarn for Node.js, pip for Python, and Bundler for Ruby, to install many packages during a build.

For example, a full-stack web app written with React and Node.js can quickly rack up tens, hundreds, or even thousands of dependencies and sub-dependencies. Without dependency caching, the following config.yml file would download every package required by the application each time the build_and_test job ran:

jobs:
  build_and_test:
    docker:
      - image: cimg/node:16.11.1
    steps:
      - checkout
      # install dependencies
      - run:
          name: install dependencies
          command: npm install
      # run test suite
      - run:
          name: test
          command: npm run test

This config lacks dependency caching. It sets CircleCI’s Node.js Docker image as the execution environment and checks out the application code from the associated Git repository before installing the necessary dependencies and running the tests defined for the project. Because dependency caching is not set up, the build_and_test job will start from a clean slate on every workflow run, unnecessarily installing the same dependencies over and over.

How to add dependency caching to your pipelines

Adding a dependency cache to your CircleCI workflow is as simple as setting up restore_cache and save_cache steps along with unique identifiers, or keys, for each version of the cache that you create. Here is the same config you saw above, this time optimized with dependency caching:

...
jobs:
  build_and_test:
    docker:
      - image: cimg/node:16.11.1
    steps:
      - checkout
      # look for existing cache and restore if found
      - restore_cache:
          key: v1-deps-{{ checksum "package-lock.json" }}
      # install dependencies
      - run:
          name: install dependencies
          command: npm install
      # save any changes to the cache
      - save_cache:
          key: v1-deps-{{ checksum "package-lock.json" }}
          paths:
            - node_modules
      # run test suite
      - run:
          name: test
          command: npm run test

This config now includes dependency caching. The restore_cache step checks for an existing cache and, if one is found, restores it to the job. In that case, npm install only needs to install dependencies that are missing from the restored node_modules directory. Then, after all of the project’s dependencies have been installed, the save_cache step saves the contents of the node_modules directory to a new cache under the specified key.
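Some teams prefer npm ci for reproducible installs. Because npm ci deletes any existing node_modules directory before it installs, caching node_modules gains nothing with that command; a common alternative is to cache npm’s download cache instead. The sketch below is a variation on the steps above, not a replacement for them. It assumes npm’s default cache location of ~/.npm, and the v1-npm-cache- prefix is an illustrative name.

```yaml
steps:
  - checkout
  # restore npm's download cache (assumed default location: ~/.npm)
  - restore_cache:
      key: v1-npm-cache-{{ checksum "package-lock.json" }}
  # npm ci removes node_modules and installs exactly what the lock file lists
  - run:
      name: install dependencies
      command: npm ci
  # save the download cache rather than node_modules
  - save_cache:
      key: v1-npm-cache-{{ checksum "package-lock.json" }}
      paths:
        - ~/.npm
  # run test suite
  - run:
      name: test
      command: npm run test
```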

Note that both restore_cache and save_cache include key identifiers. The {{ checksum "package-lock.json" }} part of the key is a dynamic value known as a template. This particular template calculates a SHA256 hash of the contents of the project’s package-lock.json file, and the resulting value is appended to the static v1-deps- prefix to form the full key. If package-lock.json (or whichever dependency management file you have specified in your cache key) changes, the hash changes with it, so restore_cache will miss and a new cache will be created by save_cache.
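The same pattern carries over to other package managers by pointing the checksum template at their lock file. As a rough sketch, a Yarn project might key its cache on yarn.lock. The v1-yarn-deps- prefix is an illustrative name, and the --frozen-lockfile flag assumes Yarn 1 (Classic).

```yaml
steps:
  - checkout
  # the key changes whenever yarn.lock changes
  - restore_cache:
      key: v1-yarn-deps-{{ checksum "yarn.lock" }}
  # fail instead of rewriting yarn.lock if it is out of date (Yarn 1 flag)
  - run:
      name: install dependencies
      command: yarn install --frozen-lockfile
  - save_cache:
      key: v1-yarn-deps-{{ checksum "yarn.lock" }}
      paths:
        - node_modules
```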

There are several other important things to note about this example and dependency caching in general:

  • Caches are immutable. Once save_cache writes to a given key, it cannot be overwritten. To force a fresh cache, change the key, for example by bumping the v1 prefix to v2.

  • It is important to include the save_cache step before you run your tests so that your dependencies will be saved even if your tests fail.

  • You can use other dynamic information in your templates besides the checksum value of your dependency management file, including the VCS branch being built, the CircleCI job number, and the epoch time of the build. For more information, go to the documentation on using keys and templates.
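To illustrate the points above, here is a hedged sketch of a key that combines a version prefix, the branch name, and the lock file checksum. The v2 prefix stands in for a manual cache bust (because an existing key cannot be overwritten, changing the prefix is one way to force a fresh cache), and {{ .Branch }} scopes each cache to the branch being built. The exact key layout is an assumption; adjust it to your project.

```yaml
steps:
  - checkout
  # a cache saved on one branch will not match a build on another
  - restore_cache:
      key: v2-deps-{{ .Branch }}-{{ checksum "package-lock.json" }}
  - run:
      name: install dependencies
      command: npm install
  # saved before the tests run, so a failing test suite does not cost you the cache
  - save_cache:
      key: v2-deps-{{ .Branch }}-{{ checksum "package-lock.json" }}
      paths:
        - node_modules
  - run:
      name: test
      command: npm run test
```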

Finally, if only a few dependencies have changed and most of an earlier cache is still valid, you can restore that earlier cache and reuse what it contains by setting a fallback key:

      - restore_cache:
          keys:
            - v1-deps-{{ checksum "package-lock.json" }}
            - v1-deps-

In this example, CircleCI will first try to load a cache associated with the current version of package-lock.json. If the lock file has changed, for example because a dependency was added, no cache will match that key. CircleCI will then try the static fallback key v1-deps-, which is matched as a prefix, and load the most recent cache whose key begins with v1-deps-. Once that earlier cache has been loaded, npm install will download any missing dependencies.
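Putting the pieces together, a complete config with a fallback key might look like the sketch below. The version line and the workflow name main are assumptions added to make the file self-contained; the job itself mirrors the examples above.

```yaml
version: 2.1

jobs:
  build_and_test:
    docker:
      - image: cimg/node:16.11.1
    steps:
      - checkout
      # try the exact key first, then fall back to the newest v1-deps- cache
      - restore_cache:
          keys:
            - v1-deps-{{ checksum "package-lock.json" }}
            - v1-deps-
      # install whatever the restored cache is missing
      - run:
          name: install dependencies
          command: npm install
      # write a cache under the exact key for the current lock file
      - save_cache:
          key: v1-deps-{{ checksum "package-lock.json" }}
          paths:
            - node_modules
      # run test suite
      - run:
          name: test
          command: npm run test

workflows:
  main:  # hypothetical workflow name
    jobs:
      - build_and_test
```

When the fallback key is the one that matches, no cache exists yet under the exact key, so save_cache stores the updated node_modules directory there and the next run with the same lock file gets a full hit.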

For more information on partial cache restoration, check out the documentation.

Conclusion

Dependency caching is one of the most straightforward and effective ways to optimize your CircleCI config. With a few small adjustments to your workflow, you can save significant amounts of time on your builds, allowing you to focus on doing what you do best: delivering value to your users, quickly.

Dependency caching is only one of many different optimizations you can make to your config. Other cache-based optimizations include persisting data in workspaces and setting up Docker layer caching to speed up your Docker builds. Keep an eye out for a deep dive into these features and others in upcoming installments of this series.

If you are interested in an expert-level review of your CircleCI configuration, you can get a personalized, one-on-one evaluation from a dedicated support engineer by signing up for a premium support plan.
