From config disaster to config build-faster

The customer engineering team at CircleCI helps users optimize how their configuration files are set up every day, identifying the most useful features for your projects while balancing both your time and your credit consumption. But teams don’t always have time to work with an expert to optimize their config. That’s why, after performing config reviews for 20+ of our enterprise-level customers, our customer engineering team has put together this config optimization guide with their best tips and recommendations to make it easier for teams to optimize their config on their own.

In this post, we’ll cover six different ways to optimize your config file. We’ll walk through best practices for picking the right executor, parallelizing jobs, caching, using workspaces, secrets management, and using orbs in your config.

Selecting the right executor

Many CI pipelines would benefit from choosing one of our 2020 fleet of lightning-fast Docker convenience images. Running within a Docker container using the docker YAML key provides the basics at some of the fastest speeds.

We publish these to the Docker Hub cimg profile. If your application needs other tools, consider running a custom Docker image. Here is an example using one of the older, bulkier images pinned to a certain version of Node. This executor is defined under each job by specifying a Docker image, such as in this test job:

test:
  docker:
    - image: circleci/node:9.9.0

With the next-gen CircleCI Node image, you can shed layers, giving a faster build. Updating to the next-gen executor is as simple as updating the image name.

The current config builds and tests in Node 9.9.0, but we would like it to be built using the latest version of Node. To do this, we replace the image used for the execution container with one of our next-gen images as follows:

docker:
  - image: cimg/node:latest

If you are interested in testing across multiple environments, you can also set up matrix jobs via the Node orb. This lets you specify several versions of Node to test against on top of the base Node Docker layers, as in the sketch below.
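
Here is a minimal sketch of what that could look like. The orb version and the Node versions listed are illustrative, so pin whichever releases fit your project:

version: 2.1

orbs:
  # Illustrative version pin; adjust to your project.
  node: circleci/node@5

workflows:
  matrix-tests:
    jobs:
      - node/test:
          matrix:
            parameters:
              version: ["16.14.0", "18.0.0"]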

Best practices for parallelism

Configuring your job to run across multiple containers in parallel speeds up your build. For example, if you have a long-running test suite with hundreds of independent tests, consider spreading those tests across executors to run simultaneously. A truly optimized config leverages parallelism wisely: carefully consider how many parallel executors you run and whether the time saved by splitting up tasks is worth the spin-up time of multiple containers, and make sure you are splitting your tests correctly across those executors.
Consider the example below. The primary activity in this test job is running the tests with the npm run test command:

test:
  ...
  parallelism: 10
  steps:
    ...
    - run: CI=true npm run test

While using parallelism is a step in the right direction, this is not written optimally. This command will simply run the same exact tests on all 10 containers. To allocate the tests across multiple containers, this config needs to use the circleci tests split command from the CircleCI CLI. Tests can be automatically allocated across these containers when split by filename, classname, or timing data. Splitting by timing data is ideal for parallelization, as it spreads the tests to run evenly across the containers, so there are no faster-running containers waiting idly for a long-running test container.

Finally, consider whether this is the correct level of parallelism for this test suite. If spinning up the environment takes about 30 seconds but each container spends only 30 seconds running tests, it may be worth lowering the parallelism so less time is spent on setup across all of the job runs. There is no golden ratio of test runtime to spin-up time, but it is worth weighing for an optimal build. Here is what the config looks like when it is optimized to split tests by filename and timings and to run more tests in a given container:

test:
  ...
  parallelism: 5
  steps:
    ...
    - run: |
        TESTFILES=$(circleci tests glob "test/**/*.test.js" | circleci tests split --split-by=timings)
        CI=true npm run test $TESTFILES
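
Note that splitting by timings relies on timing data collected from previous runs. A minimal sketch of collecting it, assuming your test runner is configured to write JUnit-style XML reports to a reports directory:

- store_test_results:
    # Hypothetical reports path; point this at wherever your test runner writes JUnit XML.
    path: reports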

Best practices for caching

Speed up your builds with caching. This allows you to reuse data from time-consuming fetch operations. The example below uses caching to restore its npm dependencies from previous job runs. Because the npm dependencies are cached, the npm install step will only need to download new dependencies described in the package.json file. This dependency caching, which is often used with package dependency managers like npm, Yarn, Bundler, or pip, relies on two special steps specified as restore_cache and save_cache. Let’s look at how these cache steps are used in the test job:

test:
  ...
  steps:
    …
    - restore_cache:
        keys:
          - v1-deps-{{ checksum "package-lock.json" }}
    - run: npm install
    - save_cache:
        key: v1-deps-{{ checksum "package-lock.json" }}
        paths:
          - node_modules

Notice that both the restore_cache and save_cache steps use keys. This key is a unique identifier for locating your cache. The save_cache step specifies which directories to cache under this key. In this case, we are saving the node_modules directory so that these Node dependencies can be used in later jobs. The restore_cache step uses this key to find the cache to restore to the job. In this case, the key is a string made up of a version prefix identifying the cache and an interpolated hash of the dependency manifest file, written as {{ checksum "package-lock.json" }}.

While this is a standard pattern of restoring and saving a cache, this can be optimized with the use of fallback keys. Fallback keys allow you to identify a set of possible caches to increase the likelihood of a cache hit. For example, if a single package is added to this application’s package.json, the string generated by checksum will change, and the entire cache will be missed. However, adding a fallback key with a broader set of possible key matches can identify other usable caches. Here is an example of what this cache restoration would look like with an added fallback key:

test:
  ...
  steps:
    …
    - restore_cache:
        keys:
          - v1-deps-{{ checksum "package-lock.json" }}
          - v1-deps-
    

Notice that we just added another element to the list of keys. Let’s go back to the scenario where a single package changed in our package.json. In this case, the first key would result in a cache miss. However, the second key allows the cache that was previously saved with the old package.json to be restored. The dependency installation step, npm install, would then only have to fetch the changed packages, as opposed to performing unnecessary and expensive fetch operations for all of the packages. Visit our docs to read more about fallback keys and partial cache restoration.

Selectively persisting to workspaces

Downstream jobs may need access to data that was generated in a prior job. Workspaces allow you to store files for the entire life of the workflow. To illustrate this, let’s look at the config below. The build job builds a Node application. The next job in the workflow deploys the application. This config persists the entire working directory to the workspace in build, then attaches the directory in deploy, so deploy has access to the built application:

  build:
    ...
    steps:
      ...
      - run: npm run build
      - persist_to_workspace:
          root: .
          paths:
            - '*'
  deploy:
    ...
    steps:
      ...
      - attach_workspace:
          at: .

This works, as the application directory created in the build job would be accessible to the deploy job, but it is not ideal. Workspaces essentially just create tarballs and store them in a blob store, and attaching a workspace requires downloading and unpacking these tarballs. This can be a time-consuming process. It would be faster to selectively persist the files that your later jobs need. In this case, let’s say the npm run build step produces a build directory that can be compressed then stored in the workspace for deployment. You can see what this might look like in the optimized version of this config below:

  build:
    ...
    steps:
      ...
      - run: npm run build
      - run: mkdir tmp && zip -r tmp/build.zip build
      - persist_to_workspace:
          root: .
          paths:
            - 'tmp'

  deploy:
    ...
    steps:
      ...
      - attach_workspace:
          at: .

The tmp directory with the build artifact will now be mounted to the working directory of the project. Rather than uploading and downloading the entire working directory as done in the unoptimized config, this config selectively stores the compressed, built application to save on time spent archiving, uploading, and downloading the workspace. The compressed file can be stored in a temporary directory in the workspace. Any downstream jobs with the workspace attached to them will now have access to this zip file. You can learn more in this deep dive into CircleCI workspaces.
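
For completeness, a downstream step would also need to unpack the archive before using it. Here is a minimal sketch, assuming the zip layout produced by the build job above and an image that has unzip available:

  deploy:
    ...
    steps:
      ...
      - attach_workspace:
          at: .
      # Unpack the compressed build produced in the build job.
      - run: unzip -o tmp/build.zip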

Best practices for secrets management

You do not want to check your secrets into version control, and secrets should never be written in plain text in your config. CircleCI provides you with access to contexts, which allow you to secure and share environment variables across projects in your organization. Contexts are essentially a secret store where you can set environment variables as name/value pairs that are injected at runtime. To understand this better, let’s look at the insecure config below. This config includes a deploy job that is defined with AWS secrets written in plain text:

deploy:
  ...
  steps:
    ...
    - run:
        name: Configure AWS Access Key ID
        command: |
          aws configure set aws_access_key_id K4GMW195WJKGCWVLGPZG --profile default
    - run:
        name: Configure AWS Secret Access Key
        command: |
          aws configure set aws_secret_access_key ka1rt3Rff8beXPTEmvVF4j4DZX3gbi6Y521W1oAt --profile default

Note: These are dummy credentials used for demonstration purposes only.

This text is visible to all of the developers with access to your project on CircleCI. These secrets should be stored as environment variables in a context instead. Add the secret key and access ID to a context titled aws_secrets as key/value pairs, which can be accessed as environment variables. This context can then be applied to the job in the workflow. The secure version of this config would look like this:

deploy:
  ...
  steps:
    ...
    - run:
        name: Configure AWS Access Key ID
        command: |
          aws configure set aws_access_key_id ${AWS_ACCESS_KEY_ID} --profile default
    - run:
        name: Configure AWS Secret Access Key
        command: |
          aws configure set aws_secret_access_key ${AWS_SECRET_ACCESS_KEY} --profile default

workflows:
  test-build-deploy:
    ...
      - deploy:
          context: aws_secrets
          requires:
            - build

Notice that the secrets have gone from plain-text to an environment variable and the context is applied to the job in the workflow. For additional security, we employ secret masking to prevent users from accidentally printing the value of the secret.

Orbs and reusable config elements

So you’ve chosen the right executor for your build, you’re splitting your tests appropriately, and you’re persisting to the workspace to avoid duplicating your work. Now you have to do that for all of your other projects. What a pain, am I right? If only there was a way to reuse shared elements of your config file between multiple builds. Well, good news!

CircleCI provides a feature known as orbs, which allows you to define configuration elements in a central location and reuse them across multiple projects quickly and easily. Not only that, but you can pass parameters into orbs, so you can craft a single orb that does different things in different projects depending on the parameters you pass into it.

Using our 2.1 config version, you can also define reusable elements of your configuration to share across multiple jobs in the same pipeline, from simple job steps to entire executors. You can pass parameters into these reusable elements as well, which is useful when you need to reuse several pieces of a config file across different parts of your pipeline.
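
As a sketch of what that can look like, here is a hypothetical parameterized command (the command name and the cache-version parameter are invented for illustration) shared by a job in the same config:

version: 2.1

commands:
  install-and-test:
    parameters:
      cache-version:
        type: string
        default: v1
    steps:
      - restore_cache:
          keys:
            - << parameters.cache-version >>-deps-{{ checksum "package-lock.json" }}
      - run: npm install
      - run: CI=true npm run test

jobs:
  test:
    docker:
      - image: cimg/node:lts
    steps:
      - checkout
      # Invoke the reusable command, overriding the default parameter value.
      - install-and-test:
          cache-version: v2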

What do orbs look like in practice? Well, here’s an example of a deployment to an S3 bucket, written entirely in the config file without the use of our AWS S3 deployment orb:

- deploy:
    name: S3 Sync
    command: |+
      aws s3 sync \
        build s3://my-s3-bucket-name/my-application --delete \
        --acl public-read \
        --cache-control "max-age=86400"

That’ll get the job done. But here’s what it looks like with the use of the S3 orb:

- aws-s3/sync:
    from: build
    to: 's3://my-s3-bucket-name/my-application'
    arguments: |
      --acl public-read \
      --cache-control "max-age=86400"
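
For this step to work, the orb also needs to be declared at the top of the config file; a minimal sketch, with an illustrative version pin:

version: 2.1

orbs:
  # Pin to whichever release of the AWS S3 orb fits your project.
  aws-s3: circleci/aws-s3@3.0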

You don’t need to declare a separate deploy stage for your S3 deployment. You can simply invoke the S3 sync from the orb as a step in your config file. Note that much of the same information is still included, but it is now represented as parameters passed into the orb rather than as a script in the configuration file. Not only is this more compact, it also makes it easy to change your S3 deployment as needed by adding, removing, or changing arguments. Plus, it’s easier to grasp at a glance and to scale across multiple projects by updating just the orb. The most important takeaway from all the tips mentioned here is that DRY-ness isn’t just an aesthetic concern. With orbs, the ability to replicate config across projects is golden. Happy orb-optimizing!

Configuration review

As part of our premium support packages, we offer a configuration review service on our Gold and Platinum support plans. With the configuration review service, a CircleCI DevOps customer engineer will help your team build a new config or review an existing one so you can save time in your build cycle. We’ll review your needs and provide recommendations to get the most out of CircleCI’s features.

If you’re interested in learning more about the available premium services and support plans, please contact us at cs@circleci.com.


This post would not have been possible without the combined efforts of the entire customer engineering team: Anna Calinawan, Johanna Griffin, and Grant MacGillivray.