CircleCI provides several ways to move data into and out of jobs, persist data, and, with workspaces, move data between jobs. Using the right feature for the right task will speed up your builds and improve repeatability and efficiency.
The benefit of faster CI runs will be clear to anybody who has ever waited for their CI test suite to go green.
Repeatability is also important. A repeatable CI process means that if you run the same process again against the same SHA from your repo, you will get the same result. When a CI process isn’t repeatable you’ll find yourself wasting time re-running jobs to get them to go green.
How data flows between CircleCI jobs
Data can flow between CircleCI jobs in different ways. Workspaces persist data between jobs in a single Workflow. Caching persists data between the same job in different Workflow builds. Artifacts persist data after a Workflow has finished. The use case, implementation, and amount of time the data hangs around vary between them.
Workspaces move data between sequential jobs in a workflow.
When a workspace is declared in a job, one or more files or directories can be added. Each addition creates a new layer in the workspace filesystem. Downstream jobs can then use this workspace for their own needs or add more layers on top.
A common approach is to use the workspace to pass generated version numbers from a build job to a deploy job. Workspaces can also be used to pass along compiled binaries, but because the contents need to be uploaded and downloaded again in each job, this can be slower than just passing metadata.
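As a rough illustration of the version-number case, here is a minimal config sketch. The job names, the `cimg/base:stable` image, the `workspace/version.txt` file, and the hard-coded version string are all assumptions made for the example, not part of any particular project.

```yaml
version: 2.1

jobs:
  build:
    docker:
      - image: cimg/base:stable   # assumed image; use whatever your build needs
    steps:
      - checkout
      - run:
          name: Generate version number
          command: |
            mkdir -p workspace
            # Hypothetical version string; a real build would compute this
            echo "1.2.3" > workspace/version.txt
      # Add a layer to the workspace containing only the metadata file
      - persist_to_workspace:
          root: workspace
          paths:
            - version.txt

  deploy:
    docker:
      - image: cimg/base:stable
    steps:
      # Download the workspace layers persisted by upstream jobs
      - attach_workspace:
          at: /tmp/workspace
      - run:
          name: Read version number
          command: cat /tmp/workspace/version.txt

workflows:
  build-and-deploy:
    jobs:
      - build
      - deploy:
          requires:
            - build   # deploy runs after build, so the workspace is available
```

Persisting a small metadata file like this keeps the upload and download in each job cheap, which is the trade-off described above.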
Unlike caches, workspaces are not shared between runs because they no longer exist once a workflow is complete. There is one exception: re-running workflows. More information on this and a complete deep dive into workspaces can be found in tomorrow’s blog post: Deep Diving into CircleCI Workspaces.
Caching in CircleCI
Caching persists data between the same job in multiple workflow runs.
Caching lets you reuse the data from expensive fetch operations from previous jobs. After the initial job run, future instances of the job will run faster by not redoing work. A prime example is package dependency managers such as Yarn, Bundler, or Pip. With dependencies restored from a cache, commands like yarn install will only need to download new dependencies, if any, and not redownload everything on every build.
Caches are global within a project: a cache saved on one branch will be used by others, so caches should only hold data that is OK to share across branches. For more tips like this and a deeper understanding of CircleCI caching, you can read the CircleCI 2.0 Caching Doc.
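For the Yarn case, a dependency cache might look roughly like the sketch below. The `cimg/node:lts` image, the `yarn-deps-v1` key prefix, and the choice to cache `node_modules` are assumptions for illustration; your project may cache a different directory.

```yaml
version: 2.1

jobs:
  test:
    docker:
      - image: cimg/node:lts   # assumed image
    steps:
      - checkout
      # Try an exact match on the lockfile first, then fall back to the
      # most recent cache whose key starts with the shared prefix
      - restore_cache:
          keys:
            - yarn-deps-v1-{{ checksum "yarn.lock" }}
            - yarn-deps-v1-
      - run:
          name: Install dependencies
          command: yarn install
      # Save under a key tied to the lockfile contents, so a changed
      # lockfile produces a new cache entry
      - save_cache:
          key: yarn-deps-v1-{{ checksum "yarn.lock" }}
          paths:
            - node_modules

workflows:
  main:
    jobs:
      - test
```

Because the key here does not mention the branch, every branch in the project can restore this cache. If some data should not be shared that way, one option is to include the `{{ .Branch }}` template in the key.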
Artifacts persist data after a workflow is completed and gone.
Artifacts are used for longer-term storage of the outputs of your build process. For instance, if you have a Java project, your build will most likely produce a .jar file of your code. This code will be validated by your tests. If the whole build/test process passes, then the output of the process (the .jar) can be stored as an artifact. The .jar file is available to download from our artifacts system long after the workflow that created it has finished.
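A minimal sketch of the Java case could look like this. It assumes a Maven project, the `cimg/openjdk:17.0` image, and a hypothetical output file named `target/app.jar`; adjust these to match your own build.

```yaml
version: 2.1

jobs:
  build-and-test:
    docker:
      - image: cimg/openjdk:17.0   # assumed image
    steps:
      - checkout
      - run:
          name: Build and run tests
          # Assumes Maven; the job stops here if the build or tests fail
          command: mvn verify
      # Only reached when the previous steps pass, so the stored .jar
      # comes from a green build
      - store_artifacts:
          path: target/app.jar     # hypothetical output path
          destination: app.jar

workflows:
  main:
    jobs:
      - build-and-test
```

Since steps run in order and a failing step ends the job, placing `store_artifacts` last means the artifact is only stored for builds where the tests passed.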
If your project needs to be packaged in some form or fashion, say an Android app where the .apk file is uploaded to Google Play, that’s a great example of an artifact. Many users take their artifacts and upload them to a company-wide storage location such as Amazon S3 or Artifactory. More tips and up-to-date information can be found in the CircleCI Artifacts doc.
Looking for more insights into workspaces and how best to use them? Read our follow-up blog post, Deep diving into CircleCI workspaces.
For an overview of all the things workflows can do, including OSS configs, see the Wide World of Workflows series: