CircleCI 2.0 provides several ways to move data into and out of jobs, persist data, and, with the introduction of Workspaces, move data between jobs. Using the right feature for the right task will help speed up your builds and improve both repeatability and efficiency.
The benefit of faster CI runs will be clear to anybody who’s waited for the CI to go green.
Repeatability is also important. A repeatable CI process means that if you run the same process again against the same SHA from your repo, you will get the same result. When a CI process isn’t repeatable you’ll find yourself wasting time re-running jobs to get them to go green.
How Data Flows Between CircleCI Jobs
Data can flow between CircleCI Jobs in different ways. Workspaces persist data between jobs in a single Workflow. Caching persists data between the same job in different Workflow builds. Artifacts persist data after a Workflow has finished. The use case, implementation, and amount of time the data will hang around vary between them.
Workspaces move data between sequential jobs in a Workflow.
When a Workspace is declared in a job, one or more files or directories can be added. Each addition creates a new layer in the Workspace filesystem. Downstream jobs can then use this Workspace for their own needs or add more layers on top.
Unlike caches, Workspaces are not shared between runs, as they no longer exist once a Workflow is complete. There is one exception: re-running Workflows. More information on this and a complete deep dive into Workspaces can be found in tomorrow’s blog post: Deep Diving into CircleCI Workspaces.
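As a rough illustration of the layering described above, here is a minimal sketch of a two-job Workflow in which one job persists a file to the Workspace and a downstream job attaches it. The job names, the Docker image, the directory names, and the file name are all assumptions made for this example, not values from a real project.

```yaml
version: 2
jobs:
  build:
    docker:
      - image: circleci/node:8        # assumed image; use whatever your project needs
    steps:
      - checkout
      # Produce something worth handing to the next job (hypothetical output file).
      - run: mkdir -p workspace && echo "built" > workspace/build-output.txt
      # Add a layer to the Workspace containing the listed paths.
      - persist_to_workspace:
          root: workspace
          paths:
            - build-output.txt
  test:
    docker:
      - image: circleci/node:8
    steps:
      # Make the Workspace contents available in this downstream job.
      - attach_workspace:
          at: /tmp/workspace
      - run: cat /tmp/workspace/build-output.txt
workflows:
  version: 2
  build_and_test:
    jobs:
      - build
      - test:
          requires:
            - build
```

The `requires` entry is what makes `test` run after `build`, so the persisted layer exists by the time it is attached.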
Caching persists data between the same job in multiple Workflow runs.
Caching lets you reuse the data from expensive fetch operations from previous jobs. After the initial job run, future instances of the job will run faster by not redoing work. A prime example is package dependency managers such as Yarn, Bundler, or Pip. With dependencies restored from a cache, commands like yarn install will only need to download new dependencies, if any, and not redownload everything on every build.
Caches are global within a project: a cache saved on one branch will be used by others, so caches should only hold data that is OK to share across branches. For more tips like this and a deeper understanding of CircleCI caching, you can read the CircleCI 2.0 Caching Doc.
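For a Yarn project, the restore-then-save pattern might look like the sketch below. The key prefix `yarn-deps-`, the Docker image, and the choice to cache `node_modules` are assumptions for illustration; pick keys and paths that match your own project.

```yaml
version: 2
jobs:
  build:
    docker:
      - image: circleci/node:8        # assumed image
    steps:
      - checkout
      # Try an exact match on the lockfile first, then fall back to the
      # most recent cache that shares the prefix.
      - restore_cache:
          keys:
            - yarn-deps-{{ checksum "yarn.lock" }}
            - yarn-deps-
      - run: yarn install
      # Save under a key that changes whenever yarn.lock changes.
      - save_cache:
          key: yarn-deps-{{ checksum "yarn.lock" }}
          paths:
            - node_modules
```

Keying on the lockfile checksum means a new cache is written only when dependencies actually change, while the prefix fallback lets a build with a changed lockfile start from a recent cache instead of from nothing.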
Artifacts persist data after a Workflow is completed and gone.
Artifacts are used for longer-term storage of the outputs of your build process. For instance, if you have a Java project, your build will most likely produce a .jar file of your code. This code will be validated by your tests. If the whole build/test process passes, then the output of the process (the .jar) can be stored as an artifact. The .jar file is available to download from our artifacts system long after the workflow that created it has finished.
If your project needs to be packaged in some form or fashion, say an Android app where the .apk file is uploaded to Google Play, that’s a great example of an artifact. Many users take their artifacts and upload them to a company-wide storage location such as Amazon S3 or Artifactory. More tips and up-to-date information can be found in the CircleCI Artifacts Doc.
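To make the Java example concrete, here is a minimal sketch of a job that builds a .jar and stores it as an artifact. It assumes a Maven project whose build writes `target/my-app.jar`; the image, the build command, and the file name are hypothetical and will differ for your setup.

```yaml
version: 2
jobs:
  build:
    docker:
      - image: circleci/openjdk:8-jdk   # assumed image with a JDK and Maven available
    steps:
      - checkout
      # Build and run the tests; the job stops here if they fail,
      # so only a passing build reaches the artifact step.
      - run: mvn package
      - store_artifacts:
          path: target/my-app.jar       # hypothetical output path
          destination: my-app.jar
```

Because steps run in order and a failing step ends the job, placing `store_artifacts` after the build/test step keeps unvalidated outputs out of artifact storage.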
These CircleCI features are fairly flexible and you don’t have to implement them the way they’re explained here. These are our suggestions for common use cases.
Looking for more insights into workspaces and how best to use them? Read our follow-up blog post, Deep Diving into CircleCI Workspaces.