Mozilla likes to work in the open as much as possible, which means we primarily do our development in publicly accessible code repositories, whether we expect outside collaborators or not. Those repositories, however, still need to hook into other systems, which sometimes involves managing sensitive credentials. How can we enable those connections to provide rich workflows for maintainers while also providing a great experience for outside contributors?
We are going to build an example Java project on GitHub that uses CircleCI to run tests for all pull requests (PRs), whether from a branch on the main repository or a fork. We will then add conditional logic that will build and deploy a Java jar artifact to Amazon S3 when trusted committers push code to the main repository.
Creating the project
Let’s generate a small Java project using Apache Maven as the build tool:
mvn archetype:generate -DgroupId=com.mycompany.app -DartifactId=managing-secrets -DarchetypeArtifactId=maven-archetype-quickstart -Dversion=1.3 -DinteractiveMode=false
Now, we will create a simple CircleCI workflow that runs a single test job:
version: 2.0
jobs:
  test:
    docker:
      - image: circleci/openjdk:8-jdk
    steps:
      - checkout
      - run: mvn clean test
workflows:
  version: 2
  build:
    jobs:
      - test
Once we commit this to GitHub and enable it as a project in CircleCI, each push will trigger a run of the build workflow in CircleCI. PRs issued from any branch on the main repository will show the status of the test job.
Enabling CircleCI for forked PRs
Now, we would like to enable this same workflow for pull requests originating from forked repositories. Not only does this allow proposed changes from contributors without commit access, but it is also helpful for committers who prefer to work from their own forks.
To enable CircleCI for forked pull requests, we go to the settings page for our project within CircleCI and choose Build Settings > Advanced Settings and enable the Build forked pull requests option.
While we are there, notice the next option, Pass secrets to builds from forked pull requests. That is disabled by default, which is exactly what we want here. In the next step, we are going to upload AWS credentials and we do not want to accidentally expose them to users outside our organization.
We make our AWS credentials available to trusted builds by setting them as project-specific environment variables. Note that it is also possible to create a context that is shared by multiple projects.
We will now move to the Build Settings > Environment Variables section of the project configuration in CircleCI and add AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY variables that contain credentials allowed to write to a chosen location in Amazon S3 where we will stage [artifacts]. These variables will not be set for CircleCI jobs triggered from a forked pull request, but only for pushes to branches on the main repository initiated by someone with commit access.
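To make the difference between the two kinds of builds concrete, here is a minimal shell sketch that reports which of the two variables are missing from the current environment. The function name missing_aws_vars is hypothetical and not part of the demo project. The sketch prints variable names only, never their values.

```shell
# Hypothetical helper: print the names (never the values) of the expected
# credential variables that are unset or empty in the current environment.
missing_aws_vars() {
  [ -n "${AWS_ACCESS_KEY_ID:-}" ] || echo AWS_ACCESS_KEY_ID
  [ -n "${AWS_SECRET_ACCESS_KEY:-}" ] || echo AWS_SECRET_ACCESS_KEY
}
```

Run inside a job triggered by a forked pull request, this would be expected to list both names. On a push to the main repository it should print nothing.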
Building and deploying an artifact
At this point, we are ready to add logic to our [continuous integration (CI)] workflow to build and deploy a jar to S3. In order to keep our CI jobs running as fast as possible, we will package the jar artifact in parallel with the test job. Once both testing and packaging complete successfully, we will deploy the artifact to S3.
We add the following job definitions to our config.yml, using a workspace to share data between the package and deploy jobs:
jobs:
  test:
    ...
  package:
    docker:
      - image: circleci/openjdk:8-jdk
    steps:
      - checkout
      - run: mvn clean package
      - persist_to_workspace:
          root: target
          paths:
            - managing-secrets-1.3.jar
  deploy:
    docker:
      - image: python:3.7
    steps:
      - checkout
      - attach_workspace:
          at: target
      - run: pip install awscli
      - run: aws s3 cp target/managing-secrets-1.3.jar s3://mybucket/managing-secrets/$CIRCLE_BRANCH/managing-secrets-1.3.jar
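The destination used by the deploy job is assembled from a bucket, a project prefix, the branch name, and the jar file name. As a rough sketch of that layout, the hypothetical helper s3_destination below builds the same URL from its parts. The helper name and its argument order are assumptions made for illustration only.

```shell
# Hypothetical helper: build the S3 destination URL for a jar.
# Arguments: bucket, project prefix, branch name, jar file name.
s3_destination() {
  printf 's3://%s/%s/%s/%s\n' "$1" "$2" "$3" "$4"
}

# Example usage, left commented out because it needs the AWS CLI.
# --dryrun is an existing flag of `aws s3 cp` that displays the operation
# that would be performed without actually running it.
# aws s3 cp --dryrun target/managing-secrets-1.3.jar \
#   "$(s3_destination mybucket managing-secrets "$CIRCLE_BRANCH" managing-secrets-1.3.jar)"
```

Note that a branch name containing a slash, such as feature/foo, produces a nested prefix under the project path.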
We want to add these new jobs to our workflow and define deploy as dependent on the test and package jobs. Our workflow is now expressed in the config as:
workflows:
  version: 2
  build:
    jobs:
      - test
      - package
      - deploy:
          requires:
            - test
            - package
We commit those changes to master, and we have our first successful deploy! 🎉 Alice, a friend of ours at another company, is excited that the project is getting off the ground and has an enhancement she wants to propose, so she forks the project and issues a first pull request. Unfortunately, Alice’s PR shows up as failing our CI tests. The deploy step returns:
upload failed: ... Unable to locate credentials
There is good and bad here. On the good side, CircleCI did exactly what we asked; it ran the workflow for the forked PR and did not expose any secrets. On the bad side, this is a confusing experience for Alice; she made sure her new code and tests were working correctly locally before she opened the PR, so she rightfully expects that her PR should be passing our CI tests.
We need to introduce a little more logic in order to detect forked PRs and delay the deploy until a trusted committer approves and merges the code.
Defining a command to return early on forked PRs
The CircleCI 2.1 configuration introduced reusable user-defined commands, a concept that we are going to take advantage of to make an early_return_for_forked_pull_requests command. Invoking it will short-circuit jobs we know are not needed for forked PRs or that we know would fail. Because commands are a 2.1 feature, the config must declare version: 2.1 rather than the 2.0 used earlier; be sure to reference the docs on enabling config reuse.
First off, how can we tell within a job run whether this is a forked PR or not? We could check directly for the existence of specific environment variables that we passed in such as AWS_ACCESS_KEY_ID, but we would like to achieve a more generic solution that could be copied into any project regardless of the particular set of secrets we have defined. Instead, we are going to use some of the rich context about the job provided by CircleCI’s built-in environment variables. The particular variable of interest to us is CIRCLE_PR_NUMBER, documented as “the number of the associated [GitHub or Bitbucket] pull request. Only available on forked PRs.” If CIRCLE_PR_NUMBER exists, then we know we are running a build for a forked PR that does not have access to secrets.
To express this condition in shell syntax, we use the -n (non-zero length) test. The condition will look like this:
if [ -n "$CIRCLE_PR_NUMBER" ]; then
  # mark this job successful and stop processing
  :  # no-op placeholder so the snippet parses; replaced by the halt command introduced next
fi
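The condition can be tried out locally before it goes into the config. The sketch below wraps it in a function named is_forked_pr, a hypothetical name chosen for illustration, and simulates both cases by setting and unsetting the variable by hand. The value 123 is an arbitrary placeholder.

```shell
# Succeeds (exit status 0) when CIRCLE_PR_NUMBER is set and non-empty.
# The :- default keeps the test safe under `set -u` when the variable is unset.
is_forked_pr() {
  [ -n "${CIRCLE_PR_NUMBER:-}" ]
}

# Simulate a push to the main repository: the variable is absent.
unset CIRCLE_PR_NUMBER
if is_forked_pr; then echo "forked PR"; else echo "not a forked PR"; fi

# Simulate a forked PR: 123 is an arbitrary placeholder PR number.
CIRCLE_PR_NUMBER=123
if is_forked_pr; then echo "forked PR"; else echo "not a forked PR"; fi
```

The first check prints "not a forked PR" and the second prints "forked PR".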
Most executors on CircleCI will have a local circleci-agent command-line interface available, which provides exactly the command we need to fill out this conditional expression:
circleci-agent step halt
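Since the binary is available on most executors but not necessarily all of them, a defensive variant may be worth considering. The sketch below, with the hypothetical name halt_job_if_possible, calls the halt command only when circleci-agent is found on the PATH and otherwise returns a non-zero status so the caller can decide what to do. It is an illustration under that assumption, not part of the demo project.

```shell
# Hypothetical wrapper: halt the current job through circleci-agent when the
# binary is on PATH; otherwise warn on stderr and return status 1.
halt_job_if_possible() {
  if command -v circleci-agent >/dev/null 2>&1; then
    circleci-agent step halt
  else
    echo "circleci-agent not found; continuing without halting" >&2
    return 1
  fi
}
```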
Now we are ready to put this all together in a new top-level commands section of our config:
commands:
  early_return_for_forked_pull_requests:
    description: >-
      If this build is from a fork, stop executing the current job and return success.
      This is useful to avoid steps that will fail due to missing credentials.
    steps:
      - run:
          name: Early return if this build is from a forked PR
          command: |
            if [ -n "$CIRCLE_PR_NUMBER" ]; then
              echo "Nothing to do for forked PRs, so marking this step successful"
              circleci-agent step halt
            fi
We add our custom command as the first step of the deploy job:
jobs:
  deploy:
    docker:
      - image: python:3.7
    steps:
      - early_return_for_forked_pull_requests
      - checkout
      - attach_workspace:
          at: target
      - run: pip install awscli
      - run: aws s3 cp target/managing-secrets-1.3.jar s3://mybucket/managing-secrets/$CIRCLE_BRANCH/managing-secrets-1.3.jar
While we are at it, we can add the same command for the package job since its only purpose is to stage an artifact for the deploy job that we are skipping. We might as well not waste computing time to build an artifact that we never use.
If Alice rebases her PR on top of these config changes, CircleCI will now run faster and show all green due to the early returns. When her change is approved and merged, the full workflow including secrets will run for the master branch, building an approved artifact and deploying to S3.
The full code for the demo project discussed here is available on GitHub at jklukas/managing-secrets.
Learn more about the CircleCI contexts REST API.
To see this methodology applied in a real production context, see [mozilla/telemetry-batch-view] and [mozilla/telemetry-streaming], the repositories where the Mozilla data platform team defines Spark transformation jobs for creating derived datasets from Firefox telemetry data. Each push to those repositories triggers a build and delivers a jar artifact to S3; we run a transformation by spinning up an Amazon EMR cluster that points at one of the deployed jars. By default, the nightly runs reference the artifact in the master/ path in S3, so our CircleCI configuration ensures that code merged to master during the day is what will run the next night.
[mozilla/telemetry-batch-view]: https://github.com/mozilla/telemetry-batch-view/blob/33d1bf1cafd29098a989d08770358361a93d7bc3/.circleci/config.yml
[mozilla/telemetry-streaming]: https://github.com/mozilla/telemetry-streaming/blob/b3318acdfeae5e0f7d5a484bfabe809355f3adc5/.circleci/config.yml
[artifacts]: https://circleci.com/docs/artifacts
[continuous integration (CI)]: https://circleci.com/continuous-integration/
[GitHub or Bitbucket]: https://circleci.com/docs/gh-bb-integration
Jeff Klukas has a background in experimental particle physics, working both as a teacher and as a researcher helping discover the Higgs boson. He now works remotely from Columbus, Ohio on the Firefox data platform at Mozilla and was previously the technical lead for the data platform at Simple, a branchless bank in the cloud.