If you’ve worked in a large, security-minded organization, you know how developers’ need for speed often clashes with the organization’s need for security. Often this conflict erupts into a high-stakes battle between two teams with very different priorities and perspectives.


OK, it may not always be so dramatic, but the challenge of balancing control and empowerment is very real. Call it separation of duties, trust but verify, or any other name: the hard truth is that every organization needs to protect access to critical systems. But the old days of rigorous manual review and intense personal scrutiny of individual actors can’t keep pace in today’s environment.

In this blog post, you’ll learn how pairing CircleCI’s flexible configuration-as-code (empowerment) with our new config policies (control) can give your SecOps team the tools they need to define safeguards, control access, and unlock team empowerment.

A new way to think about approvals

Config policies allow you to define in code many of the company-level policies you already have in place regarding chain-of-custody, rigorous change control, secure coding, and efficient use of IT resources. For instance:

  • Requiring code reviews & change approvals
  • Restricting access to sensitive credentials
  • Requiring compliance or security jobs (SAST, DAST, etc.)
  • Restricting features like SSH debugging
  • Controlling access to deployment targets, specific runners, or resource classes

The first two bullets are the heart of the struggle between Diana DevOps and Porter Protector: Porter needs to know that any use of production credentials has been reviewed to reduce the chance of malicious or naive behaviors. Diana’s team needs to demonstrate to Porter just how seriously they take their responsibility — they just can’t wait for manual reviews from a centralized team.

How might Porter and Diana overcome their particular challenge together, with the help of CircleCI?

The first step is to eliminate slow and error-prone manual approval processes based on reputations and interpersonal relationships. Let’s look at a new policy Porter drafted with input from Diana and her team:

All production credentials must be stored in our centralized vault. Application teams may access those credentials via the `APPTEAM-prod` context assigned to your department. Only pipelines running on approved repository branches, with documented approvals, will be granted access.

Diana loves this policy because now her team only engages Porter once, if at all. She can use tooling and automation to adhere to the process from the start. And Porter’s team now knows they have automation to ensure that the policy is followed, every single time!

  • Requiring code reviews & change approvals - Check
  • Restricting access to sensitive credentials - Check

Four steps to shift power left – safely

CircleCI believes strongly in shift-left philosophies. The earlier in a process a risk can be addressed, the greater your chances of mitigating it and the lower the cost of doing so.

Effectively shifting left requires working backward from deployment and considering all potential checkpoints that can help mitigate risk without needing to pull the brakes on value at the last minute (or last mile).

However, giving more power, access, and trust to developers and application teams feels inherently risky and unsafe, and usually involves a lot of hard conversations.

What those debates really speak to is the battle of empowerment and control we see play out across so many organizations. If SecOps teams lose their ability to enforce rules later in the process, how do they know they were followed anywhere in the process?

In the following sections, we’ll look at four specific processes that enable us to shift left, and maintain centralized control and verification. Each approach builds on the previous, and together they unlock a much more empowered and enforceable software delivery process.

[Figure: Config policies flow chart]

Step 1: The protected branch paradigm

One of the more important practices starts in your commit and merge discipline. Who can push to your main/default (i.e. production) branch, and what does it take to do so?

If you’re still relying on certain individuals or privileged roles to commit directly to your production stream, then you have significant risk in your pipeline: risk of failure, risk of delays, and risk of compromise. No matter how trustworthy your team is, the assumption of trust is inherently dangerous – you need a process.

  1. Branch permissions protect your default/production/main branch.
    NOBODY pushes to that branch, not even admins.
  2. All changes to default/production/main must be through a MERGE REQUEST or PULL REQUEST.
    This tracks key data: who changed what, when, and why.
  3. Enforce your process using merge checks: automated validation that key requirements are met. Example requirements include:
    • Number of approvers > N
    • Build and tests are passing
    • Compliance / license scan is clear
    • Manager signoff
    • Security jobs in place

This is process #1 because it is a foundation of control that you can build trust and empowerment on top of.
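To make merge checks concrete, here is a minimal Python sketch of the kind of gate a VCS provider evaluates before allowing a merge. The `MergeRequest` fields and the `n_approvers` threshold are hypothetical names chosen for illustration. In practice you configure these as branch protection settings in your VCS rather than writing them yourself.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class MergeRequest:
    """Hypothetical snapshot of a merge request's state."""
    approvers: int
    build_passing: bool
    license_scan_clear: bool


def failed_merge_checks(mr: MergeRequest, n_approvers: int = 1) -> list[str]:
    """Return the unmet requirements; an empty list means the merge may proceed."""
    failures = []
    if mr.approvers <= n_approvers:
        failures.append(f"needs more than {n_approvers} approvers, has {mr.approvers}")
    if not mr.build_passing:
        failures.append("build and tests are not passing")
    if not mr.license_scan_clear:
        failures.append("compliance / license scan is not clear")
    return failures
```

The useful property is that the answer is a list of explicit reasons rather than a yes or no from a person, which is the same idea config policies apply to pipelines later in this post.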

How to get started

Merge checks and branch permissions are handled on the VCS level, with assistance from your CI/CD pipeline.

CircleCI is agnostic to many sources of change, and this control will vary by VCS provider, so I am providing links to this process with GitLab, GitHub, and Bitbucket.

Step 2: Secrets stay central

You got secrets? Me too. I also have passwords I want to keep secure. A best practice we see in our most disciplined customers is that those secrets and passwords only ever exist in one centralized vault. This means one place to update, one place to rotate, one place to secure.

So then what about all the other stuff that needs those secrets? Heck, even that lame old development environment is going to need credentials for the team to deploy.

But that doesn’t mean you start doling out secrets to any Diana DevOps that comes asking. Instead, you need to enable Diana’s team to have automated and secure access, and only where authorized.

How to get started

Some of this will still vary by your own company’s needs; we won’t dictate what vault you use, but we strongly encourage you to use one. HashiCorp Vault is a very popular enterprise and open source choice, and the rest of this post will use it as an implementation example. The same concepts apply regardless of vendor selection.

But wait a minute: if a pipeline needs to access the centralized vault, how does it prove it is authorized without already having credentials?

We call this the “key zero” problem, and the answer is a little nuanced. We do need a key zero, but not for pipelines.

First let’s understand how a traditional Systems Admin might be setting up an internal secrets vault:

  • Key zero is only used for root access in our vault
  • Root access is only used for the most foundational configuration:
    • Initial admin roles
    • External identity provider
  • Admin users
    • have pre-configured rights and roles
    • get their own short-lived token by logging in through an established identity provider
    • are often allowed to generate or rotate secrets, but not retrieve them

But of course our software pipelines don’t do a little OAuth dance without a user there to approve, so how do build jobs get authorized access to the vault?

  • Key zero is only used for root access in our vault, and foundational configuration
  • Admin roles can configure OIDC integration with known providers like CircleCI
  • OIDC providers issue and sign tokens with defined scopes the vault integration inspects
  • Vault can use pre-configured trust to:
    • Validate and authenticate provided tokens
    • Assign specific roles to the token

But what about those root keys?

The answer lies outside the decisions specific to software pipelines. But essentially, the “root token” is usually protected by a collection of key shares, a minimum number of which must be present to unlock it. These are distributed amongst the company’s most privileged personnel, usually company officers. With a threshold of three, for example, no single actor, or even two colluding actors, can unseal the vault. Physical hardware security module (HSM) devices raise the requirement further by pairing those keys with physical presence in, say, a company data center.
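If you are curious how a key can be split so that only a quorum can reassemble it, here is a minimal Python sketch of Shamir's Secret Sharing, the scheme HashiCorp Vault uses for its unseal keys by default. It illustrates the threshold idea only and is not Vault's implementation; the prime, function names, and parameters are my own choices.

```python
from __future__ import annotations

import secrets

PRIME = 2**127 - 1  # a Mersenne prime; the field must be larger than the secret


def split_secret(secret: int, threshold: int, shares: int) -> list[tuple[int, int]]:
    """Split `secret` into `shares` points; any `threshold` of them recover it."""
    # Random polynomial of degree threshold-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    points = []
    for x in range(1, shares + 1):
        y = sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME
        points.append((x, y))
    return points


def recover_secret(points: list[tuple[int, int]]) -> int:
    """Lagrange-interpolate the polynomial at x = 0 to recover the secret."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        total = (total + yi * num * pow(den, -1, PRIME)) % PRIME
    return total
```

With `split_secret(secret, threshold=3, shares=5)`, any three of the five key holders can reconstruct the secret, while one or two shares alone reveal nothing useful about it.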

Step 3: Pre-established relationships between systems

As with steps 1 and 2, the theme of zero trust continues in our third bit of process. Really it’s about shifting trust from individuals or groups to trust of a well understood and enforced process.

Establishing a trusted relationship between CircleCI and your central security vault allows you to delegate access decisions to your vault’s claim or policy mechanisms. Your existing vault can inspect the action rather than just the actor: “Should a dev deploy for foo-app get access to DB credentials XYZ?”

Individual developers using CircleCI won’t need to access or even think about the secrets their workloads need. Instead, with OIDC claims, they need only to request the required role for the action being performed. It is still up to your central vault to decide if such a role is valid, and what permissions (or secrets) it gets access to.

Integrating with the popular HashiCorp Vault involves two aspects of configuration: the JSON Web Token (JWT) authentication method and access control list (ACL) policies.

Configuring OIDC to connect with Vault is relatively simple to set up. To learn more, read Integrate CircleCI with HashiCorp Vault using OIDC.

JWT authentication allows Vault to inspect the web-standard token provided by CircleCI. To validate it, Vault needs two pieces of information: where did this token originate, and where is it valid? In our case, the answer is the same for both: a unique URL we provide for every CircleCI organization:

  • OIDC discovery URL: https://oidc.circleci.com/org/{YOUR_ORG_ID}
  • Bound issuer ID: https://oidc.circleci.com/org/{YOUR_ORG_ID}

Vault will call this URL provided in advance by your admin team and ask CircleCI:

  • “Is this really from you?”
  • “Is the payload (claims) valid?”

Only if both systems agree will trust be conferred to the CircleCI job under the role configured in Vault’s policy.
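To see what the issuer check looks like, here is a small Python sketch that decodes a JWT's payload and compares its `iss` claim to the bound issuer. It deliberately skips signature verification, which a real vault performs using the keys published at the discovery URL, so treat it as a way to inspect tokens and not as an authentication routine. The function names are my own.

```python
import base64
import json


def unverified_claims(token: str) -> dict:
    """Decode a JWT's payload segment WITHOUT verifying the signature."""
    payload_b64 = token.split(".")[1]
    # JWT segments are base64url without padding; restore it before decoding.
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def issuer_matches(token: str, bound_issuer: str) -> bool:
    """Check that the token's `iss` claim equals the issuer configured in advance."""
    return unverified_claims(token).get("iss") == bound_issuer
```

Because the issuer URL embeds your organization ID, a token minted for any other CircleCI organization fails this comparison even though it comes from the same service.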

Step 4: Automate the policies, empower the people

This step is where your organization’s hard work to cut through years of old policy and create alignment culminates in incredible organizational impact. Once you have those three foundational pieces in place, you’re ready for the pinnacle of automated empowerment and control.

Introducing CircleCI config policy, a radical new offering on the CI landscape that allows CircleCI organizations to define explicit and granular rules of the road for all pipeline users. What is required, forbidden, encouraged or enforced?

Best of all? Keeping in CircleCI’s paradigm, it’s all configuration-as-code – only this time owned exclusively by admins.

How to get started

CircleCI config policies allow organizational admins to define organizational policy using the industry-standard Open Policy Agent (OPA). Policies are written in rego, OPA’s policy language. What is rego?

Rego queries are assertions on data stored in OPA. These queries can be used to define policies that enumerate instances of data that violate the expected state of the system.

For a simple example, I’ll pick on another one of my favorite CircleCI features, orbs. Orbs are encapsulated CircleCI configuration. Teams can share custom-built private orbs or use an open source orb from our developer hub to simplify their config and greatly reduce developer toil. However, security-minded orgs often want to limit or restrict the use of third-party code.

package org

import data.circleci.config

policy_name["ban_3rd_party"]

# Each policy may declare one or more rules.
# Rule names must be unique across the entire organization as well.
ban_orbs = config.ban_orbs(["3rdParty/risky-orb"])

# Rules must be enabled to be enforced.
enable_rule["ban_orbs"]

# Rules default to a warning and can be set to HARD_FAIL.
hard_fail["ban_orbs"]

This particular rule says that no project in our entire organization may use 3rdParty/risky-orb. Because it works by exclusion, maintaining it might become a game of whack-a-mole. However, it’s just as easy to write a policy based on allow-lists. For this one, I’ll show a custom rego rule that inspects the config file itself.

policy_name["orbs_allowlist"]
allow_approved_orbs = check_allowlist({"partner/useful", "partner2/useful"})

enable_rule["allow_approved_orbs"]
hard_fail["allow_approved_orbs"]

# Custom functions for the win!
check_allowlist(allowed_orbs) = { orb: msg | orb := input["orbs"][_]
  [name, _] := split(orb, "@")
  not allowed_orbs[name]
  msg := sprintf("%s orb is not allowed in CircleCI configuration", [name])
}

There! Now we only need to define which namespaces/names are valid, and everything else will be blocked.
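If rego comprehensions are new to you, the same allow-list logic reads like this in Python. This is only a mirror of the custom function above to show what it computes; it is not something you would deploy, and the sample orb references in the usage note are made up.

```python
def check_allowlist(orbs: dict, allowed_orbs: set) -> dict:
    """Map each disallowed orb reference to a violation message."""
    violations = {}
    for orb in orbs.values():
        parts = orb.split("@")
        if len(parts) != 2:
            continue  # the rego pattern match also skips anything not name@version
        name = parts[0]
        if name not in allowed_orbs:
            violations[orb] = f"{name} orb is not allowed in CircleCI configuration"
    return violations
```

Calling it with `{"useful": "partner/useful@1.2.3", "risky": "3rdParty/risky-orb@0.1.0"}` and the allow-list `{"partner/useful", "partner2/useful"}` flags only the second orb.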

But Eddie, Diana, and Porter were not fighting over orbs; they were fighting over access to sensitive production secrets!

OK – let’s look at Diana’s team’s config.

version: 2.1
jobs:
  test:
    ...
  build-push:
    steps:
      - checkout
      ...
  deploy:
    steps:
      - checkout
      ...
workflows:
  main:
    jobs:
      - test
      - build-push:
          context: vault-oidc-artifacts
      - deploy:
          name: Deploy Dev
          requires: [ build-push, test ]
          context: [ vault-oidc-dev ]
      - deploy:
          name: Deploy Production
          requires: [ Deploy Dev ]
          context: [ vault-oidc-prod ]
          filters:
            branches:
              only: [ main ]

In English, this says:

  • Run tests
  • Build Artifact
    • If tests pass
    • Using secrets stored in a context named [vault-oidc-artifacts]
  • Run Dev Deploy
    • If build and tests pass
    • Using context named [vault-oidc-dev]
  • Run Production Deploy
    • If dev deploy passes
    • Using context named [vault-oidc-prod]
    • ONLY on main branch

What if Diana just removes that branch filter and tries to access Prod secrets from a feature branch?

  1. Process #1 is adhered to by filtering deploys to run only on main branch.
  2. Process #2 is adhered to by each job using an OIDC enabled context to talk to Vault.
  3. Process #3 is predefined by our CircleCI administrators.
  4. Process #4 is still an open problem: Bad actors could circumvent policy by altering pipeline configuration.

How can Porter prevent the “bad config” problem?

Enter our true hero - config policies

Process #4 is solved by CircleCI’s config policies.

With config policies, Porter is able to codify company policy into all organizational pipelines. For this use case we’ll examine the following rego file. The entire policy is a single rule, limiting access to our production OIDC context to only the production branch, in this case main.

package org

import future.keywords
import data.circleci.config

policy_name["production_context_protection"]

use_prod_context_only_on_main = config.contexts_reserved_by_branches(["main"],
 {"vault-oidc-prod"}
)

# This rule will apply to all projects subscribed in globals.rego under `restricted_context_access_projects`
enable_rule["use_prod_context_only_on_main"] 
hard_fail["use_prod_context_only_on_main"]

With this policy in place, if Diana’s team pulled out that filter or added additional branches to run on, the pipeline would fail policy evaluation:

policy evaluation failed:
use_prod_context_only_on_main: You may not use production context: cera-boa-prod outside of main branch. Offending workflow.job: `main.Deploy Dev`   

No jobs are run. Until Diana fixes the config, she gets no secrets.
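To reason about what a rule like this is checking, here is a simplified Python model of the behavior described above. It is my own approximation and not how CircleCI implements `contexts_reserved_by_branches`: it only understands `filters.branches.only` with literal branch names, and the function name and message format are hypothetical.

```python
from __future__ import annotations


def reserved_context_violations(config: dict, branch: str,
                                allowed_branches: set[str],
                                reserved_contexts: set[str]) -> list[str]:
    """List jobs that would use a reserved context on a non-approved branch."""
    if branch in allowed_branches:
        return []
    violations = []
    for wf_name, workflow in config.get("workflows", {}).items():
        if not isinstance(workflow, dict):
            continue  # skip non-workflow keys such as a legacy `version`
        for job in workflow.get("jobs", []):
            if not isinstance(job, dict):
                continue  # a bare job name carries no context
            for job_name, params in job.items():
                params = params or {}
                only = params.get("filters", {}).get("branches", {}).get("only")
                if isinstance(only, str):
                    only = [only]
                if only is not None and branch not in only:
                    continue  # the branch filter keeps this job off this branch
                contexts = params.get("context", [])
                if isinstance(contexts, str):
                    contexts = [contexts]
                label = params.get("name", job_name)
                violations += [
                    f"{wf_name}.{label} uses reserved context {ctx}"
                    for ctx in contexts if ctx in reserved_contexts
                ]
    return violations
```

Run against Diana's config on a feature branch, this returns nothing while the `only: [ main ]` filter is present and one violation for `main.Deploy Production` once the filter is removed.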

Config policy tips

Common helper functions

While config policies allow savvy admins to write very rich policy, we also made it super easy to solve common use cases with pre-built helper functions. Examples include:

  • ban_orbs
    Prevent any version of unvetted or untested orbs.
  • ban_orbs_version
    Like ban_orbs, but only for specific versions. Great to pair with “SOFT_FAIL” as a deprecation technique.
  • resource_class_by_project
    Restrict larger or privileged resource classes to certain application teams.
  • contexts_allowed_by_project_ids
    Limit what contexts are available to these projects.
  • contexts_blocked_by_project_ids
    Prevent certain projects from accessing contexts.
  • contexts_reserved_by_project_ids
    Allow only these projects to access these contexts.

Using sets and variables

Because rego is code, you can define sets to simplify as you scale up the number of policies. Instead of using project_id all over, you can define that ID once in a shared rego file and reference the same ID across multiple rules. Taking it a step further, you might group or bundle certain projects according to how policy should apply to them.

# Single application IDs. Can be automated.
low_risk_project_id := "abcdef-12345"
high_risk_project_id := "defabc-54321"
company_website_id := …
company_billing_id := …

# Sets can group projects
front_end_applications := {company_website_id}
tier_one_applications := {company_website_id, company_billing_id}
code_freeze_apps := {company_website_id, company_billing_id}

Now rules can filter or apply to specific groupings, including unions, deltas, or intersections. How is that useful? Let’s look at our single rule for blocking access to main. What if we have more than one production context, or more than one default branch?

use_prod_context_only_on_main = config.contexts_reserved_by_branches(
  set_of_main_branches,
  set_of_production_contexts
)

Or perhaps we only want to apply certain rules to our most critical projects.

# This rule will apply to all projects defined in globals.rego under `tier_one_applications`
# AS LONG AS they are not under code freeze
enable_rule["use_prod_context_only_on_main"] {
  (tier_one_applications-code_freeze_apps)[data.meta.project_id]
}

This example uses a - to only apply to a subset of applications. It can be read as “enable this rule for all tier one applications, except those currently under code freeze.” This means apps under freeze will be dropped from the reservation list of production contexts — even main/approved builds will be blocked if attempting production deploy.
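The set arithmetic behaves the same way in most languages. Here is the equivalent membership test in Python, using placeholder project IDs of my own (including a hypothetical third project so the difference is not empty):

```python
# Placeholder IDs for illustration only; real project IDs come from CircleCI.
company_website_id = "website-id"
company_billing_id = "billing-id"
company_docs_id = "docs-id"

tier_one_applications = {company_website_id, company_billing_id, company_docs_id}
code_freeze_apps = {company_website_id}


def rule_enabled(project_id: str) -> bool:
    """Enable the rule for tier one applications that are not under code freeze."""
    return project_id in (tier_one_applications - code_freeze_apps)
```

Only the billing and docs projects get the rule here; the website is excluded because it is frozen, and any project outside tier one was never in scope.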

A note on developer experience

Getting a failed build without a clear cause is a terrible feeling and major productivity killer. One important aspect of config policy is that any failure comes with an explicit (and customizable) reason, shared with offending developers.

[Figure: Failed build error]

This tells the committer to modify the build-html and deploy jobs of my publish-docs workflow. Dev deploys and HTML builds don’t need access to our production vault.

A note on admin experience

So Diana is pretty happy now: since her team has always followed policy, their pipelines are flowing like greased-up sleds on Christmas vacation. How about Porter’s life chasing the not-so-compliant teams?

Config policy exposes some useful tools that have really made Porter’s team cheer!

Policy testing

To make sure new policies don’t bring the org to a halt, our new CLI features allow organizations to test policies locally using a highly expressive test framework before applying them org-wide.

[Figure: Policy testing]

Oops, this output tells Porter he didn’t apply all the rules he should have, and as such got HARD_FAIL when it should have passed.

Config policy audit logs and impact assessment

Developers will fix their errors, right? Maybe. But also sometimes we want to assess the landscape before applying HARD_FAILS everywhere.

To do this, Porter can start by implementing all the policies as SOFT_FAIL, meaning he can track errors without blocking builds.

Implementing soft fails allows platform operators to monitor and assess the impact of a policy without halting the flow of work.

[Figure: Soft fail]

Project and pipeline metadata can then be used to retrieve the specific project/actor information required for action (e.g., letting Diana’s team know they have 30 days to comply with the new policy).
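The difference between the two enforcement levels comes down to one question: does a violation stop the pipeline, or is it only recorded? A minimal Python sketch of that decision, using a hypothetical `Violation` record and log format of my own, might look like this:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Violation:
    """Hypothetical record of one rule violation from a policy evaluation."""
    rule: str
    reason: str
    enforcement: str  # "SOFT_FAIL" or "HARD_FAIL"


def evaluate(violations: list[Violation]) -> tuple[bool, list[str]]:
    """Return (pipeline_may_run, audit_log_lines)."""
    # Every violation is logged, but only a HARD_FAIL blocks the pipeline.
    audit = [f"{v.enforcement} {v.rule}: {v.reason}" for v in violations]
    blocked = any(v.enforcement == "HARD_FAIL" for v in violations)
    return (not blocked, audit)
```

Rolling a policy out as SOFT_FAIL first means the audit lines accumulate while work keeps flowing, and switching to HARD_FAIL later changes only the first element of that result.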

Just the surface

In this post, we looked at four instrumental actions that any organization can take to address the challenges of developer productivity intersecting with organizational security. Enabling stricter controls on who commits to the code base, along with rule-based governance in your pipelines, will deliver demonstrable acceleration in your output and much higher team satisfaction. But the harmony unlocked between Diana’s and Porter’s teams is just the surface of what you can achieve.

There are so many more use cases to accelerate organizational directives with automated policy enforcement. We’re seeing customers tackle some awesome challenges with config policy. Below are a few of the common use cases we’re excited to support.

  • Preventing SSH re-run on deployment jobs
  • Restricting runners or resource classes to particular teams (for cost or security reasons)
  • Restricting access to team contexts and secrets by branch and project
  • Requiring code scanning or license scans
  • Deprecating old Docker images or orb versions
  • Allowing third-party orbs but limited to an approved list
  • Ensuring only company-approved images are used

To get started, check out our docs on config policy, or reach out to your account team for a tailored demonstration unique to your organizational policies. Config policies are available to Scale plan customers.