AI Development · Apr 24, 2026 · 12 min read

Shipping trustworthy code with Chunk CLI

Roger Winter

Content Marketing Manager

AI coding agents are fast. They generate functions, refactor modules, and wire up boilerplate faster than any human. What they don’t do by default is enforce the conventions a specific team has agreed on: the lint rules and the review patterns that senior engineers flag on every PR. A generated diff looks clean until someone runs CI or reads it carefully.

Chunk CLI closes that gap on the inner loop: the local edit-test-commit cycle that happens before code ever reaches CI. It mines pull request history to learn what reviewers on a specific codebase actually care about, wires real lint and test checks into every AI agent session so the agent self-corrects before it’s done, and hands off harder problems as Chunk tasks that run inside CircleCI pipelines while the team moves on to something else.

This tutorial covers the core Chunk commands against a sample API project, but you can apply these principles to any codebase you work on.

  • build-prompt: mine real PR review history from a GitHub org to generate a review prompt tuned to the team’s actual patterns.
  • init and skill install: wire that prompt into Claude Code as a commit gate and on-demand review skill, so quality checks run automatically instead of relying on prompts.
  • task: hand off a systematic fix to a Chunk, which runs a CircleCI pipeline that validates the changes and opens a PR, without interrupting local development.

By the end, the team’s review standards are encoded, enforced locally, and applied at scale in CI.

Prerequisites

Follow along with a repo that has an active review history to see what Chunk surfaces for a specific team and codebase.

  • Homebrew
  • Claude Code (this tutorial uses Claude Code, but the same approach works with any AI coding agent that supports lifecycle hooks)
  • A GitHub account with at least one repo that has pull request history
  • A CircleCI account connected to GitHub via the GitHub App (OAuth doesn’t support Chunk tasks)
  • Chunk enabled for the CircleCI org (org settings → Advanced → Allow Chunk tasks)
  • An Anthropic API key
  • A CircleCI personal API token (CIRCLECI_TOKEN)
  • Node.js installed

Part 1: Mining review history with build-prompt

AI agents don’t know a team’s conventions. A style guide prompt can be written by hand, but it goes stale as the team’s standards change. build-prompt automates this process: it mines actual PR review comments from a GitHub org, runs them through AI for pattern analysis, and produces a markdown prompt file the agent can use as context. The prompt reflects recurring patterns that reviewers actually flagged.

The quality of the output depends on the depth of review history. The best sources are repos with detailed, substantive PR reviews.

Install and authenticate

Install the Chunk CLI and authenticate:

brew install CircleCI-Public/circleci/chunk
chunk --version
chunk auth login          # prompts for Anthropic API key; stores to ~/.config/chunk/config.json
chunk auth status

build-prompt also needs a GitHub token with repo scope to read PR review data:

export GITHUB_TOKEN=<token-with-repo-scope>

Run it

From the root of the repo:

cd task-api
chunk build-prompt --org <your-org> --repos <your-repo> --top 5

--top 5 limits analysis to the five most-reviewed PRs by comment volume. If the repo hasn’t had PR activity in the last three months, add --since to extend the discovery window:

chunk build-prompt --org <your-org> --repos <your-repo> --top 5 --since 2024-01-01

The CLI runs three stages (Discover, Analyze, and Generate) and prints progress as it goes.

Terminal output of chunk build-prompt running three steps: Discovering Top Reviewers, Analyzing Review Patterns, and Generating PR Review Prompt, all completing successfully

Results from chunk build-prompt

Four files land in .chunk/context/:

  • review-prompt.md: the generated prompt.
  • review-prompt-analysis.md: which PRs were selected and why.
  • review-prompt-details.json: raw extracted comment data.
  • review-prompt-details-pr-rankings.csv: PR scores.

review-prompt.md is worth reading before moving on. It shows exactly what the review skill will use to evaluate code changes. Here’s a portion of the prompt generated from review comments on the demo repo. The first four of six core principles are distilled entirely from what reviewers actually wrote across the top-reviewed pull requests:

# PR Review Agent Prompt

You are a senior security-conscious code reviewer for a TypeScript/Node.js backend codebase. Your job is to identify defects, vulnerabilities, and maintainability issues in pull requests before they reach production. You review with the mindset that every merged PR must meet a security and correctness baseline — there are no "fix later" items for authentication flaws, missing validation, or unhandled errors.

## Core Principles

1. **Security is non-negotiable at every layer.** Authentication, authorization, network inputs, secret management, and information disclosure must all be correct before merge. A single oversight in any of these creates a production vulnerability.
2. **Validate everything at the boundary.** No data from `req.body`, `req.query`, `req.params`, or external sources crosses the API boundary without being checked for presence, type, format, and valid domain values. Missing validation is a bug, not a polish item.
3. **TypeScript must provide real safety, not theater.** Using `any` or type-casting to silence the compiler actively harms the codebase. If TypeScript isn't catching your bugs, you're not using it correctly.
4. **Design for failure and concurrency.** Every external call, I/O operation, and parse operation must handle failure explicitly. Code must behave correctly under concurrent access — no shared temp files, no lost events, no race conditions.

None of those principles were written by hand. build-prompt distilled them from what reviewers actually wrote, the themes that recurred across pull requests. The full prompt also includes principles about production readiness and consistency, review checklists, before/after code examples, and a structured response format with severity levels. For example:

## Review Rules

### Security
- [ ] Passwords are hashed with bcrypt or argon2 with proper salt rounds — never stored or compared in plaintext
- [ ] Tokens are cryptographically signed (e.g., JWT with HMAC/RSA) and include expiration — base64 encoding alone is not authentication
- [ ] Secrets are never returned in API responses after creation; mask or omit entirely
- [ ] No hardcoded default secrets or API keys — every secret must be explicitly configured
- [ ] User-supplied URLs are validated: scheme restricted (https in production), hostname parsed, private/internal IP ranges blocked (SSRF prevention)
- [ ] Sensitive endpoints require authentication/authorization checks
- [ ] Error responses never leak stack traces, internal paths, or system details to clients

### Input Validation

- [ ] Every endpoint validates all inputs for presence, type, format, and range before processing
- [ ] Enum-like parameters (status, priority, format) validate against known values and return 400 for invalid input
- [ ] Pagination parameters reject zero, negative, or non-numeric values
- [ ] Import/bulk endpoints validate payload existence, format, size limits, and schema of individual records
- [ ] String inputs enforce length limits; email fields validate format

The plaintext password item came from a reviewer who caught passwords being stored directly in a user registration endpoint. The URL validation item came from a webhook registration endpoint that accepted arbitrary URLs without restriction, an SSRF vector. The `any` TypeScript rule came from reviewers flagging it across three separate PRs. These patterns came from actual reviews, not generic advice.

The generated prompt also includes before/after code examples sourced from the actual review patterns. Notice how the “good” example extracts the salt rounds into a named constant, consistent with the prompt’s own principle about avoiding magic numbers:

## Code Examples
<details>
<summary>❌ Plaintext passwords → ✅ Hashed passwords</summary>
**Bad:**
```typescript
const user = { email, password: req.body.password, name };
users.push(user);
```
**Good:**
```typescript
import bcrypt from 'bcrypt';
const SALT_ROUNDS = 12;
const hashedPassword = await bcrypt.hash(req.body.password, SALT_ROUNDS);
const user = { email, password: hashedPassword, name };
users.push(user);
```
</details>

When /chunk-review runs later, this prompt is what the review subagent uses to evaluate the diff. It’s checking against what the team’s reviewers have historically flagged.

Terminal view of the generated review-prompt.md file showing core principles and review rules for security, input validation, error handling, and TypeScript type safety

Part 2: Closing the inner loop with chunk init and chunk skill

The inner loop is where most AI coding work actually happens: the agent generates a change, it gets reviewed, a correction gets requested, the cycle repeats. Prompts aren’t deterministic. A prompt that says “run the tests before you commit” gets forgotten after context compaction, skipped during complex multi-file edits, or quietly dropped when the agent decides the change is too small to warrant it.

Hooks solve this. A hook is a command that fires on a lifecycle event (when a session starts, when a tool runs, when the agent tries to commit) and the agent can’t skip it. If the hook fails, its output is fed back to the agent, which has to respond before continuing. For a deeper look at how hooks work in AI coding agents, review Test hooks for AI development.

Chunk uses hooks to wire two quality checks into Claude Code:

  • A commit gate: a PreToolUse hook that runs the test suite and linter before every git commit, blocking the commit if either fails so Claude self-corrects before the change lands.
  • A review skill: an on-demand agent that evaluates the diff against the prompt generated in Part 1, checking the change against real team review patterns.

Neither requires typing “run the tests” or “review this” in a prompt. The gate fires every time Claude tries to commit. The review skill runs when explicitly invoked.

Initialize the project

npm install -g @anthropic-ai/claude-code

From the root of the repo:

chunk init

chunk init detects the project’s package manager and test command, writes two files, and exits:

Detected repository: <your-org>/<your-repo>
Detected package manager: npm
Detected command: test (npm test)
✓ Wrote .chunk/config.json
✓ Wrote .claude/settings.json
✓ Project initialized

The two files:

  • .chunk/config.json: validation commands for manual runs via chunk validate.
  • .claude/settings.json: a starting-point Claude Code configuration with a commit gate hook.

config.json is useful for running checks manually via chunk validate, which runs the configured checks with content-hash caching for quick verification between sessions. The automated gate lives in settings.json. Here’s what chunk init generates:

{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash(git commit*)",
        "hooks": [
          { "type": "command", "command": "cd ${CLAUDE_PROJECT_DIR:-.} && npm ci", "timeout": 60 },
          { "type": "command", "command": "cd ${CLAUDE_PROJECT_DIR:-.} && npm test", "timeout": 300 }
        ]
      }
    ]
  }
}

This runs npm ci and npm test before every git commit. It works, but it doesn’t include the linter. Replace it with a version that gates on both tests and lint:

{
  "$schema": "https://json.schemastore.org/claude-code-settings.json",
  "permissions": {
    "allow": ["Bash(chunk:*)"]
  },
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          {
            "type": "command",
            "command": "INPUT=$(cat); echo \"$INPUT\" | grep -q '\"git commit' || exit 0; npx jest --passWithNoTests && npx eslint src/ __tests__/ || exit 2",
            "timeout": 120
          }
        ]
      }
    ]
  }
}

Notes on this configuration:

  • matcher: "Bash" catches all Bash tool calls. The hook reads the tool input from stdin and exits 0 immediately for anything that isn’t a git commit, so there’s no overhead on other shell commands.
  • || exit 2 is required for blocking behavior. Exit code 2 tells Claude Code to block the tool call and surface the output to Claude. Exit code 1 is treated as a hook error and does not block.
  • The hook runs in the project directory, so no cd is needed if claude is started from the repo root.

For other stacks: swap npx jest --passWithNoTests for the project’s test runner and npx eslint src/ __tests__/ for its linter. The structure stays the same.
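For example, a Python project might keep the same structure with pytest and ruff swapped in. This is a sketch; the tool choices and directory paths are assumptions about that project’s layout, not something chunk init generates:

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          {
            "type": "command",
            "command": "INPUT=$(cat); echo \"$INPUT\" | grep -q '\"git commit' || exit 0; pytest -q && ruff check src/ tests/ || exit 2",
            "timeout": 120
          }
        ]
      }
    ]
  }
}
```

The grep guard, the `|| exit 2` blocking behavior, and the timeout are identical; only the test runner and linter change.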

Install the review skill

Run:

chunk skill install

This installs Chunk’s bundled skills into ~/.claude/skills/, available globally in every Claude Code session, not just this repo. Chunk bundles three skills: chunk-review (code review against the team’s generated prompt), chunk-testing-gaps (mutation testing to find coverage gaps), and debug-ci-failures (CircleCI failure diagnosis).

This tutorial uses chunk-review, which reads .chunk/context/review-prompt.md and runs the review through an isolated subagent so prior conversation context doesn’t influence the findings.

What a session looks like

Start a Claude Code session from inside the repo:

claude

Give Claude a task:

Add a utility function to src/utils/formatDate.ts that takes a JavaScript Date object and returns a string in YYYY-MM-DD format. Include tests. When done, commit the change.
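Assuming a straightforward implementation, the utility Claude generates might look something like this. It is illustrative only; the actual output varies by session:

```typescript
// Illustrative sketch of the requested utility: format a Date as YYYY-MM-DD,
// zero-padding the month and day.
export function formatDate(date: Date): string {
  const year = date.getFullYear();
  const month = String(date.getMonth() + 1).padStart(2, "0");
  const day = String(date.getDate()).padStart(2, "0");
  return `${year}-${month}-${day}`;
}

// formatDate(new Date(2026, 3, 24)) → "2026-04-24"
```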

Here’s what happens:

  1. Claude creates src/utils/formatDate.ts and writes the function.
  2. Claude adds tests in __tests__/formatDate.test.ts.
  3. When Claude runs git commit, the PreToolUse hook fires. Jest and ESLint run before the commit can land.
  4. If any test fails, the hook exits with code 2, blocking the commit. Claude interprets the full test output as context.
  5. Claude reads the failures, identifies the root cause, fixes the underlying code, and retries the commit.
  6. Tests and lint pass, the commit lands.

Claude Code session showing the agent creating a formatDate utility function and tests, then running Jest to confirm all tests pass

To get the AI review on top of that, invoke the review skill from within the session:

/chunk-review

The skill reads .chunk/context/review-prompt.md, assembles the diff, and runs the review through a subagent. The output is based on the team’s real PR patterns; that is, the specific issues that surfaced in actual reviews on that codebase.

Terminal output of the chunk-review skill showing a review summary with five issues identified, including critical TypeScript type safety violations and unsafe JSON.parse usage

Part 3: Autonomous fixes with chunk task

Hooks close the inner loop. Tasks close the outer loop. A Chunk task is an autonomous agent that runs inside a CircleCI pipeline. It reads the repo, makes the necessary code changes, validates them against the test suite, and opens a GitHub PR when the pipeline passes. No local environment is required. The work runs on CircleCI infrastructure while development continues elsewhere.

This is especially useful for the kind of systematic improvements that a review prompt surfaces: adding input validation across multiple endpoints, backfilling error handling, or closing test coverage gaps. The patterns are clear from the review history, the work is repetitive, and the validation loop belongs in CI rather than a local terminal.

Before you begin

The repo needs to be connected to CircleCI via the GitHub App, not OAuth. The GitHub App connection is what gives Chunk permission to trigger pipelines, create branches, and open PRs. Chunk also needs to be enabled at the org level: go to org settings in CircleCI → Advanced and enable Allow Chunk tasks.

CircleCI Advanced organization settings page showing the Allow Chunk Tasks toggle enabled

chunk task run requires both CIRCLECI_TOKEN and GITHUB_TOKEN in the shell environment. Add both to ~/.zshrc (or ~/.bashrc) so they’re available in every session.

Configure the task runner

Enter:

chunk task config

The interactive wizard prompts for the CircleCI org ID, project ID, and pipeline definition. It writes .chunk/run.json when complete.

One step to verify after the wizard runs: open .chunk/run.json. Confirm that definition_id matches the chunk-task-pipeline entry under Project Settings → Pipelines in the CircleCI UI. The wizard may select an inferred build pipeline that won’t work with Chunk’s agent. The correct definition is the one named chunk-task-pipeline, which CircleCI creates automatically when Chunk is enabled for the org.

The resulting file looks like this:

{
  "org_id": "<your-circleci-org-id>",
  "project_id": "<your-circleci-project-id>",
  "org_type": "circleci",
  "definitions": {
    "dev": {
      "definition_id": "<chunk-task-pipeline-id>",
      "default_branch": "main"
    }
  }
}

Add .chunk/run.json to .gitignore; it contains account-specific IDs.
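One way to do that from the repo root (assumes a POSIX shell):

```shell
echo '.chunk/run.json' >> .gitignore
```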

Wire up test results

For Chunk to read individual test failures rather than just pass/fail, the pipeline needs to store JUnit results. Add these steps to the test job in .circleci/config.yml:

- run:
    name: Run tests
    command: npm test -- --reporters=default --reporters=jest-junit
    environment:
      JEST_JUNIT_OUTPUT_DIR: ./test-results
- store_test_results:
    path: ./test-results

Without store_test_results, Chunk can determine whether the pipeline passed or failed but can’t read specific test names and failure messages. That granular output is what tells the agent exactly what to fix.
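The jest-junit reporter referenced above also has to be installed as a dev dependency. If the repo doesn’t already include it, add an entry like this to package.json (the version shown is an assumption; pin whatever your project resolves):

```json
{
  "devDependencies": {
    "jest-junit": "^16.0.0"
  }
}
```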

Run a task

The analysis from Part 1 identified recurring patterns across pull requests: missing input validation and endpoints that accept any input without checking it. Rather than fixing these one at a time in local sessions, give the work to a Chunk task. The task description can reference the specific patterns from the review prompt:

chunk task run \
  --definition dev \
  --new-branch \
  --prompt "Our review prompt (.chunk/context/review-prompt.md) identifies missing \
input validation as the most common issue across PRs. Add zod schema validation \
to the POST and PUT /tasks endpoints: title must be a non-empty string (max 200 \
chars), status must be one of 'pending', 'in_progress', or 'done', and priority \
must be one of 'low', 'medium', or 'high'. Return 400 with specific error \
messages for invalid input. Add tests for both valid and invalid inputs. Make \
sure all existing tests still pass."

The CLI confirms the submission and returns a run ID. In the CircleCI UI, the chunk-task workflow starts. Chunk reads the repo, adds the validation logic and tests, runs the full test suite to verify nothing is broken, and opens a PR against main.

Terminal output of chunk task run command showing a successful run trigger with run ID and pipeline ID returned

After triggering the task, the CircleCI web interface shows Chunk working on the task.

CircleCI Pipelines view showing the chunk-task-pipeline with two workflows: chunk-task running and setup-workflow completed successfully

Once Chunk finishes, it outputs a summary of what it did and creates a diff showing the PR changes.

CircleCI Chunk agent chat showing a task summary and code diff adding Zod input validation and expanded test coverage to the task API endpoints

The PR adds schema validation to the task endpoints and includes tests for every invalid input case, exactly the patterns the review prompt flagged. The fix was validated by the same pipeline that runs on every PR, before anyone on the team read a line of it.

Chunk ran the CircleCI pipeline, which came back green, meaning that all tests passed.

CircleCI pipeline showing the build-and-test workflow passing with all steps completed successfully, including checkout, cache, lint, and run tests

This is where the earlier work pays off. build-prompt mined the patterns from real reviews. The commit gate and review skill enforced them on every local change. Now a Chunk task has used those same patterns to drive systematic fixes across the codebase, validated in CI and landed as a ready-to-merge PR.

Put it to work on your code

To review what we’ve covered:

  • build-prompt mined real PR review comments and distilled them into a review prompt.
  • chunk init and chunk skill install wired a commit gate and an on-demand review skill into Claude Code so checks run automatically.
  • chunk task handed off a systematic fix to a CircleCI pipeline that validated the changes and opened a PR.

The short version: Chunk CLI transforms quality checks from “things the team hopes the agent remembers” to “things that run every time.”

Try running build-prompt against a repo with active review history and read what comes out. The generated prompt is a snapshot of what a team’s reviewers actually care about, and it’s often surprising how consistent those patterns are. From there, chunk init takes a few seconds, and the commit gate starts paying for itself on the first session.

To try Chunk, sign up for a free CircleCI account. The Chunk CLI repo, hooks guide, and setup docs cover everything from there.