Building Effective Claude Code Agents: From Definition to Production
The team behind Claude Code’s C compiler project needed 60 sessions across parallel agents to build a working C compiler. The biggest challenge wasn’t the prompts. It was designing the environment around the agents.

Figure 1 - From Chat Sessions to Agent Teams: The fundamental shift from interactive chat (constant human supervision, one task at a time) to autonomous agents (parallel specialists, each with bounded scope and shared progress tracking). This architectural change is what makes it possible to tackle projects that span 60+ sessions and thousands of features.
You open Claude Code Monday morning. You have a collaborative document editor to build: real-time sync, CRDT conflict resolution, a React frontend, a WebSocket backend, and 47 features to implement. You could type each instruction one at a time, supervise every decision, and copy-paste your way through a week of work. Or you could define 4 specialist agents, hand them a feature list, and check in at lunch.
The second approach is not science fiction. It is what Anthropic’s research on effective agents [1] and their engineering work on long-running agent harnesses [2] have made practical. The insights are not what most people expect. The most effective agents are not the ones with the cleverest prompts or the most sophisticated reasoning chains. They are the ones with the best-designed environments: clear task structures, focused context, robust validation, and explicit progress tracking.
As the team behind Claude Code’s C compiler project discovered: “the biggest challenge in autonomous agents is designing the environment around the agent” [3].
This article is a practical guide to designing, configuring, and operating Claude Code agents that reliably ship production software, whether you are running a single agent on a focused task or orchestrating a coordinated team of specialists.
What Is a Claude Code Agent?
A Claude Code agent is a Claude Code session configured with a specific role, toolset, and behavioral constraints to operate as an autonomous specialist [4]. Unlike a general-purpose chat session where you interactively guide the model through tasks, an agent receives a structured assignment and executes it independently: reading files, writing code, running commands, and making decisions within its defined scope.
Think of it as the difference between a contractor you supervise minute-by-minute and a team member you assign a task and check in with later. The agent model requires more upfront investment in defining the role and environment, but it scales dramatically better because the agent operates without continuous human input.
The agent’s behavior is shaped by 4 layers of configuration, each serving a distinct purpose.

Figure 2 - The Four Layers of Agent Configuration: An agent’s behavior emerges from the interaction of 4 configuration layers. CLAUDE.md provides project-wide conventions every agent shares. The agent definition file narrows scope to a specific role. System prompt context injects domain knowledge on demand. The environment provides the actual tools and file system the agent operates in.
The CLAUDE.md file acts as the project constitution, providing project-wide conventions and instructions that every agent reads at session start. The agent definition file specifies the role: instructions, tool restrictions, embedded hooks, and model selection. The system prompt context injects skills, task lists, and dynamic state. And the environment provides the available tools, file system, and installed dependencies.
The most effective agents are not the ones with the cleverest prompts. They are the ones with the best-designed environments. You cannot control the model’s capabilities, but you can control the clarity of task definitions, the focus of context, the robustness of validation, and the quality of progress tracking.
Anatomy of an Agent Definition File
Every agent is defined in a markdown file with YAML frontmatter. These files live in .claude/agents/team/ for team agents, or can be referenced directly for standalone use. The frontmatter specifies configuration; the markdown body provides instructions.
Here is a real-world example: a sync engine specialist for a collaborative editor.
```markdown
---
name: sync-engine
description: >
  CRDT and real-time synchronization specialist. Implements conflict-free
  document merging, WebSocket connection management, and operational
  transform logic for the collaborative editor.
tools: Read, Write, Edit, Bash, Glob, Grep
model: opus
hooks:
  PostToolUse:
    - matcher: "Write|Edit"
      hooks:
        - type: command
          command: "$CLAUDE_PROJECT_DIR/.claude/hooks/validators/crdt_consistency_check.py"
  Stop:
    - matcher: "*"
      hooks:
        - type: agent
          prompt: |
            Review the sync engine implementation. Verify:
            1. All CRDT operations are commutative and idempotent
            2. Conflict resolution handles concurrent edits correctly
            3. WebSocket reconnection logic includes exponential backoff
            Block completion if any verification fails.
          tools: Read, Bash
color: purple
---

# Sync Engine Specialist

You are responsible for the real-time collaboration infrastructure.

## Your Ownership

Files you own and can modify: src/sync/, src/crdt/, src/websocket/
Files you can READ but not modify: all other directories

## Workflow

1. Read claude-progress.txt for current project state
2. Check the shared task list for your next assignment
3. Implement one feature at a time
4. Run consistency tests: npm run test:sync
5. Commit with format: "feat(sync): description"
6. Update the task list and progress file
```
Figure 3 - Anatomy of an Agent Definition File: Every component serves a purpose. The YAML frontmatter (identity, tools, model, hooks) defines what the agent can do. The markdown body (ownership, workflow) defines what the agent should do. Together, they create a bounded, self-validating autonomous specialist.
Let’s break down the 3 most important configuration levers.
Tool Restrictions
The tools field controls which Claude Code tools the agent can access. This is a surprisingly powerful design lever. By restricting tools, you change the agent’s entire role, from implementer to reviewer, from writer to analyst, from builder to validator.

Figure 4 - Tool Restrictions Define Agent Roles: The same Claude model becomes 4 fundamentally different agents depending on which tools it can access. A read-only reviewer cannot accidentally introduce bugs because it physically cannot write files. Tool restrictions create architectural boundaries that complement file ownership.
| Tool Set | Role Pattern | Example |
|---|---|---|
| Read, Write, Edit, Bash, Glob, Grep | Full implementer | Frontend dev, backend dev |
| Read, Bash, Glob, Grep | Read-only reviewer | Code reviewer, test engineer |
| Read, Write, Bash | Limited implementer | Config writer, docs author |
| Read, Glob, Grep | Pure analyst | Architecture reviewer, security auditor |
A documentation agent that only has Write access to docs/ cannot modify source code. A security auditor with only Read and Grep cannot accidentally fix the vulnerabilities it finds; it can only report them. These constraints are not limitations; they are architectural decisions that make the system safer and more predictable.
Tool restrictions create architectural boundaries that are more reliable than behavioral instructions. Telling an agent “do not modify files outside your scope” is a suggestion. Removing the Write tool is a guarantee. Design your agent’s capabilities through tool access, not through prose instructions alone.
Model Selection
The model field determines which Claude model the agent uses, and it is both a quality lever and a cost lever.
Opus provides the strongest reasoning capabilities. It is critical for complex algorithmic work, architectural decisions, and multi-step debugging. Use it for team leads and specialists handling intricate logic.
Sonnet provides a strong balance of capability and speed for standard implementation tasks. Most implementation agents run on Sonnet.
Haiku provides fast, cost-effective operation for routine tasks: formatting, simple testing, boilerplate generation, and quick review iterations.

Figure 5 - Strategic Model Selection: Not every agent needs the most powerful model. A balanced team uses Opus for leadership and complex reasoning, Sonnet for standard implementation, and Haiku for fast iteration on reviews and routine tasks. This composition optimizes both quality and cost.
Embedded Hooks
The hooks section in the frontmatter embeds validation logic directly in the agent definition. This is per-agent quality assurance: each agent carries its own validators.
The sync engine agent above runs a CRDT consistency checker after every file write (a PostToolUse hook on Write and Edit) and requires an agent-based review before it can finish its session (a Stop hook). The validators are domain-specific: a financial analysis agent might validate metric ranges, a CSV analyst might check data structure integrity, and a report generator might verify formatting standards.
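As a concrete sketch of how a command hook like this operates: Claude Code pipes a JSON payload describing the tool call to the hook’s stdin, and a blocking failure is signaled through the exit code (exit 2 blocks the action and feeds stderr back to the agent). The specific checks below are hypothetical placeholders, not the contents of the real crdt_consistency_check.py:

```python
#!/usr/bin/env python3
# Sketch of a PostToolUse command hook. The checks are hypothetical
# stand-ins; a real CRDT consistency validator would be far more involved.
import json
import sys


def violations(source: str) -> list[str]:
    """Illustrative red flags for a sync-engine file (hypothetical rules)."""
    problems = []
    if "Math.random()" in source:
        problems.append("CRDT operations must be deterministic; avoid Math.random()")
    if "TODO" in source:
        problems.append("unresolved TODO left in sync code")
    return problems


def main() -> int:
    payload = json.load(sys.stdin)  # tool-call metadata piped in by Claude Code
    path = payload.get("tool_input", {}).get("file_path", "")
    if not path.startswith("src/sync/"):
        return 0  # only police files this agent owns
    with open(path) as fh:
        problems = violations(fh.read())
    if problems:
        print("\n".join(problems), file=sys.stderr)
        return 2  # exit code 2 blocks the write and surfaces stderr to the agent
    return 0

# a real hook script would end with: sys.exit(main())
```

Because the hook runs after every Write or Edit, violations are caught the moment they are introduced rather than at review time.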
The Initializer + Coding Agent Pattern
Anthropic’s research on long-running agents [2] identified the single most important pattern for reliable autonomous operation: separate the initialization phase from the coding phase.
This was the pattern that made the C compiler project possible. Without it, agents spent their first 10 minutes (and thousands of tokens) just figuring out what the project was and where things stood. With it, session startup dropped from a 5-minute analysis to a 5-second orientation.

Figure 6 - The Initializer + Coding Agent Pattern: The single most important pattern for long-running agent reliability. Phase 1 creates 3 artifacts (feature list, progress file, init script) that give every subsequent session immediate orientation. Phase 2 follows a disciplined loop of implement-verify-commit-update. This separation is what enabled the C compiler project to maintain coherence across 60 sessions.
The Initializer Phase
Before any coding begins, a dedicated initialization step creates 3 critical artifacts:
1. The Feature List: a comprehensive, granular breakdown of every feature with verification criteria. The critical design decision: "passes": false starts as the default for everything. The agent’s job is to work through this list, implementing features and flipping them to true only after verification.
```json
[
  {
    "id": 1,
    "category": "core",
    "description": "User can open the app and see an empty document editor",
    "steps": [
      "Navigate to localhost:3000",
      "Verify the editor component renders",
      "Verify the toolbar is visible",
      "Verify the document area accepts text input"
    ],
    "passes": false,
    "priority": "critical",
    "assigned_workstream": "frontend"
  }
]
```

2. The Progress File: a running log that bridges context windows across sessions.
```markdown
## Last Updated: 2025-01-15 14:30 UTC
## Session: 47 of estimated 60

### Completed
- Feature 1-12: Core editor rendering and input handling
- Feature 13-18: Toolbar formatting actions
- Feature 19-22: Document save/load API

### In Progress
- Feature 23: Real-time collaboration via WebSocket
  - Server-side: WebSocket handler implemented, needs CRDT integration
  - Client-side: Connection manager working, sync logic pending

### Blocked
- Feature 30: PDF export (waiting on document model finalization)

### Known Issues
- Cursor position jumps on rapid input (tracked in issue #14)
```

3. The Init Script: a shell script that bootstraps the development environment, runnable as a SessionStart hook.
```bash
#!/bin/bash
# init.sh — Run at the start of every coding session
set -e
npm ci                               # Install dependencies
npm run build                        # Verify build works
npm run test -- --passWithNoTests    # Verify tests pass
npm run dev &                        # Start dev server
sleep 3
curl -f http://localhost:3000 > /dev/null 2>&1 || exit 1
echo "Environment ready"
```
Figure 7 - The Progress File Bridges Context Windows: Each agent session starts fresh, with no memory of previous sessions. The progress file is the mechanism that creates continuity. Without it, every session wastes thousands of tokens rediscovering project state. With it, orientation takes 5 seconds. This is not optional infrastructure. It is the foundation of multi-session reliability.
The progress tracking system is not optional. It is the mechanism that creates continuity across sessions. Each new agent session starts completely fresh. Without a progress file and feature list, the agent has no idea what happened in previous sessions. The initialization investment pays for itself in the first 30 seconds of every subsequent session.
The Coding Phase
With initialization complete, coding agents follow a disciplined loop:
- Read `claude-progress.txt` and the feature list
- Pick the highest-priority incomplete feature
- Implement the feature
- Run tests and verify the feature works
- Update `feature_list.json` (set `passes: true`)
- Commit changes with a descriptive message
- Update `claude-progress.txt`
- Repeat
This loop is simple but remarkably effective. The progress file ensures continuity across sessions. The feature list ensures completeness. The commit-after-each-feature approach ensures that progress is never lost even if a session crashes or runs out of context.
Designing Agent Roles
Effective agent design starts with clear role definition. Each agent needs a focused responsibility, a bounded scope of files it can affect, and explicit success criteria. Four patterns consistently work well in production.
The Specialist Pattern
Each agent owns a specific domain of the codebase and has deep expertise in that domain. This is the most common pattern for implementation teams.
- frontend-dev: Owns UI components, styling, client-side state
- backend-dev: Owns API routes, business logic, database queries
- data-engineer: Owns database schema, migrations, data pipelines
- devops-agent: Owns CI/CD, Docker configs, deployment scripts

Specialists benefit from focused skills (knowledge packages that provide domain-specific guidance) and targeted hooks (validators that check domain-specific quality criteria).
The Reviewer Pattern
A reviewer agent has read-only access to the entire codebase but cannot modify any files. Its job is to analyze, critique, and report, never to fix. This creates a clean separation between identification and resolution.
```markdown
---
name: code-reviewer
description: Reviews code for quality, security, and style compliance.
tools: Read, Bash, Glob, Grep
model: opus
---

# Code Reviewer

You review code written by other agents. You CANNOT modify files.

## Review Checklist

1. Type safety: Are types properly defined? Any use of 'any'?
2. Error handling: Are errors caught and handled appropriately?
3. Security: Any SQL injection, XSS, or auth bypass risks?
4. Testing: Is the code covered by tests?
5. Style: Does it follow the conventions in CLAUDE.md?
```

The Orchestrator Pattern
For complex multi-phase projects, an orchestrator agent manages the pipeline (sequencing phases, coordinating handoffs between specialists, and making architectural decisions) without implementing any features itself. The orchestrator’s “implementation” is coordination: reading status, making decisions, sending messages, and updating task lists.
This aligns with Anthropic’s recommendation that the best agent systems use simple, composable patterns rather than complex frameworks [1].
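The coordination logic itself can stay correspondingly simple. A minimal sketch, assuming a hypothetical shared task list where each task carries an id, a status, and the ids of tasks it depends on:

```python
# Sketch of orchestrator-style scheduling over a shared task list
# (hypothetical schema: id, status in {"todo", "done"}, deps).
def ready_tasks(tasks: list[dict]) -> list[dict]:
    """Open tasks whose dependencies are all complete."""
    done = {t["id"] for t in tasks if t["status"] == "done"}
    return [
        t for t in tasks
        if t["status"] == "todo" and set(t.get("deps", [])) <= done
    ]
```

The orchestrator reads status, computes what is unblocked, and hands those tasks to specialists, never touching implementation files itself.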
The Adversarial Evaluator Pattern
One of the most powerful patterns from Anthropic’s “Building Effective Agents” guide is the evaluator-optimizer loop [1]. An evaluator agent is deliberately prompted to find weaknesses in another agent’s output. The producer agent then revises based on the critique. The loop continues until the evaluator approves.

Figure 8 - Four Agent Role Patterns: Each pattern serves a distinct purpose. Specialists own bounded file domains and carry domain-specific validation. Reviewers analyze without modifying. Orchestrators coordinate without implementing. Evaluator-optimizer loops iterate toward quality through adversarial critique. Choose the pattern that matches the task.
This works particularly well for research writing, security auditing, and any task where quality improves through iterative criticism.
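The control flow of the loop is worth seeing in miniature. In this sketch the producer, evaluator, and reviser are plain functions standing in for what would be separate agent sessions:

```python
# Sketch of an evaluator-optimizer loop with stubbed agents. In a real
# system each callable would be a separate Claude Code session.
def evaluator_optimizer(produce, critique, revise, max_rounds=5):
    """Iterate until the evaluator returns no issues, or give up."""
    draft = produce()
    for _ in range(max_rounds):
        issues = critique(draft)
        if not issues:
            return draft, True   # evaluator approved the draft
        draft = revise(draft, issues)
    return draft, False          # quality bar not met within budget
```

The max_rounds cap matters: without it, an evaluator that never approves would burn tokens indefinitely.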
CLAUDE.md
The CLAUDE.md file is the single most important configuration artifact for agent effectiveness. Every agent reads it at session start, making it the shared source of truth for project conventions, architecture decisions, coding standards, and operational rules.
An effective CLAUDE.md for agent-driven projects includes:
```markdown
# Project Name — Agent Instructions

## Architecture
- Frontend: React 18 + TypeScript, Vite build
- Backend: Express.js + TypeScript
- Database: PostgreSQL via Prisma ORM
- Real-time: WebSocket with Yjs CRDT

## Coding Standards
- TypeScript strict mode, no `any` types
- All functions must have JSDoc comments
- Named exports only (no default exports)
- Error handling: use typed error classes from src/lib/errors.ts

## Git Conventions
- Commit after each completed feature
- Format: "feat(scope): description" or "fix(scope): description"
- Never commit with failing tests

## Agent Team Structure
See .claude/agents/team/ for definitions. Ownership boundaries:
- frontend-dev: src/components/, src/pages/, src/styles/
- backend-dev: src/server/, src/api/, src/database/
- sync-engine: src/sync/, src/crdt/, src/websocket/
- test-engineer: READ-ONLY reviewer

DO NOT edit files outside your ownership area.
```
Figure 9 - CLAUDE.md as the Project Constitution: Every agent reads CLAUDE.md at session start, making it the single source of truth for conventions, standards, and rules. The key principle is specificity: “TypeScript strict mode, no any types, named exports only” produces consistent results. “Write clean code” does not.
The key principle is specificity. Vague instructions (“write clean code”) produce vague results. Specific instructions (“TypeScript strict mode, no any types, named exports only”) produce consistent, predictable output across all agents and sessions.
What Goes Wrong: Common Pitfalls
Every team building with agents hits the same failure modes. Recognizing them early saves significant time and token costs.
Over-scoping agent tasks. An agent assigned “build the entire authentication system” will struggle. An agent assigned “implement the /api/auth/login endpoint with JWT token generation” will succeed. Break tasks down to the level where each one is achievable in a single focused session.
Skipping the initializer phase. Jumping straight into coding without creating a feature list, progress file, and init script leads to agents that spend their first 10 minutes, and thousands of tokens, just figuring out what the project is. The initialization investment pays for itself immediately [2].
Ignoring context pollution. Long sessions accumulate irrelevant context: error messages from fixed bugs, exploration of dead-end approaches, verbose build output. This pollutes the agent’s attention and degrades quality. Use PreCompact hooks to monitor what is being lost, and structure work so agents commit and restart rather than running indefinitely.
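A sketch of how such monitoring might be registered in .claude/settings.json, assuming a hypothetical log_compaction.py script that records what a compaction is about to discard:

```json
{
  "hooks": {
    "PreCompact": [
      {
        "matcher": "auto",
        "hooks": [
          {
            "type": "command",
            "command": "$CLAUDE_PROJECT_DIR/.claude/hooks/log_compaction.py"
          }
        ]
      }
    ]
  }
}
```

Reviewing these logs shows whether compaction is routinely discarding state the agent later needed, which is a strong signal to restart sessions sooner.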
Assuming agents remember across sessions. Each new session starts fresh. Without a progress file and feature list, the agent has no idea what happened in previous sessions. The progress tracking system is not optional. It is the mechanism that creates continuity [2].

Figure 10 - The Four Pitfalls Every Agent Team Hits: Over-scoping tasks, skipping initialization, accumulating context pollution, and assuming cross-session memory. Each pitfall has a concrete fix: scope to single sessions, always initialize first, commit and restart rather than run indefinitely, and treat the progress file as mandatory infrastructure.
Focused context beats large context. Explicit task structures beat open-ended prompts. Deterministic validation beats probabilistic compliance. Incremental progress tracking beats marathon sessions. Every principle points the same direction: constrain the environment to amplify the agent.
Cost Optimization
Agent teams can be expensive. A 6-agent team running for 2 hours can consume significant API credits. Four strategies help control costs without sacrificing quality.
Right-size your models. Not every agent needs Opus. Reserve it for tasks that genuinely require complex reasoning: team leads, architecture decisions, multi-step debugging. Standard implementation work runs well on Sonnet. Reviews and formatting run on Haiku at a fraction of the cost.
Kill idle agents. An agent waiting for a dependency to resolve is consuming tokens on polling. Use observability hooks to detect idle agents and terminate them, respawning when their dependencies are met.
Optimize context. Skills with progressive disclosure load documentation only when needed, avoiding the upfront token cost of loading everything into context. Keep CLAUDE.md focused on essentials rather than exhaustive documentation.
Batch your work. Instead of running agents continuously, structure work into focused sprints: initialize, execute a batch of tasks, commit, and shut down. This avoids the context degradation that happens in extremely long sessions.

Figure 11 - Four Cost Optimization Strategies: Right-sizing models across the team, terminating idle agents, using progressive disclosure for context management, and batching work into focused sprints. Together, these strategies can reduce agent team costs significantly while maintaining, or even improving, output quality.
Conclusion
Building effective Claude Code agents is fundamentally about environment design rather than prompt engineering. The model’s capabilities are fixed. What you can control is the environment it operates in: the clarity of task definitions, the focus of its context window, the robustness of its validation infrastructure, and the quality of its progress tracking.
The evidence from Anthropic’s research and large-scale projects like the C compiler build consistently points to the same principles:
- Always use the initializer + coding agent pattern: create a feature list, progress file, and init script before any coding begins.
- Define agents with specific roles: bounded file ownership, appropriate tool restrictions, and embedded hooks for domain-specific validation.
- Assign models strategically: Opus for leadership and complex reasoning, Sonnet for implementation, Haiku for reviews and routine tasks.
- Write a detailed, specific CLAUDE.md: it is the project constitution every agent follows. Specificity beats vagueness every time.
- Structure work for continuity: commit after each feature, update progress files, and restart sessions rather than running indefinitely.

Figure 12 - The Complete Agent Team Architecture: From the CLAUDE.md project constitution at the top, through specialized agents with appropriate models and tool restrictions, to the shared progress infrastructure that maintains coherence across sessions. This is the architecture that makes it possible for a team of AI agents to reliably ship production software.
The agents that actually work in production are not the ones with the most sophisticated prompting. They are the ones operating in well-designed environments, with clear roles, bounded scope, shared conventions, and persistent progress tracking. Design the environment right, and the agent performs.
The Series
This is Part 2 of a 6-part series on Claude Code:
- Orchestrating AI Agent Teams — The control layer architecture that makes autonomous coding reliable
- Building Effective Claude Code Agents (this article) — Agent definitions, tool restrictions, and least privilege
- Claude Code Skills — Progressive disclosure and reusable knowledge packages
- Claude Code Hooks — PreToolUse, PostToolUse, and deterministic enforcement
- Claude Code Agent Teams — Multi-agent coordination and file ownership
- Claude Code Security — Defense-in-depth with agents, skills, hooks, commands, and teams
References
[1] E. Schluntz and B. Zhang, “Building effective agents,” Anthropic Engineering Blog, Dec 2024. https://www.anthropic.com/engineering/building-effective-agents
[2] J. Young et al., “Effective harnesses for long-running agents,” Anthropic Engineering Blog, Nov 2025. https://www.anthropic.com/engineering/effective-harnesses-for-long-running-agents
[3] N. Carlini, “Building a C compiler with a team of parallel Claudes,” Anthropic Engineering Blog, Feb 2025. https://www.anthropic.com/engineering/building-c-compiler
[4] Anthropic, “Extend Claude Code,” Claude Code Documentation, 2025. https://code.claude.com/docs/en/features-overview
[5] Anthropic, “Orchestrate teams of Claude Code sessions,” Claude Code Documentation, 2025. https://code.claude.com/docs/en/agent-teams
[6] Anthropic, “Automate workflows with hooks,” Claude Code Documentation, 2025. https://code.claude.com/docs/en/hooks-guide
[7] Anthropic, “Skill authoring best practices,” Claude Platform Documentation, 2025. https://platform.claude.com/docs/en/agents-and-tools/agent-skills/best-practices
[8] A. Osmani, “Claude Code Swarms,” AddyOsmani.com, Feb 2026. https://addyosmani.com/blog/claude-code-agent-teams/
[9] Anthropic, “Create plugins,” Claude Code Documentation, 2025. https://code.claude.com/docs/en/plugins