Building Effective Claude Code Agents: From Definition to Production
The team behind Claude Code’s C compiler project needed 60 sessions across parallel agents to build a working C compiler. The biggest challenge wasn’t the prompts. It was designing the environment around the agents.

Figure 1 - From Chat Sessions to Agent Teams: The fundamental shift from interactive chat (constant human supervision, one task at a time) to autonomous agents (parallel specialists, each with bounded scope and shared progress tracking). This architectural change is what makes it possible to tackle projects that span 60+ sessions and thousands of features.
You open Claude Code Monday morning. You have a collaborative document editor to build: real-time sync, CRDT conflict resolution, a React frontend, a WebSocket backend, and 47 features to implement. You could type each instruction one at a time, supervise every decision, and copy-paste your way through a week of work. Or you could define 4 specialist agents, hand them a feature list, and check in at lunch.
The second approach is not science fiction. It is what Anthropic’s research on effective agents [1] and their engineering work on long-running agent harnesses [2] have made practical. The insights are not what most people expect. The most effective agents are not the ones with the cleverest prompts or the most sophisticated reasoning chains. They are the ones with the best-designed environments: clear task structures, focused context, robust validation, and explicit progress tracking.
As the team behind Claude Code’s C compiler project discovered: “the biggest challenge in autonomous agents is designing the environment around the agent” [3].
This article is a practical guide to designing, configuring, and operating Claude Code agents that reliably ship production software, whether you are running a single agent on a focused task or orchestrating a coordinated team of specialists.
What Is a Claude Code Agent?
A Claude Code agent is a Claude Code session configured with a specific role, toolset, and behavioral constraints to operate as an autonomous specialist [4]. Unlike a general-purpose chat session where you interactively guide the model through tasks, an agent receives a structured assignment and executes it independently: reading files, writing code, running commands, and making decisions within its defined scope.
Think of it as the difference between a contractor you supervise minute-by-minute and a team member you assign a task and check in with later. The agent model requires more upfront investment in defining the role and environment, but it scales dramatically better because the agent operates without continuous human input.
The agent’s behavior is shaped by 4 layers of configuration, each serving a distinct purpose.

Figure 2 - The Four Layers of Agent Configuration: An agent’s behavior emerges from the interaction of 4 configuration layers. CLAUDE.md provides project-wide conventions every agent shares. The agent definition file narrows scope to a specific role. System prompt context injects domain knowledge on demand. The environment provides the actual tools and file system the agent operates in.
The CLAUDE.md file acts as the project constitution, providing project-wide conventions and instructions that every agent reads at session start. The agent definition file specifies the role: instructions, tool restrictions, embedded hooks, and model selection. The system prompt context injects skills, task lists, and dynamic state. And the environment provides the available tools, file system, and installed dependencies.
The most effective agents are not the ones with the cleverest prompts. They are the ones with the best-designed environments. You cannot control the model’s capabilities, but you can control the clarity of task definitions, the focus of context, the robustness of validation, and the quality of progress tracking.
Anatomy of an Agent Definition File
Every agent is defined in a markdown file with YAML frontmatter. These files live in .claude/agents/team/ for team agents, or can be referenced directly for standalone use. The frontmatter specifies configuration; the markdown body provides instructions.
Here is a real-world example: a sync engine specialist for a collaborative editor.
```markdown
---
name: sync-engine
description: >
  CRDT and real-time synchronization specialist. Implements conflict-free
  document merging, WebSocket connection management, and operational
  transform logic for the collaborative editor.
tools: Read, Write, Edit, Bash, Glob, Grep
model: opus
hooks:
  PostToolUse:
    - matcher: "Write|Edit"
      hooks:
        - type: command
          command: "$CLAUDE_PROJECT_DIR/.claude/hooks/validators/crdt_consistency_check.py"
  Stop:
    - matcher: "*"
      hooks:
        - type: agent
          prompt: |
            Review the sync engine implementation. Verify:
            1. All CRDT operations are commutative and idempotent
            2. Conflict resolution handles concurrent edits correctly
            3. WebSocket reconnection logic includes exponential backoff
            Block completion if any verification fails.
          tools: Read, Bash
color: purple
---

# Sync Engine Specialist

You are responsible for the real-time collaboration infrastructure.

## Your Ownership

Files you own and can modify: src/sync/, src/crdt/, src/websocket/
Files you can READ but not modify: all other directories

## Workflow

1. Read claude-progress.txt for current project state
2. Check the shared task list for your next assignment
3. Implement one feature at a time
4. Run consistency tests: npm run test:sync
5. Commit with format: "feat(sync): description"
6. Update the task list and progress file
```
Figure 3 - Anatomy of an Agent Definition File: Every component serves a purpose. The YAML frontmatter (identity, tools, model, hooks) defines what the agent can do. The markdown body (ownership, workflow) defines what the agent should do. Together, they create a bounded, self-validating autonomous specialist.
Let’s break down the 3 most important configuration levers.
Tool Restrictions
The tools field controls which Claude Code tools the agent can access. This is a surprisingly powerful design lever. By restricting tools, you change the agent’s entire role, from implementer to reviewer, from writer to analyst, from builder to validator.

Figure 4 - Tool Restrictions Define Agent Roles: The same Claude model becomes 4 fundamentally different agents depending on which tools it can access. A read-only reviewer cannot accidentally introduce bugs because it physically cannot write files. Tool restrictions create architectural boundaries that complement file ownership.
| Tool Set | Role Pattern | Example |
|---|---|---|
| Read, Write, Edit, Bash, Glob, Grep | Full implementer | Frontend dev, backend dev |
| Read, Bash, Glob, Grep | Read-only reviewer | Code reviewer, test engineer |
| Read, Write, Bash | Limited implementer | Config writer, docs author |
| Read, Glob, Grep | Pure analyst | Architecture reviewer, security auditor |
A documentation agent that only has Write access to docs/ cannot modify source code. A security auditor with only Read and Grep cannot accidentally fix the vulnerabilities it finds; it can only report them. These constraints are not limitations; they are architectural decisions that make the system safer and more predictable.
Tool restrictions create architectural boundaries that are more reliable than behavioral instructions. Telling an agent “do not modify files outside your scope” is a suggestion. Removing the Write tool is a guarantee. Design your agent’s capabilities through tool access, not through prose instructions alone.
Model Selection
The model field determines which Claude model the agent uses, and it is both a quality lever and a cost lever.
Opus provides the strongest reasoning capabilities. It is critical for complex algorithmic work, architectural decisions, and multi-step debugging. Use it for team leads and specialists handling intricate logic.
Sonnet provides a strong balance of capability and speed for standard implementation tasks. Most implementation agents run on Sonnet.
Haiku provides fast, cost-effective operation for routine tasks: formatting, simple testing, boilerplate generation, and quick review iterations.

Figure 5 - Strategic Model Selection: Not every agent needs the most powerful model. A balanced team uses Opus for leadership and complex reasoning, Sonnet for standard implementation, and Haiku for fast iteration on reviews and routine tasks. This composition optimizes both quality and cost.
Embedded Hooks
The hooks section in the frontmatter embeds validation logic directly in the agent definition. This is per-agent quality assurance: each agent carries its own validators.
The sync engine agent above runs a CRDT consistency checker after every file write (a PostToolUse hook on Write and Edit) and requires an agent-based review before it can finish its session (a Stop hook). The validators are domain-specific: a financial analysis agent might validate metric ranges, a CSV analyst might check data structure integrity, and a report generator might verify formatting standards.
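As a concrete sketch of how a command hook like this operates: Claude Code pipes a JSON payload describing the tool call to the hook’s stdin, and a blocking failure is signaled through the exit code (exit 2 blocks the action and feeds stderr back to the agent). The specific checks below are hypothetical placeholders, not the contents of the real crdt_consistency_check.py:

```python
#!/usr/bin/env python3
# Sketch of a PostToolUse command hook. The checks are hypothetical
# stand-ins; a real CRDT consistency validator would be far more involved.
import json
import sys


def violations(source: str) -> list[str]:
    """Illustrative red flags for a sync-engine file (hypothetical rules)."""
    problems = []
    if "Math.random()" in source:
        problems.append("CRDT operations must be deterministic; avoid Math.random()")
    if "TODO" in source:
        problems.append("unresolved TODO left in sync code")
    return problems


def main() -> int:
    payload = json.load(sys.stdin)  # tool-call metadata piped in by Claude Code
    path = payload.get("tool_input", {}).get("file_path", "")
    if not path.startswith("src/sync/"):
        return 0  # only police files this agent owns
    with open(path) as fh:
        problems = violations(fh.read())
    if problems:
        print("\n".join(problems), file=sys.stderr)
        return 2  # exit code 2 blocks the write and surfaces stderr to the agent
    return 0

# a real hook script would end with: sys.exit(main())
```

Because the hook runs after every Write or Edit, violations are caught the moment they are introduced rather than at review time.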
The Initializer + Coding Agent Pattern
Anthropic’s research on long-running agents [2] identified the single most important pattern for reliable autonomous operation: separate the initialization phase from the coding phase.
This was the pattern that made the C compiler project possible. Without it, agents spent their first 10 minutes (and thousands of tokens) just figuring out what the project was and where things stood. With it, session startup dropped from a 5-minute analysis to a 5-second orientation.

Figure 6 - The Initializer + Coding Agent Pattern: The single most important pattern for long-running agent reliability. Phase 1 creates 3 artifacts (feature list, progress file, init script) that give every subsequent session immediate orientation. Phase 2 follows a disciplined loop of implement-verify-commit-update. This separation is what enabled the C compiler project to maintain coherence across 60 sessions.
The Initializer Phase
Before any coding begins, a dedicated initialization step creates 3 critical artifacts:
1. The Feature List: a comprehensive, granular breakdown of every feature with verification criteria. The critical design decision: "passes": false starts as the default for everything. The agent’s job is to work through this list, implementing features and flipping them to true only after verification.
```json
[
  {
    "id": 1,
    "category": "core",
    "description": "User can open the app and see an empty document editor",
    "steps": [
      "Navigate to localhost:3000",
      "Verify the editor component renders",
      "Verify the toolbar is visible",
      "Verify the document area accepts text input"
    ],
    "passes": false,
    "priority": "critical",
    "assigned_workstream": "frontend"
  }
]
```

2. The Progress File: a running log that bridges context windows across sessions.
```markdown
## Last Updated: 2025-01-15 14:30 UTC
## Session: 47 of estimated 60

### Completed
- Feature 1-12: Core editor rendering and input handling
- Feature 13-18: Toolbar formatting actions
- Feature 19-22: Document save/load API

### In Progress
- Feature 23: Real-time collaboration via WebSocket
  - Server-side: WebSocket handler implemented, needs CRDT integration
  - Client-side: Connection manager working, sync logic pending

### Blocked
- Feature 30: PDF export (waiting on document model finalization)

### Known Issues
- Cursor position jumps on rapid input (tracked in issue #14)
```

3. The Init Script: a shell script that bootstraps the development environment, runnable as a SessionStart hook.
```bash
#!/bin/bash
# init.sh — Run at the start of every coding session
set -e
npm ci                               # Install dependencies
npm run build                        # Verify build works
npm run test -- --passWithNoTests    # Verify tests pass
npm run dev &                        # Start dev server
sleep 3
curl -f http://localhost:3000 > /dev/null 2>&1 || exit 1
echo "Environment ready"
```
Figure 7 - The Progress File Bridges Context Windows: Each agent session starts fresh, with no memory of previous sessions. The progress file is the mechanism that creates continuity. Without it, every session wastes thousands of tokens rediscovering project state. With it, orientation takes 5 seconds. This is not optional infrastructure. It is the foundation of multi-session reliability.
The progress tracking system is not optional. It is the mechanism that creates continuity across sessions. Each new agent session starts completely fresh. Without a progress file and feature list, the agent has no idea what happened in previous sessions. The initialization investment pays for itself in the first 30 seconds of every subsequent session.
The Coding Phase
With initialization complete, coding agents follow a disciplined loop:
- Read `claude-progress.txt` and the feature list
- Pick the highest-priority incomplete feature
- Implement the feature
- Run tests and verify the feature works
- Update `feature_list.json` (set `passes: true`)
- Commit changes with a descriptive message
- Update `claude-progress.txt`
- Repeat
This loop is simple but remarkably effective. The progress file ensures continuity across sessions. The feature list ensures completeness. The commit-after-each-feature approach ensures that progress is never lost even if a session crashes or runs out of context.
Designing Agent Roles
Effective agent design starts with clear role definition. Each agent needs a focused responsibility, a bounded scope of files it can affect, and explicit success criteria. Four patterns consistently work well in production.
The Specialist Pattern
Each agent owns a specific domain of the codebase and has deep expertise in that domain. This is the most common pattern for implementation teams.
- frontend-dev: Owns UI components, styling, client-side state
- backend-dev: Owns API routes, business logic, database queries
- data-engineer: Owns database schema, migrations, data pipelines
- devops-agent: Owns CI/CD, Docker configs, deployment scripts

Specialists benefit from focused skills (knowledge packages that provide domain-specific guidance) and targeted hooks (validators that check domain-specific quality criteria).
The Reviewer Pattern
A reviewer agent has read-only access to the entire codebase but cannot modify any files. Its job is to analyze, critique, and report, never to fix. This creates a clean separation between identification and resolution.
```markdown
---
name: code-reviewer
description: Reviews code for quality, security, and style compliance.
tools: Read, Bash, Glob, Grep
model: opus
---

# Code Reviewer

You review code written by other agents. You CANNOT modify files.

## Review Checklist

1. Type safety: Are types properly defined? Any use of 'any'?
2. Error handling: Are errors caught and handled appropriately?
3. Security: Any SQL injection, XSS, or auth bypass risks?
4. Testing: Is the code covered by tests?
5. Style: Does it follow the conventions in CLAUDE.md?
```

The Orchestrator Pattern
For complex multi-phase projects, an orchestrator agent manages the pipeline (sequencing phases, coordinating handoffs between specialists, and making architectural decisions) without implementing any features itself. The orchestrator’s “implementation” is coordination: reading status, making decisions, sending messages, and updating task lists.
This aligns with Anthropic’s recommendation that the best agent systems use simple, composable patterns rather than complex frameworks [1].
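The coordination logic itself can stay correspondingly simple. A minimal sketch, assuming a hypothetical shared task list where each task carries an id, a status, and the ids of tasks it depends on:

```python
# Sketch of orchestrator-style scheduling over a shared task list
# (hypothetical schema: id, status in {"todo", "done"}, deps).
def ready_tasks(tasks: list[dict]) -> list[dict]:
    """Open tasks whose dependencies are all complete."""
    done = {t["id"] for t in tasks if t["status"] == "done"}
    return [
        t for t in tasks
        if t["status"] == "todo" and set(t.get("deps", [])) <= done
    ]
```

The orchestrator reads status, computes what is unblocked, and hands those tasks to specialists, never touching implementation files itself.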
The Adversarial Evaluator Pattern
One of the most powerful patterns from Anthropic’s “Building Effective Agents” guide is the evaluator-optimizer loop [1]. An evaluator agent is deliberately prompted to find weaknesses in another agent’s output. The producer agent then revises based on the critique. The loop continues until the evaluator approves.

Figure 8 - Four Agent Role Patterns: Each pattern serves a distinct purpose. Specialists own bounded file domains and carry domain-specific validation. Reviewers analyze without modifying. Orchestrators coordinate without implementing. Evaluator-optimizer loops iterate toward quality through adversarial critique. Choose the pattern that matches the task.
This works particularly well for research writing, security auditing, and any task where quality improves through iterative criticism.
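The control flow of the loop is worth seeing in miniature. In this sketch the producer, evaluator, and reviser are plain functions standing in for what would be separate agent sessions:

```python
# Sketch of an evaluator-optimizer loop with stubbed agents. In a real
# system each callable would be a separate Claude Code session.
def evaluator_optimizer(produce, critique, revise, max_rounds=5):
    """Iterate until the evaluator returns no issues, or give up."""
    draft = produce()
    for _ in range(max_rounds):
        issues = critique(draft)
        if not issues:
            return draft, True   # evaluator approved the draft
        draft = revise(draft, issues)
    return draft, False          # quality bar not met within budget
```

The max_rounds cap matters: without it, an evaluator that never approves would burn tokens indefinitely.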
CLAUDE.md
The CLAUDE.md file is the single most important configuration artifact for agent effectiveness. Every agent reads it at session start, making it the shared source of truth for project conventions, architecture decisions, coding standards, and operational rules.
An effective CLAUDE.md for agent-driven projects includes:
```markdown
# Project Name — Agent Instructions

## Architecture
- Frontend: React 18 + TypeScript, Vite build
- Backend: Express.js + TypeScript
- Database: PostgreSQL via Prisma ORM
- Real-time: WebSocket with Yjs CRDT

## Coding Standards
- TypeScript strict mode, no `any` types
- All functions must have JSDoc comments
- Named exports only (no default exports)
- Error handling: use typed error classes from src/lib/errors.ts

## Git Conventions
- Commit after each completed feature
- Format: "feat(scope): description" or "fix(scope): description"
- Never commit with failing tests

## Agent Team Structure
See .claude/agents/team/ for definitions. Ownership boundaries:
- frontend-dev: src/components/, src/pages/, src/styles/
- backend-dev: src/server/, src/api/, src/database/
- sync-engine: src/sync/, src/crdt/, src/websocket/
- test-engineer: READ-ONLY reviewer

DO NOT edit files outside your ownership area.
```
Figure 9 - CLAUDE.md as the Project Constitution: Every agent reads CLAUDE.md at session start, making it the single source of truth for conventions, standards, and rules. The key principle is specificity: “TypeScript strict mode, no any types, named exports only” produces consistent results. “Write clean code” does not.
The key principle is specificity. Vague instructions (“write clean code”) produce vague results. Specific instructions (“TypeScript strict mode, no any types, named exports only”) produce consistent, predictable output across all agents and sessions.
What Goes Wrong: Common Pitfalls
Every team building with agents hits the same failure modes. Recognizing them early saves significant time and token costs.
Over-scoping agent tasks. An agent assigned “build the entire authentication system” will struggle. An agent assigned “implement the /api/auth/login endpoint with JWT token generation” will succeed. Break tasks down to the level where each one is achievable in a single focused session.
Skipping the initializer phase. Jumping straight into coding without creating a feature list, progress file, and init script leads to agents that spend their first 10 minutes, and thousands of tokens, just figuring out what the project is. The initialization investment pays for itself immediately [2].
Ignoring context pollution. Long sessions accumulate irrelevant context: error messages from fixed bugs, exploration of dead-end approaches, verbose build output. This pollutes the agent’s attention and degrades quality. Use PreCompact hooks to monitor what is being lost, and structure work so agents commit and restart rather than running indefinitely.
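A sketch of how such monitoring might be registered in .claude/settings.json, assuming a hypothetical log_compaction.py script that records what a compaction is about to discard:

```json
{
  "hooks": {
    "PreCompact": [
      {
        "matcher": "auto",
        "hooks": [
          {
            "type": "command",
            "command": "$CLAUDE_PROJECT_DIR/.claude/hooks/log_compaction.py"
          }
        ]
      }
    ]
  }
}
```

Reviewing these logs shows whether compaction is routinely discarding state the agent later needed, which is a strong signal to restart sessions sooner.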
Assuming agents remember across sessions. Each new session starts fresh. Without a progress file and feature list, the agent has no idea what happened in previous sessions. The progress tracking system is not optional. It is the mechanism that creates continuity [2].

Figure 10 - The Four Pitfalls Every Agent Team Hits: Over-scoping tasks, skipping initialization, accumulating context pollution, and assuming cross-session memory. Each pitfall has a concrete fix: scope to single sessions, always initialize first, commit and restart rather than run indefinitely, and treat the progress file as mandatory infrastructure.
Focused context beats large context. Explicit task structures beat open-ended prompts. Deterministic validation beats probabilistic compliance. Incremental progress tracking beats marathon sessions. Every principle points the same direction: constrain the environment to amplify the agent.
Cost Optimization
Agent teams can be expensive. A 6-agent team running for 2 hours can consume significant API credits. Four strategies help control costs without sacrificing quality.
Right-size your models. Not every agent needs Opus. Reserve it for tasks that genuinely require complex reasoning: team leads, architecture decisions, multi-step debugging. Standard implementation work runs well on Sonnet. Reviews and formatting run on Haiku at a fraction of the cost.
Kill idle agents. An agent waiting for a dependency to resolve is consuming tokens on polling. Use observability hooks to detect idle agents and terminate them, respawning when their dependencies are met.
Optimize context. Skills with progressive disclosure load documentation only when needed, avoiding the upfront token cost of loading everything into context. Keep CLAUDE.md focused on essentials rather than exhaustive documentation.
Batch your work. Instead of running agents continuously, structure work into focused sprints: initialize, execute a batch of tasks, commit, and shut down. This avoids the context degradation that happens in extremely long sessions.

Figure 11 - Four Cost Optimization Strategies: Right-sizing models across the team, terminating idle agents, using progressive disclosure for context management, and batching work into focused sprints. Together, these strategies can reduce agent team costs significantly while maintaining, or even improving, output quality.
Conclusion
Building effective Claude Code agents is fundamentally about environment design rather than prompt engineering. The model’s capabilities are fixed. What you can control is the environment it operates in: the clarity of task definitions, the focus of its context window, the robustness of its validation infrastructure, and the quality of its progress tracking.
The evidence from Anthropic’s research and large-scale projects like the C compiler build consistently points to the same principles:
- Always use the initializer + coding agent pattern: create a feature list, progress file, and init script before any coding begins.
- Define agents with specific roles: bounded file ownership, appropriate tool restrictions, and embedded hooks for domain-specific validation.
- Assign models strategically: Opus for leadership and complex reasoning, Sonnet for implementation, Haiku for reviews and routine tasks.
- Write a detailed, specific CLAUDE.md: it is the project constitution every agent follows. Specificity beats vagueness every time.
- Structure work for continuity: commit after each feature, update progress files, and restart sessions rather than running indefinitely.

Figure 12 - The Complete Agent Team Architecture: From the CLAUDE.md project constitution at the top, through specialized agents with appropriate models and tool restrictions, to the shared progress infrastructure that maintains coherence across sessions. This is the architecture that makes it possible for a team of AI agents to reliably ship production software.
The agents that actually work in production are not the ones with the most sophisticated prompting. They are the ones operating in well-designed environments, with clear roles, bounded scope, shared conventions, and persistent progress tracking. Design the environment right, and the agent performs.
The Series
This is Part 2 of a 6-part series on Claude Code:
- Orchestrating AI Agent Teams — The control layer architecture that makes autonomous coding reliable
- Building Effective Claude Code Agents (this article) — Agent definitions, tool restrictions, and least privilege
- Claude Code Skills — Progressive disclosure and reusable knowledge packages
- Claude Code Hooks — PreToolUse, PostToolUse, and deterministic enforcement
- Claude Code Agent Teams — Multi-agent coordination and file ownership
- Claude Code Security — Defense-in-depth with agents, skills, hooks, commands, and teams
References
[1] E. Schluntz and B. Zhang, “Building effective agents,” Anthropic Engineering Blog, Dec 2024. https://www.anthropic.com/engineering/building-effective-agents
[2] J. Young et al., “Effective harnesses for long-running agents,” Anthropic Engineering Blog, Nov 2025. https://www.anthropic.com/engineering/effective-harnesses-for-long-running-agents
[3] N. Carlini, “Building a C compiler with a team of parallel Claudes,” Anthropic Engineering Blog, Feb 2025. https://www.anthropic.com/engineering/building-c-compiler
[4] Anthropic, “Extend Claude Code,” Claude Code Documentation, 2025. https://code.claude.com/docs/en/features-overview
[5] Anthropic, “Orchestrate teams of Claude Code sessions,” Claude Code Documentation, 2025. https://code.claude.com/docs/en/agent-teams
[6] Anthropic, “Automate workflows with hooks,” Claude Code Documentation, 2025. https://code.claude.com/docs/en/hooks-guide
[7] Anthropic, “Skill authoring best practices,” Claude Platform Documentation, 2025. https://platform.claude.com/docs/en/agents-and-tools/agent-skills/best-practices
[8] A. Osmani, “Claude Code Swarms,” AddyOsmani.com, Feb 2026. https://addyosmani.com/blog/claude-code-agent-teams/
[9] Anthropic, “Create plugins,” Claude Code Documentation, 2025. https://code.claude.com/docs/en/plugins