Orchestrating AI Agent Teams: How Skills, Hooks, and Context Flow Make Autonomous Coding Reliable
An orchestrator breaks a task into pieces. Specialized agents pick up work items, each carrying skills that define what they know and hooks that enforce how they behave. Context flows from session start to task completion through a deterministic pipeline. Here is how the pieces fit together.

Cover - The Orchestrator Pattern: A central orchestrator coordinates specialized agents, each equipped with domain-specific skills and hooks. Context flows bidirectionally: the orchestrator assigns tasks with context, agents execute with skill-guided knowledge and hook-enforced quality, and results flow back through the same deterministic pipeline.

Figure 1 - The Five Primitives: Claude Code provides the runtime. Skills inject domain knowledge. Slash commands make skills invocable. Hooks enforce behavior deterministically. The orchestrator coordinates who does what. Each maps to a familiar CS concept: a dev terminal, lazy-loaded docs, CLI binaries, middleware, and a project manager.
Starting From First Principles
Before diving into how agent teams work, you need to understand what the individual building blocks are. If you have worked with web servers, databases, or CI/CD pipelines, you already know the underlying concepts. Agent systems map cleanly onto patterns you have seen before.
What Is Claude Code?
Claude Code is an AI coding agent that runs in your terminal. It reads files, writes code, runs shell commands, and searches codebases. Every action the agent takes is called a tool call: writing a file is a tool call to Write, running a test suite is a tool call to Bash, reading a file is a tool call to Read.
Think of it like a developer who works entirely through a command line. The developer has a set of tools (editor, terminal, file browser) and uses them in sequence to accomplish tasks. Claude Code works the same way but with an LLM making the decisions about which tool to use next.
The critical limitation: the LLM has a context window, a fixed amount of working memory. When the context window fills up, older information gets compressed or lost. This is why purely prompt-based instructions are unreliable for long sessions. The agent might forget instructions it read 30 minutes ago.
What Are Skills?
A skill is a markdown file (SKILL.md) that gives an agent specialized knowledge for a specific type of task. Think of it as a runbook or playbook that gets loaded into the agent’s context when relevant.

Figure 2 - Skills as Agent Knowledge: A skill is a markdown file with a name, description, and instructions. The description loads at session start so the agent knows the skill exists. The full instructions load only when the agent needs them. Supporting files (reference docs, scripts, examples) load on demand, keeping the context window efficient.
A skill has two parts:
- YAML frontmatter: metadata including the skill’s name and description. The description loads at session start so the agent knows the skill exists.
- Markdown body: the actual instructions, patterns, templates, or workflows. This loads only when the skill is invoked.
```markdown
---
name: api-conventions
description: API design patterns for this codebase. Use when writing or reviewing API endpoints.
---

When writing API endpoints:
- Use RESTful naming conventions
- Return consistent error formats with status codes
- Include request validation on all inputs
- Log all requests with correlation IDs
```

The CS analogy: Skills are like library documentation. You do not read the entire docs for every import. You read the relevant section when you need it. Skills work the same way: descriptions are always visible (like a library index), full content loads on demand (like opening a specific page).
Skills can also bundle supporting files (reference docs, scripts, examples) in a directory structure. The main SKILL.md acts as a table of contents, pointing the agent to detailed resources only when needed. This is called progressive disclosure and it keeps the context window efficient.
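To make the two-stage loading concrete, here is a minimal sketch of progressive disclosure: index the frontmatter description eagerly, defer the body until invocation. This is an illustration, not Claude Code's actual loader.

```python
# Sketch: index skill descriptions eagerly, load bodies lazily.
# Illustrative only -- not Claude Code's actual implementation.

def parse_skill(text: str) -> dict:
    """Split a SKILL.md into frontmatter metadata and markdown body."""
    _, frontmatter, body = text.split("---", 2)
    meta = {}
    for line in frontmatter.strip().splitlines():
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return {"meta": meta, "body": body.strip()}

skill_md = """---
name: api-conventions
description: API design patterns for this codebase.
---
When writing API endpoints:
- Use RESTful naming conventions
"""

skill = parse_skill(skill_md)

# At session start, only name + description enter the context window.
index_entry = f"{skill['meta']['name']}: {skill['meta']['description']}"

# The body (and any supporting files) loads only when the skill is invoked.
full_content = skill["body"]
```

The index entry costs a few dozen tokens per skill; the full body costs hundreds or thousands, which is exactly why it stays out of context until needed.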
What Are Slash Commands?
A slash command is how you invoke a skill directly. Every skill with a name automatically becomes a /name command. Type /deploy in the terminal and the deploy skill activates. Type /fix-issue 123 and the fix-issue skill activates with 123 as its argument.

Figure 3 - Slash Commands: Bridging Users, Agents, and Skills: Slash commands turn skills into invocable actions. The user types /fix-issue 123, the skill activates with the argument, and instructions flow into the agent. Skills can run inline (in the main context) or fork into an isolated subagent. Invocation control determines who can trigger a skill: only the user, only the agent, or both.
Slash commands are more than just a shortcut. They are the primary mechanism for creating repeatable, parameterized workflows that both humans and agents can trigger. Here is what makes them powerful:
Arguments with $ARGUMENTS: Skills accept arguments that get substituted into the instructions. The skill body references $ARGUMENTS (all arguments) or $0, $1, $2 for positional access.
```markdown
---
name: fix-issue
description: Fix a GitHub issue by number
disable-model-invocation: true
---

Fix GitHub issue $ARGUMENTS following our coding standards.
1. Read the issue description
2. Implement the fix
3. Write tests
4. Create a commit
```

Running /fix-issue 123 replaces $ARGUMENTS with 123. The agent receives concrete, actionable instructions with the specific issue number baked in.
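The substitution step can be sketched in a few lines. This mirrors the $ARGUMENTS and positional placeholders described above; the real substitution happens inside Claude Code, so treat this as illustrative.

```python
# Sketch of slash-command argument substitution ($ARGUMENTS, $0, $1, ...).
# Illustrative only; Claude Code performs this internally. Following the
# convention described above, $0 is the first argument.

def substitute(body: str, args: list[str]) -> str:
    body = body.replace("$ARGUMENTS", " ".join(args))
    for i, arg in enumerate(args):
        body = body.replace(f"${i}", arg)
    return body

prompt = substitute(
    "Fix GitHub issue $ARGUMENTS following our coding standards.", ["123"]
)
```

The key point: substitution happens before the LLM sees the prompt, so the agent never reasons about placeholders, only about concrete values.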
Dynamic context with !command: Skills can inject live data from shell commands before the agent sees anything. The !`command` syntax runs the command and replaces the placeholder with its output.
```markdown
---
name: pr-summary
description: Summarize changes in a pull request
context: fork
agent: Explore
---

## Pull request context
- PR diff: !`gh pr diff`
- PR comments: !`gh pr view --comments`
- Changed files: !`gh pr diff --name-only`

Summarize this pull request.
```

When /pr-summary runs, the gh commands execute first. The agent receives the actual diff, comments, and file list, not the commands themselves. This is preprocessing, not agent execution.
Forking into subagents with context: fork: A skill can run in the main conversation context (default) or fork into an isolated subagent. Forking is useful for tasks that produce verbose output (like running a full test suite) or tasks that should not pollute the main context window. The agent field specifies which subagent type to use.
Invocation control: Two frontmatter fields control who can trigger a skill:
- disable-model-invocation: true: Only the user can invoke it. Use for workflows with side effects like /deploy or /send-slack-message. You do not want the agent deciding to deploy because the code looks ready.
- user-invocable: false: Only the agent can invoke it. Use for background knowledge that is not actionable as a command.
How the orchestrator uses slash commands: In an agent team, the orchestrator can invoke skills programmatically via the Skill tool. When the orchestrator encounters a task that matches a skill’s description, it triggers the skill to inject that domain knowledge into its workflow. Specialized agents can also have skills preloaded at spawn, making slash commands available from the agent’s first action. This means the entire team can leverage a shared library of repeatable workflows: the orchestrator triggers /review-pr to spawn a review, the builder triggers /run-tests after implementation, and the validator invokes /check-coverage during its inspection.
The CS analogy: Slash commands are like CLI commands in a Unix system. ls, grep, and make are standard tools available to every user. Slash commands are standard workflows available to every agent. Just as you can pipe commands together (git diff | grep TODO), agents can chain slash commands in sequence. And just as Unix permissions control who can run sudo, invocation control determines who can trigger each slash command.
What Are Hooks?
A hook is a shell command that fires automatically at a specific point in the agent’s lifecycle. Hooks are not prompts. They are not suggestions. They are system-level interceptors that execute outside the LLM’s reasoning chain.

Figure 4 - Prompts Suggest, Hooks Guarantee: A prompt in CLAUDE.md achieves roughly 80% compliance. The agent usually follows it but can skip under context pressure. A PostToolUse hook achieves 100% compliance. It runs at the system level, every time, regardless of what the agent is thinking.
There are two critical hook events for understanding context flow:
- PreToolUse: fires before every tool call. Can allow, deny, or escalate the action. This is your safety gate.
- PostToolUse: fires after every tool call. Can inject additionalContext back into the agent. This is your quality gate.
The CS analogy: Hooks are middleware. In an Express.js web server, middleware intercepts HTTP requests before they reach your route handler. Hooks intercept the agent’s tool calls before and after they execute. PreToolUse is like authentication middleware (block unauthorized requests). PostToolUse is like response middleware (transform or validate output).
What Is an Orchestrator?
An orchestrator is a lead agent that coordinates work across a team of specialized agents. It does not write code itself. It breaks tasks into work items, assigns them to specialists, monitors progress, and synthesizes results.

Figure 5 - The Orchestrator Pattern: The orchestrator coordinates without implementing. It maintains a shared task list, assigns work to specialized agents, and synthesizes results. Each agent operates in its own context window with its own tools, skills, and hooks. The task list is the shared coordination mechanism.
The CS analogy: The orchestrator is like a project manager running a sprint. The project manager does not write code. They break stories into tasks, assign them to developers, review progress, and handle blockers. The shared task list is the sprint board. Each developer (agent) picks up tasks and reports back.
How Context Flows Through the System
Now that you know the building blocks, here is how context actually flows through an orchestrated agent team. This is the core mechanism that makes the system work.
Phase 1: Session Start
When a session begins, the SessionStart hook fires. This is your opportunity to inject the project’s current state into every agent’s awareness before any work begins.

Figure 6 - Session Start Context Loading: The SessionStart hook gathers project state (git info, context files, issues, environment) and injects it as additionalContext. The agent starts with full project awareness instead of spending tokens discovering its environment.
A typical session_start.py hook:
- Reads the current git branch and uncommitted change count
- Loads project context files (TODO.md, CONTEXT.md)
- Fetches recent GitHub issues via the gh CLI
- Packages everything into additionalContext and returns it as JSON
```python
# From session_start.py -- context loading
context_parts = []
context_parts.append(f"Git branch: {branch}")
context_parts.append(f"Uncommitted changes: {changes} files")

for file_path in [".claude/CONTEXT.md", "TODO.md"]:
    if Path(file_path).exists():
        content = open(file_path).read().strip()
        context_parts.append(f"--- Content from {file_path} ---")
        context_parts.append(content[:1000])

output = {
    "hookSpecificOutput": {
        "hookEventName": "SessionStart",
        "additionalContext": "\n".join(context_parts)
    }
}
print(json.dumps(output))
```

The Setup hook goes further. It runs when the agent first enters a repository and can persist environment variables across sessions using CLAUDE_ENV_FILE. It distinguishes between init mode (first entry: install dependencies, set up the environment) and maintenance mode (periodic checks: disk usage, git status).
The CS analogy: SessionStart is like the constructor of a class. Before any methods run, the constructor initializes state. The hook initializes the agent’s awareness of the project before any tool calls happen.
Phase 2: Skill Loading
Once the session is running, skills determine what the agent knows. Skills can be loaded two ways:
- On demand: The agent encounters a task that matches a skill’s description and loads it.
- Preloaded: The skill content is injected into the agent’s context at startup (used for subagents).

Figure 7 - Two Skill Loading Modes: In the main session, only skill descriptions are in context. Full content loads on demand when relevant. For subagents, skills are preloaded at startup, injecting the full content into the agent’s context immediately. This gives specialized agents complete domain knowledge from their first action.
When skills are preloaded into a subagent, the agent definition specifies which skills to load:
```markdown
---
name: api-developer
description: Implement API endpoints following team conventions
skills:
  - api-conventions
  - error-handling-patterns
---

Implement API endpoints. Follow the conventions and patterns
from the preloaded skills.
```

The CS analogy: On-demand skill loading is like lazy evaluation in functional programming. You do not compute a value until you need it. Preloaded skills are like eager evaluation. You compute everything upfront because you know the subagent will need it.
Phase 3: The Tool Call Loop
Every action the agent takes passes through two hook gates: PreToolUse (before) and PostToolUse (after). This loop runs hundreds of times per session. It is the heartbeat of the system.

Figure 8 - The Tool Call Loop: Every tool call passes through PreToolUse (safety gate) and PostToolUse (quality gate). The PostToolUse hook runs validators and injects errors back as additionalContext. The agent reads the feedback and self-corrects in its next action. This loop runs hundreds of times per session with zero human intervention.
Here is the critical mechanism: additionalContext.
When a PostToolUse hook finishes running, it can return a JSON object with an additionalContext field. Claude Code takes that string and injects it directly into the agent’s conversation context. The agent sees this feedback as if someone told it something important, and it acts on it.
```python
# PostToolUse hook: Run TypeScript compiler after file writes
result = subprocess.run(
    ["npx", "tsc", "--noEmit", "--pretty"],
    capture_output=True, text=True, timeout=30
)

if result.returncode != 0:
    error_lines = [l for l in result.stdout.split("\n") if ": error TS" in l]
    output = {
        "additionalContext": (
            f"TypeScript errors detected: {len(error_lines)} error(s). "
            f"Fix these before continuing to the next task."
        )
    }
    print(json.dumps(output))
```

The CS analogy: additionalContext is like a CI/CD pipeline posting comments on a pull request. You push code, the pipeline runs checks, and posts a comment with specific errors. You read the comment and fix the issues. The hook is that pipeline, and additionalContext is that comment, except it happens inside the agent’s mind, instantly, on every single file write.
Phase 4: Per-Agent Hooks
In a team of agents, a global hook that validates CSV structure on every file write would be wasteful for the API developer. Conversely, the API developer needs OpenAPI spec validation that would be irrelevant for the CSV analyst. Per-agent hooks solve this.

Figure 9 - Per-Agent Specialization: Global hooks waste computation by running irrelevant validators on every agent. Per-agent hooks embedded in agent definitions apply only the validation each specialist needs. The builder runs linters after every write. The validator enforces read-only access. Each agent’s hooks are part of its identity.
Hooks are embedded directly in the agent definition file:
```markdown
---
name: builder
description: Focused engineering agent that executes ONE task at a time.
model: opus
color: cyan
hooks:
  PostToolUse:
    - matcher: "Write|Edit"
      hooks:
        - type: command
          command: "uv run .claude/hooks/validators/ruff_validator.py"
        - type: command
          command: "uv run .claude/hooks/validators/ty_validator.py"
---

You are a focused engineering agent. You build, implement, and create.
You do not plan or coordinate. You execute.
```

The builder agent carries PostToolUse hooks that run ruff (Python linter) and ty (type checker) after every Write or Edit. The validator agent uses disallowedTools: Write, Edit to enforce read-only access deterministically, not by instruction but by capability restriction.
The CS analogy: Per-agent hooks are like microservice-specific middleware. Each microservice in your system has its own middleware stack. The payment service has fraud detection middleware. The auth service has rate limiting middleware. You do not run fraud detection on the auth service. Same principle.
Phase 5: The Complete Context Flow
Now let us trace context through an entire task, from the user’s request to task completion.

Figure 10 - The Complete Context Journey: Context flows from user request through orchestrator planning, agent spawning (with skill injection and session context), the hook-enforced tool call loop (with additionalContext feedback), task completion, validation review, and finally synthesis back to the user. Every stage adds or validates context deterministically.
Here is the flow:
- User request arrives: “Add error handling to the API endpoints.”
- Orchestrator breaks it into tasks on the shared task list.
- Builder agent spawns with preloaded skills (api-conventions, error-handling). SessionStart hook injects git state and project context.
- Builder works. Every file write triggers PostToolUse hooks (ruff, ty). Errors flow back as additionalContext. Builder self-corrects.
- Builder completes. Marks task done with a summary of what changed.
- Validator agent spawns with read-only tools. Reviews the changed files. Runs tests. Reports pass/fail.
- Orchestrator synthesizes results across all tasks. Reports to user.
At every stage, context enters the system through a deterministic mechanism: SessionStart hooks, preloaded skills, additionalContext from PostToolUse hooks, task summaries from completed agents. Nothing depends on the agent remembering an instruction from 30 minutes ago.
The Three Hook Types
Hooks are not limited to shell scripts. Claude Code supports three hook types, each suited to different validation needs.

Figure 11 - Three Hook Types: From Deterministic to Intelligent: Command hooks are fast and deterministic. Prompt-based hooks add LLM judgment for semantic analysis. Agent-based hooks provide thorough multi-step verification. Use deterministic hooks for safety boundaries. Use intelligent hooks for quality assessment.
- Command hooks: A shell command runs, reads stdin, writes to stdout. Deterministic, fast, predictable. Use for linting, safety gates, format checking.
- Prompt-based hooks: Sends event context to a Claude model for single-turn evaluation. Returns ok/not-ok with a reason. Use when validation requires understanding intent or semantics.
- Agent-based hooks: Spawns a full Claude agent with tool access for multi-turn verification. The agent can read files, run commands, and investigate. Use for comprehensive completion gates.
```json
{
  "Stop": [{
    "hooks": [{
      "type": "agent",
      "prompt": "Verify all unit tests pass. Run the test suite and check results.",
      "timeout": 120
    }]
  }]
}
```

Critical safety rule: Never use prompt-based or agent-based hooks for hard safety boundaries. LLM evaluation is probabilistic. For blocking destructive commands or enforcing file ownership, use deterministic command hooks. Reserve intelligent hooks for quality assessment where judgment adds value.
The Builder/Validator Pattern
A clean separation of concerns uses two team agents: a builder that implements and a validator that reviews.

Figure 12 - The Builder/Validator Pattern: The builder implements with all tools and hook-enforced linting. The validator reviews with read-only tools and literally cannot modify files. The orchestrator chains them: build first, validate second. Both agents’ constraints are enforced deterministically by the system, not by prompt instructions.
The builder carries PostToolUse hooks that run ruff_validator.py and ty_validator.py after every Write or Edit. If either finds errors, the feedback flows back as additionalContext and the builder self-corrects.
The validator has disallowedTools: Write, Edit, NotebookEdit. It physically cannot modify files. It can only inspect (Read, Glob, Grep) and run commands (Bash). Its job is to verify the builder’s work meets acceptance criteria.
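Under the same frontmatter conventions as the builder definition shown earlier, a validator definition might look like the following sketch (the model and color fields are illustrative assumptions; only the disallowedTools line comes from the text above):

```markdown
---
name: validator
description: Read-only review agent that verifies completed work.
model: opus
color: green
disallowedTools: Write, Edit, NotebookEdit
---

You are a review agent. You inspect, run tests, and report.
You do not modify files. You verify the builder's work against
the task's acceptance criteria and report pass or fail.
```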
The orchestrator chains them: assign the builder a task, wait for completion, then assign the validator to review. If the validator reports failure, the orchestrator can reassign the task to the builder with the validator’s feedback.
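The build, validate, retry loop can be sketched as follows. The spawn functions are hypothetical stand-ins for however your harness spawns agents; the control flow, with validator feedback folded back into the builder's next attempt, is the point.

```python
# Sketch of the orchestrator's build -> validate -> retry loop.
# spawn_builder / spawn_validator are hypothetical stand-ins for the
# harness's actual agent-spawning mechanism.
from typing import Callable

def run_task(
    task: str,
    spawn_builder: Callable[[str], str],
    spawn_validator: Callable[[str], tuple[bool, str]],
    max_attempts: int = 3,
) -> bool:
    """Chain builder and validator, reassigning with feedback on failure."""
    feedback = ""
    for _ in range(max_attempts):
        summary = spawn_builder(task + feedback)       # builder implements
        passed, report = spawn_validator(summary)      # validator reviews
        if passed:
            return True
        # Reassign the task with the validator's feedback attached.
        feedback = f"\n\nPrevious attempt failed review:\n{report}"
    return False
```

Bounding the retries matters: without max_attempts, a builder that cannot satisfy the validator would loop forever, burning tokens with no escalation path back to the user.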
The CS analogy: This is the code review process, automated. The builder is the developer writing code. The validator is the reviewer who can comment but cannot push to the branch. The hooks are the CI pipeline that runs automatically on every commit.
The Completion Gate
The Stop hook fires when an agent tries to finish. It can block the agent from stopping if completion criteria are not met. This prevents the common failure mode of agents declaring victory before all work is done.

Figure 13 - The Completion Gate: When the agent tries to stop, the Stop hook checks completion criteria. If any check fails, the agent is blocked and must continue working. Only when all criteria pass can the agent finish. This is your release gate for autonomous agents.
Agent-based Stop hooks are particularly powerful because the evaluator agent can read files, run commands, and perform multi-step analysis:
```json
{
  "Stop": [{
    "hooks": [{
      "type": "agent",
      "prompt": "Review this agent's complete output. Verify all tasks on the task list are marked complete. Run the test suite. Block completion if anything is missing or failing.",
      "tools": "Read, Bash"
    }]
  }]
}
```

The CS analogy: The Stop hook is a release gate. In a CI/CD pipeline, the release gate checks that all tests pass, all linting is clean, and the deployment checklist is complete before the release proceeds. The Stop hook does the same for agent task completion.
The Full Architecture

Figure 14 - The Complete Architecture: Skills define knowledge. Hooks enforce behavior. The orchestrator coordinates work. Context flows through all of it deterministically: from session start (environment awareness), through skill loading (domain knowledge), the hook loop (quality feedback), per-agent validation (specialized checks), and completion gates (release verification). Together, they make autonomous agent teams safe, reliable, and debuggable.
The complete system has four deterministic control layers working together:
- Safety (PreToolUse): Block destructive actions, enforce file ownership, validate commands before execution.
- Quality (PostToolUse): Run linters, type checkers, and tests after every code change. Inject feedback as additionalContext for self-correction.
- Observability (All events): Forward every hook event to a monitoring dashboard. When agent teams misbehave, the event log is your primary debugging tool.
- Completion (Stop): Prevent agents from finishing until all criteria are verified. The release gate for autonomous work.
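The observability layer can start as a hook script that appends every event to a local JSONL log. This sketch writes to a local file rather than a dashboard, which is an assumption for illustration; the event field names are likewise assumed.

```python
# Sketch of an observability hook: append every hook event to a JSONL log.
# A real setup might POST events to a dashboard; the local file and the
# event field names (hook_event_name, tool_name) are assumptions here.
import json
import time
from pathlib import Path

LOG_PATH = Path(".claude/logs/events.jsonl")

def record_event(event: dict) -> str:
    """Serialize one hook event with a timestamp and append it to the log."""
    line = json.dumps({
        "ts": time.time(),
        "event": event.get("hook_event_name", "unknown"),
        "tool": event.get("tool_name"),
    })
    LOG_PATH.parent.mkdir(parents=True, exist_ok=True)
    with LOG_PATH.open("a") as f:
        f.write(line + "\n")
    return line
```

Because the same script can be registered for every hook event, the resulting log is a complete, ordered trace of the team's actions, which is exactly what you grep when an agent misbehaves.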
Skills sit inside each agent, providing the domain knowledge that guides how the agent approaches tasks. Hooks wrap around each agent, enforcing the constraints that guarantee what must happen. The orchestrator sits above, coordinating who does what and when.
Autonomous agent teams need three types of control operating simultaneously. Skills provide knowledge (what to do). Hooks enforce behavior (what must happen). The orchestrator provides coordination (who does what). No single mechanism is sufficient alone. Skills without hooks lead to unreliable execution. Hooks without skills lead to capable enforcement but aimless agents. Both without an orchestrator lead to uncoordinated effort.
The Series
This is Part 1 of a 6-part series on Claude Code:
- Orchestrating AI Agent Teams (this article) — The control layer architecture that makes autonomous coding reliable
- Building Effective Claude Code Agents — Agent definitions, tool restrictions, and least privilege
- Claude Code Skills — Progressive disclosure and reusable knowledge packages
- Claude Code Hooks — PreToolUse, PostToolUse, and deterministic enforcement
- Claude Code Agent Teams — Multi-agent coordination and file ownership
- Claude Code Security — Defense-in-depth with agents, skills, hooks, commands, and teams
References
[1] Disler, “Claude Code Hooks Mastery,” GitHub Repository, 2025. https://github.com/disler/claude-code-hooks-mastery
[2] Anthropic, “Automate workflows with hooks,” Claude Code Documentation, 2025. https://code.claude.com/docs/en/hooks-guide
[3] Anthropic, “Create custom subagents,” Claude Code Documentation, 2025. https://code.claude.com/docs/en/sub-agents
[4] Anthropic, “Extend Claude with skills,” Claude Code Documentation, 2025. https://code.claude.com/docs/en/skills
[5] Anthropic, “Orchestrate teams of Claude Code sessions,” Claude Code Documentation, 2025. https://code.claude.com/docs/en/agent-teams
[6] Anthropic, “Skill authoring best practices,” Claude Platform Documentation, 2025. https://platform.claude.com/docs/en/agents-and-tools/agent-skills/best-practices
[7] Disler, “Agentic Finance Review,” GitHub Repository, 2025. https://github.com/disler/agentic-finance-review
[8] J. Young et al., “Effective harnesses for long-running agents,” Anthropic Engineering Blog, Nov 2025. https://www.anthropic.com/engineering/effective-harnesses-for-long-running-agents