An Agent Swarm That Builds Agent Swarms
We used Claude Code to build a framework that generates Claude Code infrastructure for any project. Then we proved it works by migrating two production apps — the second was more complex, yet completed faster: 67 Python files, an AI/ML pipeline, a full architectural redesign, 8 sessions, zero framework build time.

Figure 1 - The Three-Folder Architecture: The framework reads the source project but never modifies it. All generated infrastructure lands in a fresh target project. This READ-ONLY invariant held across 10 sessions and was never violated.
A user types “top 10 customers in revenue for the last 3 years” into a text box. Thirty seconds later, a four-panel dashboard appears: a data table with revenue breakdowns, a geographic map of customer locations, a bar chart comparing revenue, and a donut chart showing distribution. All auto-generated. All correct.
This is textToSql-metabase, a full-stack application — React frontend, FastAPI backend, Metabase BI, Qdrant vector search, MS SQL Server — that was migrated from an existing codebase in 10 sessions using a framework we built entirely within Claude Code. The migrated version ran on its first live test. No emergency patches. No missing imports. Both the text-to-SQL pipeline and the help chat system worked immediately.
But the real story is not the migrated app. The real story is the framework that made the migration possible — and what it means for how developers can adopt Claude Code’s advanced features.
The Problem: Claude Code’s Best Features Are Its Least Used
Claude Code ships with powerful infrastructure: agent definitions that create specialized AI teammates, hooks that enforce quality automatically, skills that package domain knowledge for 140x token efficiency, and slash commands that encode complex workflows into single invocations.
Most developers use none of it.
The setup cost is the barrier. Configuring a proper Claude Code project — with CLAUDE.md, agent teams, hooks, skills, slash commands, init scripts, and settings.json — takes a full day for an expert. Most people write a basic CLAUDE.md and stop. They get maybe 20% of Claude Code’s potential.

Figure 2 - The Infrastructure Gap: The most powerful Claude Code features have the lowest adoption. Agent definitions, hooks, and skills require significant upfront investment to configure correctly. The Bootstrap Framework exists to close this gap.
Migration is even harder. An existing codebase has established patterns, implicit conventions, domain knowledge buried in code, and toolchain assumptions. Extracting all of that into Claude Code infrastructure — while also improving the codebase during migration — is a task most teams never attempt.
We asked: what if Claude Code agents could do this work themselves?
The Insight: Use Claude Code to Generate Claude Code
The Bootstrap & Migrator Framework is a meta-project: Claude Code agents that generate Claude Code agent infrastructure. You give it a project description (greenfield mode) or point it at an existing codebase (migration mode), and it produces a complete .claude/ configuration tailored to that specific project.
The framework contains no project-specific code. It contains knowledge about how to build Claude Code configurations:
- 8 reusable skills teaching agent design, hook engineering, skill authoring, migration strategy, project analysis, validation, harness design, and project-type patterns
- 6 slash commands guiding users through analyze, scaffold, generate-agents, generate-hooks, generate-skills, and validate workflows
- 10 templates for hook scripts, dev startup scripts, and project scaffolding
- 808 lines of methodology documenting a 7-step process refined through real use
KEY INSIGHT: The most effective way to drive adoption of advanced tooling features is to build tools that generate the configuration. Instead of teaching every developer how to write agent definitions, teach one framework how to generate them.
The Architecture: Three Folders, Strict Boundaries
The framework enforces a three-folder architecture that prevents the most common migration mistakes:
- Framework repo (claude-teams-project-framework/) — reusable knowledge, templates, methodology. Project-agnostic. Never contains project-specific code.
- Source project (metabase-server/) — the existing codebase. READ-ONLY. Never modified. This is the reference implementation.
- Target project (textToSql-metabase/) — built fresh with generated infrastructure. All new code lands here.

Figure 3 - The 7-Step Migration Methodology: Analysis and scaffolding are sequential, but agent, hook, and skill generation run in parallel since they target different directories. The entire infrastructure generation pipeline completes in 20-40 minutes.
The READ-ONLY invariant on the source project is not just a convention — it is the single most important architectural decision. It means:
- The old project keeps running in production throughout the migration
- Every comparison between old and new behavior has a stable reference point
- No accidental modifications can break the source while you’re building the target
- The psychological clarity is as valuable as the technical enforcement: agents that know they cannot write to a directory don’t waste tokens attempting it
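The invariant can be enforced mechanically rather than by convention. Below is a minimal sketch of a PreToolUse command hook that rejects any write targeting the source tree. The function names and directory constant are ours for illustration; the blocking-exit-code convention follows Claude Code's hook behavior, where a non-zero exit from a PreToolUse hook stops the tool call:

```python
#!/usr/bin/env python3
"""PreToolUse hook sketch: reject writes into the READ-ONLY source project."""
import json
import sys
from pathlib import Path

READONLY_ROOT = Path("metabase-server").resolve()  # the source project: never modified


def is_blocked(file_path: str) -> bool:
    """True when the tool call targets a path inside the read-only source tree."""
    if not file_path:
        return False
    target = Path(file_path).resolve()
    return target == READONLY_ROOT or READONLY_ROOT in target.parents


def run_hook(raw_event: str) -> int:
    """Handle one PreToolUse event (JSON on stdin in real use); return the exit code."""
    event = json.loads(raw_event)
    path = event.get("tool_input", {}).get("file_path", "")
    if is_blocked(path):
        print(f"BLOCKED: {path} is inside the READ-ONLY source project", file=sys.stderr)
        return 2  # a blocking exit code stops the tool call
    return 0

# In real use this runs as a hook command: sys.exit(run_hook(sys.stdin.read()))
```

Because the check is a plain path comparison, it costs milliseconds per tool call and cannot be argued with, which is exactly the psychological clarity the bullet above describes.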
What the Framework Actually Generated
For the metabase-server migration, the framework analyzed the existing codebase and produced:
4 Specialized Agents
Each agent owns specific directories, runs specific model tiers, and has embedded hooks:
| Agent | Owns | Model | Focus |
|---|---|---|---|
| sql-pipeline-dev | sql_agent/, llm/, help_agent/ | Opus | Schema retrieval, SQL generation, validation, error recovery |
| dashboard-dev | designer_agent/, metabase/, cache/ | Sonnet | Dashboard creation, viz selection, Metabase API |
| frontend-dev | frontend/src/ | Sonnet | React components, shadcn/ui, Metabase SDK embedding |
| test-quality-reviewer | tests/, cross-cutting | Opus | Test creation, code quality, type annotations |
The agent definitions are not just role descriptions. They include constraints extracted from reading the source code: “All SQL queries must include tenant_id filtering”, “Target views only, not raw tables”, “Use BaseLLMClient, never call Anthropic directly.” When the sql-pipeline-dev agent considers writing a new query, the tenant_id constraint is present in its role definition, guiding every decision without repeated prompting. These constraints prevent agents from reinventing patterns that the codebase has already established. For a deep dive into agent definition anatomy, tool restrictions, and role patterns, see the companion article on Building Effective Claude Code Agents.
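For illustration, here is a condensed sketch of what such a generated agent definition can look like. The markdown-with-frontmatter layout follows Claude Code's agent file convention; the exact field values are assumptions reconstructed from the constraints quoted above:

```markdown
---
name: sql-pipeline-dev
description: Develops the text-to-SQL pipeline (sql_agent/, llm/, help_agent/)
model: opus
---

You own the text-to-SQL pipeline. Constraints extracted from the source codebase:

- All SQL queries MUST include tenant_id filtering.
- Target views only, never raw tables.
- Use BaseLLMClient; never call the Anthropic API directly.
- Mock all LLM calls in tests.
```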

Figure 4 - Agent Team Architecture: Each agent has exclusive file ownership, preventing conflicts. The test-quality-reviewer has cross-cutting visibility — it can read any file but only writes to tests/. Model selection is intentional: Opus for complex algorithmic work (SQL generation, test design), Sonnet for standard implementation (frontend, dashboard). For the mechanics of how multiple agents coordinate — shared task lists, the lead-teammate architecture, and inter-agent messaging — see the companion article on Claude Code Agent Teams.
7 Domain Skills (1,486 Lines of Extracted Knowledge)
Skills are the framework’s most distinctive feature. Each skill packages domain knowledge using progressive disclosure — a three-tier loading strategy that achieves 140x token efficiency compared to loading everything into context:
- Tier 1 (always loaded): YAML frontmatter — 10-20 lines describing what the skill covers
- Tier 2 (loaded on relevance): XML body — the core patterns and rules (100-300 lines)
- Tier 3 (loaded on specific need): Reference files — deep-dive documentation
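As a sketch of Tier 1, the always-loaded frontmatter for a skill might look like the following. The field shape follows the Claude Code skill convention; the wording is illustrative:

```yaml
# SKILL.md frontmatter (Tier 1: always in context)
name: text-to-sql-pipeline
description: >
  Use when generating, validating, or debugging SQL in the text-to-SQL
  pipeline. Covers the 7-step query workflow, view-centric SQL patterns,
  and mandatory tenant_id filtering.
# Tier 2 is the SKILL.md body below this frontmatter; Tier 3 reference
# files live alongside it and are loaded file-by-file on demand.
```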
The 7 skills extracted from the metabase-server codebase:
| Skill | What It Captures |
|---|---|
| text-to-sql-pipeline | 7-step query workflow, view-centric SQL, tenant_id enforcement, error recovery |
| metabase-api-patterns | Client library, dashboard builder, card types, embedding SDK |
| multi-tenant-security | tenant_id UUID enforcement, SQL injection prevention |
| qdrant-schema-retrieval | Vector collections, embedding model, two-stage retrieval |
| llm-client-abstraction | BaseLLMClient, provider pattern, model switching |
| frontend-architecture | Component hierarchy, React Query patterns, theme sync |
| erp-domain-knowledge | View catalog, data file inventory |
These skills mean that any agent working on the migrated project can make correct domain decisions without re-reading the entire source codebase. The text-to-sql-pipeline skill, for example, contains the exact 7-step workflow the original project uses: schema retrieval, SQL generation, validation, execution, error recovery, visualization selection, and dashboard creation. An agent porting code follows the established pattern rather than inventing a new one.
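The chain the skill describes can be sketched as seven stand-in steps threading a shared state dict (the function bodies are illustrative placeholders, not the project's actual implementations):

```python
# Placeholder implementations of the 7 steps from the text-to-sql-pipeline skill.
def retrieve_schema(s): return {**s, "schema": ["vw_customers", "vw_revenue"]}
def generate_sql(s): return {**s, "sql": "SELECT TOP 10 ... WHERE tenant_id = ?"}
def validate_sql(s): return {**s, "validated": True}
def execute_sql(s): return {**s, "rows": []}
def recover_errors(s): return s  # no-op when execution succeeded
def select_visualization(s): return {**s, "viz": "table"}
def build_dashboard(s): return {**s, "dashboard_id": 1}

# The 7-step workflow, in the order the skill prescribes.
PIPELINE = [retrieve_schema, generate_sql, validate_sql, execute_sql,
            recover_errors, select_visualization, build_dashboard]

def run_pipeline(question: str) -> dict:
    state: dict = {"question": question}
    for step in PIPELINE:
        state = step(state)
    return state

result = run_pipeline("top 10 customers in revenue for the last 3 years")
```

An agent that loads the skill inherits this ordering, so a ported module slots into an established chain instead of a newly invented one.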
KEY INSIGHT: Domain knowledge extraction is the highest-ROI step in any migration. 1,486 lines of skills replaced the need for agents to repeatedly read and understand a 37-file codebase. Every dollar spent extracting knowledge into skills saves ten dollars in reduced token consumption and fewer mistakes.
To make the 140x efficiency concrete, consider the text-to-sql-pipeline skill — the largest and most critical of the seven:
- Tier 1 (always loaded): YAML frontmatter — the skill name and a 3-line description of when to use it. Approximately 15 lines, present in every agent’s context at all times.
- Tier 2 (loaded on relevance): The SKILL.md body — the 7-step query workflow, view-centric SQL patterns, tenant_id enforcement rules, and error recovery strategy. Approximately 200 lines, loaded only when an agent works on SQL pipeline tasks.
- Tier 3 (loaded on specific need): Reference files — the complete view catalog, data file inventory, and T-SQL dialect guide (including hard-won lessons like using TOP instead of LIMIT for SQL Server). Over 800 lines across multiple files, loaded individually only when an agent needs that specific detail.
At startup, every agent pays the cost of Tier 1: roughly 15 lines per skill, 105 lines total across all 7 skills. An agent working on the SQL pipeline loads Tier 2 for that one skill and perhaps one Tier 3 reference file — about 700 lines total. Without progressive disclosure, the agent would need the full content of all 7 skills loaded simultaneously: over 2,100 lines of skill bodies plus reference files. That is the 140x token efficiency in practice — 15 lines always loaded versus the full 2,100+ available on demand [10]. For a complete treatment of skill architecture and the three-tier loading system, see the companion article on Claude Code Skills.
6 Hook Configurations (Deterministic Quality Gates)
Hooks are the deterministic control layer — they run on every tool use, not when an agent remembers to. The generated configuration includes:
PreToolUse (before writes):
- Block modifications to data/*.json (reference data, read-only)
- Block destructive bash commands

PostToolUse (after writes):
- Run ruff check --fix on every Python file edit
- Run tsc --noEmit on every TypeScript file edit
- Run eslint on every TypeScript file edit

Stop (before session ends):
- Run the full pytest suite — the session cannot complete with failing tests
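Wired into settings.json, those gates can look like the following sketch. The event names (PreToolUse, PostToolUse, Stop) follow Claude Code's hooks schema; the matcher strings and script paths are illustrative assumptions:

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Write|Edit",
        "hooks": [{ "type": "command", "command": "python .claude/hooks/protect_reference_data.py" }]
      }
    ],
    "PostToolUse": [
      {
        "matcher": "Write|Edit",
        "hooks": [{ "type": "command", "command": "python .claude/hooks/lint_after_write.py" }]
      }
    ],
    "Stop": [
      { "hooks": [{ "type": "command", "command": "pytest tests/ -x" }] }
    ]
  }
}
```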

Figure 5 - The Hook Control Layer: Hooks enforce quality deterministically. An agent cannot accidentally skip linting, break type safety, or complete a session with failing tests. This is fundamentally more reliable than prompt-based instructions like “remember to run tests.”
The key lesson from real-world use: simple command-based hooks were sufficient. The framework’s templates include elaborate Python scripts for conditional hook logic, but the actual project needed only straightforward “run linter after write” and “run tests before stop” patterns. Start simple, escalate only when conditional logic is truly needed.
How Hooks Actually Work: The additionalContext Mechanism
The key to understanding hooks is the additionalContext mechanism [8]. When a PostToolUse hook runs — say, after an agent writes a Python file — the hook script receives the tool call details on stdin as JSON, performs validation, and can return a JSON response containing an additionalContext field. Claude Code injects that string directly into the agent’s conversation context, creating a feedback loop where the agent sees the validator output and self-corrects in its very next action. No human intervention required.
Here is what this looks like in practice — the actual pattern used in this migration:
```python
#!/usr/bin/env python3
"""PostToolUse hook: Run ruff on Python file writes, feed errors back to agent."""
import json, sys, subprocess


def main():
    event = json.loads(sys.stdin.read())
    file_path = event.get("tool_input", {}).get("file_path", "")
    if not file_path.endswith(".py"):
        sys.exit(0)

    result = subprocess.run(
        ["ruff", "check", "--fix", file_path],
        capture_output=True, text=True, timeout=10
    )
    if result.returncode != 0:
        print(json.dumps({
            "additionalContext": (
                f"ruff found issues in {file_path}:\n"
                f"{result.stdout}\nFix these before continuing."
            )
        }))


if __name__ == "__main__":
    main()
```

This is fundamentally different from writing “always run ruff before committing” in CLAUDE.md. A prompt instruction achieves perhaps 90% compliance — the agent usually follows it, but under context pressure or in long sessions, it skips. A hook achieves 100% compliance. That gap between “usually” and “always” is where production systems fail. Without hooks, you are essentially “vibe coding at scale” [4]. With hooks, you have reproducible, auditable agent behavior.
Claude Code supports three hook types along a spectrum from deterministic to intelligent [8]: command hooks (shell scripts, millisecond speed, fully deterministic — used for safety gates and linting), prompt-based hooks (single-turn Claude evaluation for semantic judgment), and agent-based hooks (multi-turn Claude with tool access for thorough review gates). In this migration, all hooks were command-type. We never needed the intelligent evaluation of prompt-based or agent-based hooks — simple deterministic enforcement was sufficient. For a deeper treatment of the three types and when to escalate, see the companion article on Claude Code Hooks.
One subtle but important detail: hooks in this project were not just global settings. They were embedded per-agent in the agent definition files [5]. The sql-pipeline-dev agent carried ruff validation; the frontend-dev agent carried tsc and eslint. Each agent ran only the validation relevant to its role — no wasted computation, no false positives from irrelevant checks. This per-agent embedding pattern means that as agent teams grow, hook configurations scale with them rather than becoming a noisy global catch-all.
The Migration: 40 Features, 10 Sessions, One First-Try Success
The framework generated a 40-item feature list ordered by dependency, which the agent team executed over 10 sessions:
| Phase | Work | Sessions | Key Output |
|---|---|---|---|
| Phase 1 | Build Framework Knowledge Base | 3 | 8 skills, 6 commands, 808-line methodology |
| Phase 2 | Analyze Source + Generate Infrastructure | 2 | project_analysis.json, 4 agents, 7 skills, 6 hooks |
| Phase 3 | Port & Improve All Modules | 5 | 53 Python modules, 21 test files, 223 tests |
| Phase 4 | Lessons Learned + Framework Refinement | 2 | Updated 14+ framework files, 3 new templates |
The porting order followed the dependency graph: config first (zero dependencies), then LLM clients and Metabase client (depend on config), then SQL pipeline and dashboard pipeline (depend on everything above), then the FastAPI application (imports all modules), then frontend (independent, ported in parallel).
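That dependency graph makes the porting order mechanical: it is a topological sort. A minimal sketch using the standard library (module names follow the porting order above; the exact edges are illustrative assumptions):

```python
from graphlib import TopologicalSorter

# module -> modules it depends on (edges are illustrative, based on the porting order)
deps: dict[str, set[str]] = {
    "config": set(),
    "llm_clients": {"config"},
    "metabase_client": {"config"},
    "sql_pipeline": {"config", "llm_clients"},
    "dashboard_pipeline": {"config", "metabase_client"},
    "fastapi_app": {"sql_pipeline", "dashboard_pipeline", "llm_clients", "metabase_client"},
}

def port_order(graph: dict[str, set[str]]) -> list[str]:
    """Return an order in which every module lands after its dependencies."""
    return list(TopologicalSorter(graph).static_order())

order = port_order(deps)  # "config" comes first, "fastapi_app" last
```

The frontend is absent from the graph because it has no Python dependencies, which is exactly why it could be ported in parallel.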

Figure 6 - Dependency-Order Porting: Every module was testable the moment it landed because its dependencies were already ported. The frontend ran in parallel with backend porting since it has no Python dependencies.
Anti-Patterns Fixed During Migration
The framework’s “improve during migration, not after” principle meant every module got straightforward fixes as it was ported:
| Anti-Pattern | Old Project | New Project |
|---|---|---|
| Debug output | 168 print() statements | 0 (proper logging module) |
| Package management | sys.path hacks | pyproject.toml + standard imports |
| Configuration | Hardcoded values (User ID, tenant_id) | Centralized pydantic-settings |
| Type safety | No type hints | Full type annotations + mypy |
| Linting | None | ruff enforced on every file write |
| Testing | 0 test files | 21 test files, 223 unit tests |
| Response types | Mixed dicts + Pydantic | Consistent Pydantic models |
| Error handling | Bare exceptions | Classified error recovery |
KEY INSIGHT: Migration is the best time to fix anti-patterns because you’re already reading every line of code. Applying fixes during porting costs almost nothing extra. Scheduling a separate “cleanup sprint” after migration almost never happens.
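The print()-to-logging fix, for example, is mostly mechanical. A minimal sketch of the pattern (the logger name and helper function are illustrative, not the project's actual module layout):

```python
import logging

logger = logging.getLogger("sql_agent")  # hypothetical module-level logger

def configure_logging(level: int = logging.INFO) -> None:
    """One central setup call at startup replaces scattered print() output."""
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(name)s %(levelname)s %(message)s",
    )

def report_failure(err: str) -> None:
    # Before: print(f"SQL failed: {err}")  -- the error vanished into stdout.
    # After: the failure carries a level, a logger name, and a timestamp.
    logger.error("SQL failed: %s", err)
```

Each of the 168 print() call sites reduces to a one-line substitution once the central setup exists, which is why applying the fix during porting costs almost nothing extra.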
The transformation is easier to appreciate visually. Click to zoom into the full architectural diagrams:

BEFORE Migration: metabase-server — The red flags are everywhere: a 1,620-line monolithic app.py, sys.path.append() hacks, zero test files, 168 print statements, hardcoded credentials, and a synchronous blocking client. Click the image for the full-resolution view.

AFTER Migration: textToSql-metabase — The monolith becomes a modular Python package with Hatchling build, proper module structure, 223 unit tests, 7 domain skills, 6 hook configurations, and a 7-step pipeline. Every anti-pattern from the BEFORE diagram is resolved.
The Gaps: What We Got Wrong
No framework survives first contact with reality unscathed. Five gaps emerged across two migrations:
1. .gitignore carryover. The scaffold step created a fresh .gitignore but didn’t carry over source-project-specific patterns. Data files that should have been ignored were committed. Fix: now an explicit migration step.
2. Dev startup scripts. The pipeline had no step for porting development orchestration — the npm run dev scripts that start backend, frontend, and Metabase simultaneously with concurrently. The migrated project couldn’t run in dev mode until we added this. Fix: new templates and migration step added.
3. .env files. The framework scaffolded .env.example with placeholders but never prompted the developer to copy actual credentials from the source project. All 223 unit tests pass (they mock external services), so this gap was invisible until the first live test. Fix: explicit manual step documented.
4. Processing pipeline completeness. Every module ported correctly. The route handler only called step 1 of a 4-step chain. In Migration 2, the YouTube processing pipeline chained transcript extraction, markdown generation, YAML frontmatter generation, and tag resolution into a final Obsidian note. The target project ported each module independently, but the route handler that orchestrated them only wired the first step. The result: markdown output with no YAML frontmatter and no tags. Three steps existed as dead code, never called. The root cause was that the project analysis identified modules and endpoints but not processing pipelines. Modules are nouns. Pipelines are verbs. Without an explicit pipeline inventory, the porting agent treated each module as independent. Fix: processing pipelines are now first-class entities in project_analysis.json, with mandatory detection during analysis and mandatory verification during cutover.
5. Dev startup scripts — missed twice. Despite being documented as Gap #2 above, the second migration hit the exact same failure. The documentation existed. The checklist mentioned it. But the verification step never tested whether the project could actually start. Documenting a lesson is not the same as enforcing it. Fix: dev orchestration is now a MANDATORY blocking check in the verification checklist, not an advisory recommendation. The gap had to recur before we learned to make it impossible rather than merely documented.

Figure 7 - Discovery Timeline: Every gap was discovered during active use and immediately fed back into the framework. This is the core feedback loop: the framework improves through every project it touches.
Migration 2: The Framework Proves Itself
The metabase-server migration proved the concept. The second migration proved the framework compounds.
We pointed the framework at obsidian_notes — a YouTube-to-Obsidian AI pipeline built with FastAPI, React 19, PostgreSQL, Qdrant vector search, Anthropic Claude, and OpenAI embeddings. 67 Python source files, a 2,800-line monolithic API, 8 independent database connections, and an Anthropic Batch API processing system that took 4+ hours to handle a single set of videos.
The framework analyzed it and generated 5 specialized agents, 7 domain skills, 5 hooks, and a 167-line CLAUDE.md. Phase 1 — building the framework knowledge base — took zero sessions. The 3-session investment from Migration 1 paid for itself immediately.
| Dimension | Migration 1: metabase-server | Migration 2: obsidian-notes |
|---|---|---|
| Total sessions | 10 | 8 |
| Framework build (Phase 1) | 3 sessions | 0 (reused) |
| Source complexity | 45 Python files | 67 Python files + AI/ML |
| Architectural redesign | None (port + improve) | Batch API replaced entirely |
| Generated agents | 4 | 5 |
| Generated domain skills | 7 (1,486 lines) | 7 (domain-specific) |
| Anti-patterns fixed | 7 categories | 7+ categories |
| Tests created | 223 | 118 |
| New framework gaps | 6 | 2 |
The most dramatic change was not a port. It was a complete architectural replacement. The source project used Anthropic’s Batch API for processing YouTube videos into Obsidian notes. The Batch API was unreliable: 4+ hour waits with no per-item progress, no cancellation support, and opaque failures that required resubmitting entire batches. Rather than porting this broken architecture, we replaced it with asyncio.TaskGroup parallel processing. The result: seconds per video instead of hours per batch, per-item WebSocket progress updates, individual cancellation, and configurable concurrency.
KEY INSIGHT: Migration is not just an opportunity to fix code. It is an opportunity to fix architecture. When a component is demonstrably failing (not suboptimal, but failing), redesign it during migration rather than porting the failure and planning a future rewrite that never happens.
The structural improvements went further than Migration 1. The 2,800-line monolithic main.py became 7 focused router modules and a 70-line entry point. 8 independent psycopg2.connect() calls became a centralized connection pool with get_connection() as a context manager. 7 accumulated SQL migration scripts consolidated into 1 idempotent initial schema. Python typing modernized throughout (X | None instead of Optional[X], list[X] instead of List[X]), enum patterns standardized (StrEnum instead of str, Enum), and deprecated stdlib APIs updated (datetime.now(tz=UTC) instead of datetime.utcnow()).
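The connection-pool consolidation follows the standard context-manager shape. A stdlib-only stand-in sketch (the real project uses a psycopg2 pool; get_connection is the name cited above, while the pool internals here are illustrative):

```python
import queue
from contextlib import contextmanager

class ConnectionPool:
    """Minimal stand-in for a psycopg2 pool: hand out and reclaim connections."""
    def __init__(self, factory, size: int = 4):
        self._pool: queue.Queue = queue.Queue()
        for _ in range(size):
            self._pool.put(factory())

    @contextmanager
    def get_connection(self):
        conn = self._pool.get()        # block until a connection is free
        try:
            yield conn
        finally:
            self._pool.put(conn)       # always return it, even on error

# Eight scattered psycopg2.connect() calls become one shared pool:
pool = ConnectionPool(factory=lambda: object(), size=2)

with pool.get_connection() as conn:
    pass  # run queries against conn here
```

The try/finally guarantees a connection is returned even when a query raises, which is the property eight independent connect() calls could never provide.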

Figure 8 - The Compound Returns: Migration 2 was more complex (67 files vs 45, AI/ML integration, full architectural redesign) but completed in fewer sessions. The 3-session framework investment from Migration 1 dropped to zero on reuse. The framework gets faster with every project it touches.
The same visual treatment shows Migration 2’s transformation. Click to zoom:

BEFORE Migration: obsidian_notes — 67 Python files, 8 independent database connections, a 2,800-line monolithic API, and the Anthropic Batch API bottleneck that took 4+ hours per processing run.

AFTER Migration: obsidian-youtube-agent — The Batch API replaced with asyncio.TaskGroup parallel processing (seconds instead of hours), 8 database connections consolidated into a centralized pool, and full Claude Code infrastructure: 5 agents, 7 skills, 5 hooks.
The hook pattern from Migration 1 held up without changes. Simple command hooks — ruff after Python writes, tsc after TypeScript writes, pytest before session end — were sufficient again. No script-based hooks needed. Hook simplicity is now confirmed across two independent projects with different tech stacks.
Why This Matters: The Infrastructure Multiplier
The Bootstrap Framework exists to solve a specific problem: Claude Code’s most powerful features have a high setup cost that prevents adoption. By encoding the knowledge of how to configure these features into reusable skills and automated workflows, the framework turns a full-day expert task into a 20-40 minute generation pipeline.
But the multiplier effect goes beyond setup time:
For the migrated project, every future session benefits from the generated infrastructure. Hooks catch bugs automatically. Skills give agents domain context without token-heavy re-reading. Agent ownership prevents file conflicts. The 157-line CLAUDE.md orients every new session in seconds.
For the framework, every migration makes it smarter. The metabase-server migration produced 3 new templates, updated 14+ framework files, and documented 6 lessons learned. The obsidian-notes migration discovered 2 more gaps and added processing pipeline detection and mandatory verification checks. Each migration makes the next one faster because these patterns are now encoded.
For Claude Code adoption broadly, the framework demonstrates that the answer to “these features are too complex to set up” is not “make the features simpler”, but rather “build tools that generate the configuration.” The features are powerful because they’re detailed. A hook that runs ruff check --fix on every Python file write is powerful precisely because it’s specific. The Bootstrap Framework makes that specificity achievable without the manual cost.

Figure 9 - Before and After: The quantitative improvement across every dimension. The most striking metric: 168 print() statements reduced to 0 while simultaneously adding 223 unit tests. Quality infrastructure compounds.
The Real Payoff: A System That Can Now Improve Itself
The migration was not the destination. It was the foundation.
The old metabase-server project could not improve itself. When a SQL query failed, the error vanished into a print() statement. When a pattern didn’t work, there was no place to record what went wrong. When a developer, human or AI, started a new session, they had no orientation, no domain context, no safety rails. Every session started from zero.
The migrated project is fundamentally different. It has the infrastructure for continuous self-improvement through feedback loops that the old project could never support.
Feedback Loop 1: Hooks Catch Failures Before They Ship
Consider what happens now when an agent writes a SQL generator that forgets tenant_id filtering:
- The agent writes the code
- PostToolUse hook runs ruff check --fix and catches syntax issues immediately
- The agent finishes the session
- Stop hook runs pytest tests/ -x and the multi-tenant security tests fail because tenant_id is missing
- The session cannot complete until the tests pass
- The agent reads the test failure, understands the tenant_id requirement, and fixes the code
In the old project, that missing tenant_id would have reached production. There were no tests to catch it, no hooks to enforce it, and no skill to remind the developer it was required.

Figure 10 - The Self-Improvement Feedback Loop: Hooks create a tight cycle where failures are caught and fixed within the same session. Skills provide the longer-term memory and patterns learned from failures persist across sessions.
Feedback Loop 2: Skills Accumulate Domain Knowledge
The text-to-sql-pipeline skill already captures the 7-step workflow, the view-centric SQL pattern, and the tenant_id enforcement rule. But skills are not static documents. When a query fails in a new way, for example when a user asks for “revenue last 3 years” and the generated SQL uses LIMIT instead of SQL Server’s TOP syntax, that pattern gets added to the skill:

T-SQL dialect: All queries use SQL Server syntax. TOP N instead of LIMIT, GETDATE() and DATEADD() for dates, ISNULL() instead of COALESCE().

That line exists in the skill today because this exact failure happened during development. Every future session, every future agent, loads that knowledge automatically. The mistake cannot repeat.
In the old project, this kind of pattern lived in one developer’s memory. A different developer or AI agent working on the SQL pipeline tomorrow would make the same LIMIT vs TOP mistake and have to rediscover the fix from scratch.
Feedback Loop 3: Agent Specialization Deepens Over Time
The sql-pipeline-dev agent definition includes constraints extracted from reading the source codebase: “target views only, not raw tables”, “use BaseLLMClient”, “mock all LLM calls in tests.” These constraints grow over time. When the team discovers that certain views require a specific JOIN order for performance, that constraint gets added to the agent definition. The agent becomes more specialized with every session.
The test-quality-reviewer enables a powerful cross-cutting pattern. It has visibility across all modules. It can see test failures across the entire codebase, identify recurring patterns, and propose skill updates or constraint additions. It functions as the system’s quality feedback channel; the agent that watches other agents’ work and captures lessons.
Feedback Loop 4: The Framework Itself Improves
This is the meta-level loop. The Bootstrap Framework that generated this project’s infrastructure has its own feedback cycle. Every gap discovered during migration feeds back into the framework’s skills and templates. The .gitignore carryover gap? Now documented in the migration-strategy skill. The .env oversight? Now an explicit pipeline step. The dev startup scripts? Now a reusable template.
The next project migrated with this framework starts with all of those lessons pre-loaded. The framework gets smarter with every project it touches.
KEY INSIGHT: The migration’s true value was not the cleaner code — it was creating the infrastructure for compound improvement. Hooks enforce quality in the current session. Skills accumulate knowledge across sessions. Agent definitions deepen specialization over time. Each layer operates at a different time horizon, and together they create a system that gets better the more it is used.
Remember that 30-second dashboard from the opening? It only works because the infrastructure behind it was generated in 40 minutes by agents that understood the domain context from skills, enforced quality through hooks, and operated within clear ownership boundaries. The dashboard is the visible output. The infrastructure is the invisible multiplier.
What This Unlocks
With all this infrastructure in place, the project can now support capabilities that were impossible before:
| Capability | Required Infrastructure | Old Project | New Project |
|---|---|---|---|
| Auto-fix linting on every edit | PostToolUse hooks | Impossible | Working |
| Block test regressions | Stop hooks + test suite | Impossible (no tests) | Working |
| Protect read-only data files | PreToolUse hooks | Impossible (no hooks) | Working |
| Domain-aware SQL generation | Skills with view catalog | Partial (in developer’s head) | Encoded in skill |
| Multi-agent parallel work | Agent ownership boundaries | Impossible (no agents) | 4 agents, clear ownership |
| New session onboarding | CLAUDE.md + skills | Read 37 files manually | Automatic, seconds |
| Learn from SQL failures | Skills + test patterns | Print and forget | Record, test, never repeat |
| Quality without prompting | Hooks (deterministic) | “Remember to run ruff” | Runs automatically, 100% |

Figure 11 - Three Time Horizons of Improvement: The old project had no memory between sessions. The new project improves at three timescales simultaneously: hooks fix issues in minutes, skills retain knowledge across sessions, and agent definitions evolve over months.
The bottom row of that capabilities table underscores the core architectural principle introduced in the hooks section above: deterministic enforcement beats probabilistic compliance. A PostToolUse hook works 100% of the time. A prompt instruction does not. That gap compounds across every session, every agent, every file write [4].
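The hook wiring behind the first and third rows of the capabilities table can be sketched as a fragment of `.claude/settings.json`. The event names and matcher structure follow the hooks guide [8]; the two helper script paths are hypothetical, and the exact schema may vary by Claude Code version. Hook commands receive the tool input as JSON on stdin, so a `PreToolUse` script can inspect the target path and block the write by exiting non-zero.

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          {
            "type": "command",
            "command": "python .claude/hooks/protect_readonly.py"
          }
        ]
      }
    ],
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          {
            "type": "command",
            "command": "python .claude/hooks/autofix_lint.py"
          }
        ]
      }
    ]
  }
}
```

This is the deterministic layer in miniature: the linter runs after every edit whether or not anyone remembered to ask, and the read-only guard fires before any write can land.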
What Comes Next
The framework is battle-tested on two real projects — a text-to-SQL dashboard and a YouTube-to-Obsidian AI pipeline. The second migration was more complex but completed faster, validating the framework's compound returns. Two paths remain untested:
Greenfield mode is completely untested. The framework can theoretically generate Claude Code infrastructure from a plain-English project description. The skills and methodology support it, but no one has tried it yet. This is the biggest blind spot.
Different archetypes would validate generality. Both migrations have been Python FastAPI + React. The framework includes patterns for Node.js Express, CLI tools, and pure frontends, but these are theoretical until proven on real projects. A third migration on a different stack would confirm the framework is truly general, not just well-tuned for one archetype.
The framework’s readiness assessment rates 6 of 8 core skills as production-ready, with project-type-patterns and validation-checklist gaining real-world validation through the second migration. The “redesign during migration” pattern is documented but needs testing on a different type of redesign (database swap, auth system replacement) to confirm it generalizes beyond the Batch API case.
KEY INSIGHT: Two successful migrations — the second faster on a harder project — prove the compound returns. But both were Python FastAPI + React. Three migrations across different archetypes (Node.js, CLI, pure frontend) would prove the framework is truly general, not just well-tuned for one technology stack.
The Series
This is Part 1 of a 4-part series on Building the Bootstrap Framework:
- An Agent Swarm That Builds Agent Swarms (this article) — Case study migrating two production apps with generated Claude Code infrastructure
- From Prototype to Platform — How the framework learned from every migration and improved itself
- Securing Agentic AI — Building security-conscious agent systems with Claude Code
- WordPress to Astro — Migrating a production site with AI-assisted infrastructure
Related Reading
These companion articles from the Claude Code series provide deep dives into the primitives this framework builds on:
- The Anatomy of a Domain Skill — Progressive disclosure, skill extraction, and 140x token efficiency
- Hooks, Agents, and the Deterministic Control Layer — How hooks enforce what prompts cannot, and why agent file ownership prevents chaos
- Building Effective Claude Code Agents — Agent definition anatomy, tool restrictions, and role patterns
- Claude Code Agent Teams — Shared task lists, the lead-teammate architecture, and inter-agent messaging
References
Anthropic Engineering Blog:
[1] E. Schluntz and B. Zhang, “Building effective agents,” Anthropic Engineering Blog, Dec 2024. https://www.anthropic.com/engineering/building-effective-agents
[2] J. Young et al., “Effective harnesses for long-running agents,” Anthropic Engineering Blog, Nov 2025. https://www.anthropic.com/engineering/effective-harnesses-for-long-running-agents
[3] N. Carlini, “Building a C compiler with a team of parallel Claudes,” Anthropic Engineering Blog, Feb 2025. https://www.anthropic.com/engineering/building-c-compiler
Community implementations:
[4] Disler, “Claude Code Hooks Mastery,” GitHub Repository, 2025. https://github.com/disler/claude-code-hooks-mastery
[5] Disler, “Agentic Finance Review,” GitHub Repository, 2025. https://github.com/disler/agentic-finance-review
[6] Disler, “Claude Code Hooks Multi-Agent Observability,” GitHub Repository, 2025. https://github.com/disler/claude-code-hooks-multi-agent-observability
[7] A. Osmani, “Claude Code Swarms,” AddyOsmani.com, Feb 2026. https://addyosmani.com/blog/claude-code-agent-teams/
Claude Code Documentation:
[8] Anthropic, “Automate workflows with hooks,” Claude Code Documentation, 2025. https://code.claude.com/docs/en/hooks-guide
[9] Anthropic, “Orchestrate teams of Claude Code sessions,” Claude Code Documentation, 2025. https://code.claude.com/docs/en/agent-teams
[10] Anthropic, “Skill authoring best practices,” Claude Platform Documentation, 2025. https://platform.claude.com/docs/en/agents-and-tools/agent-skills/best-practices
Companion articles:
[11] G. Dotzlaw, K. Dotzlaw, and R. Dotzlaw, “Building Effective Claude Code Agents: From Definition to Production,” 2026. /insights/claude-agents/
[12] G. Dotzlaw, K. Dotzlaw, and R. Dotzlaw, “Claude Code Skills: Building Reusable Knowledge Packages for AI Agents,” 2026. /insights/claude-skills/
[13] G. Dotzlaw, K. Dotzlaw, and R. Dotzlaw, “Claude Code Hooks: The Deterministic Control Layer for AI Agents,” 2026. /insights/claude-hooks/
[14] G. Dotzlaw, K. Dotzlaw, and R. Dotzlaw, “Claude Code Agent Teams: Building Coordinated Swarms of AI Developers,” 2026. /insights/claude-teams/