An Agent Swarm That Builds Agent Swarms: How We Used Claude Code to Generate Claude Code Infrastructure
Part 1 of 4: Building the Bootstrap Framework

An Agent Swarm That Builds Agent Swarms#

We used Claude Code to build a framework that generates Claude Code infrastructure for any project. Then we proved it works by migrating two production apps — the second more complex but completed faster. 67 Python files, an AI/ML pipeline, a full architectural redesign, 8 sessions, zero framework build time.

Figure 1 - Three-folder architecture diagram showing Bootstrap Framework, READ-ONLY source project, and generated target project with agents, skills, and hooks

Figure 1 - The Three-Folder Architecture: The framework reads the source project but never modifies it. All generated infrastructure lands in a fresh target project. This READ-ONLY invariant held across 10 sessions and was never violated.


A user types “top 10 customers in revenue for the last 3 years” into a text box. Thirty seconds later, a four-panel dashboard appears: a data table with revenue breakdowns, a geographic map of customer locations, a bar chart comparing revenue, and a donut chart showing distribution. All auto-generated. All correct.

This is textToSql-metabase, a full-stack application — React frontend, FastAPI backend, Metabase BI, Qdrant vector search, MS SQL Server — that was migrated from an existing codebase in 10 sessions using a framework we built entirely within Claude Code. The migrated version ran on its first live test. No emergency patches. No missing imports. Both the text-to-SQL pipeline and the help chat system worked immediately.

But the real story is not the migrated app. The real story is the framework that made the migration possible — and what it means for how developers can adopt Claude Code’s advanced features.

The Problem: Claude Code’s Best Features Are Its Least Used#

Claude Code ships with powerful infrastructure: agent definitions that create specialized AI teammates, hooks that enforce quality automatically, skills that package domain knowledge for 140x token efficiency, and slash commands that encode complex workflows into single invocations.

Most developers use none of it.

The setup cost is the barrier. Configuring a proper Claude Code project — with CLAUDE.md, agent teams, hooks, skills, slash commands, init scripts, and settings.json — takes a full day for an expert. Most people write a basic CLAUDE.md and stop. They get maybe 20% of Claude Code’s potential.

Figure 2 - Claude Code feature adoption bar chart showing 80% for CLAUDE.md dropping to under 3% for full infrastructure configuration

Figure 2 - The Infrastructure Gap: The most powerful Claude Code features have the lowest adoption. Agent definitions, hooks, and skills require significant upfront investment to configure correctly. The Bootstrap Framework exists to close this gap.

Migration is even harder. An existing codebase has established patterns, implicit conventions, domain knowledge buried in code, and toolchain assumptions. Extracting all of that into Claude Code infrastructure — while also improving the codebase during migration — is a task most teams never attempt.

We asked: what if Claude Code agents could do this work themselves?

The Insight: Use Claude Code to Generate Claude Code#

The Bootstrap & Migrator Framework is a meta-project: Claude Code agents that generate Claude Code agent infrastructure. You give it a project description (greenfield mode) or point it at an existing codebase (migration mode), and it produces a complete .claude/ configuration tailored to that specific project.

The framework contains no project-specific code. It contains knowledge about how to build Claude Code configurations:

  • 8 reusable skills teaching agent design, hook engineering, skill authoring, migration strategy, project analysis, validation, harness design, and project-type patterns
  • 6 slash commands guiding users through analyze, scaffold, generate-agents, generate-hooks, generate-skills, and validate workflows
  • 10 templates for hook scripts, dev startup scripts, and project scaffolding
  • 808 lines of methodology documenting a 7-step process refined through real use

KEY INSIGHT: The most effective way to drive adoption of advanced tooling features is to build tools that generate the configuration. Instead of teaching every developer how to write agent definitions, teach one framework how to generate them.

The Architecture: Three Folders, Strict Boundaries#

The framework enforces a three-folder architecture that prevents the most common migration mistakes:

  1. Framework repo (claude-teams-project-framework/) — reusable knowledge, templates, methodology. Project-agnostic. Never contains project-specific code.
  2. Source project (metabase-server/) — the existing codebase. READ-ONLY. Never modified. This is the reference implementation.
  3. Target project (textToSql-metabase/) — built fresh with generated infrastructure. All new code lands here.

Figure 3 - Seven-step migration methodology flow with parallel agent, hook, and skill generation completing in 20-40 minutes

Figure 3 - The 7-Step Migration Methodology: Analysis and scaffolding are sequential, but agent, hook, and skill generation run in parallel since they target different directories. The entire infrastructure generation pipeline completes in 20-40 minutes.

The READ-ONLY invariant on the source project is not just a convention — it is the single most important architectural decision. It means:

  • The old project keeps running in production throughout the migration
  • Every comparison between old and new behavior has a stable reference point
  • No accidental modifications can break the source while you’re building the target
  • The psychological clarity is as valuable as the technical enforcement: agents that know they cannot write to a directory don’t waste tokens attempting it

What the Framework Actually Generated#

For the metabase-server migration, the framework analyzed the existing codebase and produced:

4 Specialized Agents#

Each agent owns specific directories, runs specific model tiers, and has embedded hooks:

| Agent | Owns | Model | Focus |
|---|---|---|---|
| sql-pipeline-dev | sql_agent/, llm/, help_agent/ | Opus | Schema retrieval, SQL generation, validation, error recovery |
| dashboard-dev | designer_agent/, metabase/, cache/ | Sonnet | Dashboard creation, viz selection, Metabase API |
| frontend-dev | frontend/src/ | Sonnet | React components, shadcn/ui, Metabase SDK embedding |
| test-quality-reviewer | tests/, cross-cutting | Opus | Test creation, code quality, type annotations |

The agent definitions are not just role descriptions. They include constraints extracted from reading the source code: “All SQL queries must include tenant_id filtering”, “Target views only, not raw tables”, “Use BaseLLMClient, never call Anthropic directly.” When the sql-pipeline-dev agent considers writing a new query, the tenant_id constraint is present in its role definition, guiding every decision without repeated prompting. These constraints prevent agents from reinventing patterns that the codebase has already established. For a deep dive into agent definition anatomy, tool restrictions, and role patterns, see the companion article on Building Effective Claude Code Agents.

Figure 4 - Four specialized agents with file ownership boundaries, model tier assignments, and cross-cutting test reviewer

Figure 4 - Agent Team Architecture: Each agent has exclusive file ownership, preventing conflicts. The test-quality-reviewer has cross-cutting visibility — it can read any file but only writes to tests/. Model selection is intentional: Opus for complex algorithmic work (SQL generation, test design), Sonnet for standard implementation (frontend, dashboard). For the mechanics of how multiple agents coordinate — shared task lists, the lead-teammate architecture, and inter-agent messaging — see the companion article on Claude Code Agent Teams.

7 Domain Skills (1,486 Lines of Extracted Knowledge)#

Skills are the framework’s most distinctive feature. Each skill packages domain knowledge using progressive disclosure — a three-tier loading strategy that achieves 140x token efficiency compared to loading everything into context:

  1. Tier 1 (always loaded): YAML frontmatter — 10-20 lines describing what the skill covers
  2. Tier 2 (loaded on relevance): XML body — the core patterns and rules (100-300 lines)
  3. Tier 3 (loaded on specific need): Reference files — deep-dive documentation
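The three-tier split falls naturally out of the skill file layout. The sketch below is hypothetical — the `---`-delimited frontmatter convention and the sample skill content are assumptions about the file format, not the framework's actual parser — but it shows how Tier 1 and Tier 2 separate cleanly:

```python
# Hypothetical sketch: split a SKILL.md into its Tier 1 frontmatter
# and Tier 2 body. The "---"-delimited layout is an assumption.
def split_skill(text: str) -> tuple[str, str]:
    """Return (frontmatter, body) for a skill file."""
    parts = text.split("---")
    if len(parts) >= 3:
        # parts[0] is the empty prefix, parts[1] the frontmatter,
        # everything after the second delimiter is the body
        return parts[1].strip(), "---".join(parts[2:]).strip()
    return "", text.strip()

skill = """---
name: text-to-sql-pipeline
description: Use when generating or validating SQL for the pipeline.
---
## 7-step workflow
1. schema retrieval ...
"""

front, body = split_skill(skill)
print(front.splitlines()[0])  # the Tier 1 name line
```

Only `front` is paid for at startup; `body` (and any Tier 3 reference files) load on demand.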

The 7 skills extracted from the metabase-server codebase:

| Skill | What It Captures |
|---|---|
| text-to-sql-pipeline | 7-step query workflow, view-centric SQL, tenant_id enforcement, error recovery |
| metabase-api-patterns | Client library, dashboard builder, card types, embedding SDK |
| multi-tenant-security | tenant_id UUID enforcement, SQL injection prevention |
| qdrant-schema-retrieval | Vector collections, embedding model, two-stage retrieval |
| llm-client-abstraction | BaseLLMClient, provider pattern, model switching |
| frontend-architecture | Component hierarchy, React Query patterns, theme sync |
| erp-domain-knowledge | View catalog, data file inventory |

These skills mean that any agent working on the migrated project can make correct domain decisions without re-reading the entire source codebase. The text-to-sql-pipeline skill, for example, contains the exact 7-step workflow the original project uses: schema retrieval, SQL generation, validation, execution, error recovery, visualization selection, and dashboard creation. An agent porting code follows the established pattern rather than inventing a new one.
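The 7-step workflow can be sketched as an explicit, ordered pipeline. The step names come from the article; the context-dict shape and the lambda step bodies are illustrative stand-ins, not the project's real implementation:

```python
# Hypothetical sketch of the 7-step text-to-SQL workflow as an ordered
# pipeline. Each step transforms a shared context dict; the trace list
# records the order actually executed.
from typing import Callable

STEPS: list[tuple[str, Callable[[dict], dict]]] = [
    ("schema_retrieval",        lambda ctx: {**ctx, "schema": "v_customers"}),
    ("sql_generation",          lambda ctx: {**ctx, "sql": "SELECT ..."}),
    ("validation",              lambda ctx: {**ctx, "valid": True}),
    ("execution",               lambda ctx: {**ctx, "rows": []}),
    ("error_recovery",          lambda ctx: ctx),
    ("visualization_selection", lambda ctx: {**ctx, "viz": "bar"}),
    ("dashboard_creation",      lambda ctx: {**ctx, "dashboard_id": 1}),
]

def run_pipeline(question: str) -> dict:
    ctx: dict = {"question": question, "trace": []}
    for name, step in STEPS:
        ctx = step(ctx)
        ctx["trace"].append(name)  # every step leaves an audit entry
    return ctx

result = run_pipeline("top 10 customers in revenue")
print(result["trace"])
```

Making the step order a data structure, rather than scattered calls, is exactly what lets an agent follow the established pattern instead of inventing a new one.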

KEY INSIGHT: Domain knowledge extraction is the highest-ROI step in any migration. 1,486 lines of skills replaced the need for agents to repeatedly read and understand a 37-file codebase. Every dollar spent extracting knowledge into skills saves ten dollars in reduced token consumption and fewer mistakes.

To make the 140x efficiency concrete, consider the text-to-sql-pipeline skill — the largest and most critical of the seven:

  • Tier 1 (always loaded): YAML frontmatter — the skill name and a 3-line description of when to use it. Approximately 15 lines, present in every agent’s context at all times.
  • Tier 2 (loaded on relevance): The SKILL.md body — the 7-step query workflow, view-centric SQL patterns, tenant_id enforcement rules, and error recovery strategy. Approximately 200 lines, loaded only when an agent works on SQL pipeline tasks.
  • Tier 3 (loaded on specific need): Reference files — the complete view catalog, data file inventory, and T-SQL dialect guide (including hard-won lessons like using TOP instead of LIMIT for SQL Server). Over 800 lines across multiple files, loaded individually only when an agent needs that specific detail.

At startup, every agent pays the cost of Tier 1: roughly 15 lines per skill, 105 lines total across all 7 skills. An agent working on the SQL pipeline loads Tier 2 for that one skill and perhaps one Tier 3 reference file — about 700 lines total. Without progressive disclosure, the agent would need the full content of all 7 skills loaded simultaneously: over 2,100 lines of skill bodies plus reference files. That is the 140x token efficiency in practice — 15 lines always loaded versus the full 2,100+ available on demand [10]. For a complete treatment of skill architecture and the three-tier loading system, see the companion article on Claude Code Skills.
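The arithmetic behind those numbers is worth making explicit. The line counts below are the ones quoted above; the Tier 3 figure of 400 lines for a single reference file is an assumption chosen to match the "about 700 lines" total:

```python
# Back-of-envelope check of the progressive-disclosure numbers.
# Line counts come from the article; the token math is illustrative.
TIER1_PER_SKILL = 15     # frontmatter lines, always loaded
NUM_SKILLS = 7
TIER2_SQL_SKILL = 200    # SKILL.md body, loaded on relevance
TIER3_ONE_REF = 400      # assumed size of one loaded reference file

always_loaded = TIER1_PER_SKILL * NUM_SKILLS            # cost every agent pays
sql_task_total = always_loaded + TIER2_SQL_SKILL + TIER3_ONE_REF
everything_upfront = 2100                               # all bodies + refs

print(always_loaded, sql_task_total,
      everything_upfront // TIER1_PER_SKILL)  # the "140x" ratio
```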

6 Hook Configurations (Deterministic Quality Gates)#

Hooks are the deterministic control layer — they run on every tool use, not when an agent remembers to. The generated configuration includes:

PreToolUse (before writes):

  • Block modifications to data/*.json (reference data, read-only)
  • Block destructive bash commands
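The PreToolUse gates above reduce to a small decision function. This is a hedged sketch: the `PROTECTED_PATTERNS` list and `is_protected` helper are illustrative, and the real stdin/exit-code wiring (shown in comments, per the documented hook contract where a blocking exit code feeds stderr back to the agent) is omitted so the snippet runs standalone:

```python
# Sketch of a PreToolUse gate: block writes to read-only reference data.
# Only the decision logic is shown; the hook I/O wiring is in comments.
import fnmatch

PROTECTED_PATTERNS = ["data/*.json"]  # read-only reference data

def is_protected(file_path: str) -> bool:
    """Return True if the path matches any protected pattern."""
    return any(fnmatch.fnmatch(file_path, p) for p in PROTECTED_PATTERNS)

# Real wiring (sketch):
#   event = json.loads(sys.stdin.read())
#   path = event.get("tool_input", {}).get("file_path", "")
#   if is_protected(path):
#       print(f"BLOCKED: {path} is read-only", file=sys.stderr)
#       sys.exit(2)  # blocking exit code; stderr goes back to the agent

print(is_protected("data/views.json"), is_protected("sql_agent/gen.py"))
```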

PostToolUse (after writes):

  • Run ruff check --fix on every Python file edit
  • Run tsc --noEmit on every TypeScript file edit
  • Run eslint on every TypeScript file edit

Stop (before session ends):

  • Run the full pytest suite — the session cannot complete with failing tests

Figure 5 - Hook control flow showing PreToolUse blocking writes, PostToolUse auto-fixing with ruff, and Stop hook running pytest verification

Figure 5 - The Hook Control Layer: Hooks enforce quality deterministically. An agent cannot accidentally skip linting, break type safety, or complete a session with failing tests. This is fundamentally more reliable than prompt-based instructions like “remember to run tests.”

The key lesson from real-world use: simple command-based hooks were sufficient. The framework’s templates include elaborate Python scripts for conditional hook logic, but the actual project needed only straightforward “run linter after write” and “run tests before stop” patterns. Start simple, escalate only when conditional logic is truly needed.

How Hooks Actually Work: The additionalContext Mechanism#

The key to understanding hooks is the additionalContext mechanism [8]. When a PostToolUse hook runs — say, after an agent writes a Python file — the hook script receives the tool call details on stdin as JSON, performs validation, and can return a JSON response containing an additionalContext field. Claude Code injects that string directly into the agent’s conversation context, creating a feedback loop where the agent sees the validator output and self-corrects in its very next action. No human intervention required.

Here is what this looks like in practice — the actual pattern used in this migration:

#!/usr/bin/env python3
"""PostToolUse hook: Run ruff on Python file writes, feed errors back to agent."""
import json, sys, subprocess

def main():
    event = json.loads(sys.stdin.read())
    file_path = event.get("tool_input", {}).get("file_path", "")
    if not file_path.endswith(".py"):
        sys.exit(0)
    result = subprocess.run(
        ["ruff", "check", "--fix", file_path],
        capture_output=True, text=True, timeout=10
    )
    if result.returncode != 0:
        print(json.dumps({
            "additionalContext": (
                f"ruff found issues in {file_path}:\n"
                f"{result.stdout}\nFix these before continuing."
            )
        }))

if __name__ == "__main__":
    main()

This is fundamentally different from writing “always run ruff before committing” in CLAUDE.md. A prompt instruction achieves perhaps 90% compliance — the agent usually follows it, but under context pressure or in long sessions, it skips. A hook achieves 100% compliance. That gap between “usually” and “always” is where production systems fail. Without hooks, you are essentially “vibe coding at scale” [4]. With hooks, you have reproducible, auditable agent behavior.

Claude Code supports three hook types along a spectrum from deterministic to intelligent [8]: command hooks (shell scripts, millisecond speed, fully deterministic — used for safety gates and linting), prompt-based hooks (single-turn Claude evaluation for semantic judgment), and agent-based hooks (multi-turn Claude with tool access for thorough review gates). In this migration, all hooks were command-type. We never needed the intelligent evaluation of prompt-based or agent-based hooks — simple deterministic enforcement was sufficient. For a deeper treatment of the three types and when to escalate, see the companion article on Claude Code Hooks.

One subtle but important detail: hooks in this project were not just global settings. They were embedded per-agent in the agent definition files [5]. The sql-pipeline-dev agent carried ruff validation; the frontend-dev agent carried tsc and eslint. Each agent ran only the validation relevant to its role — no wasted computation, no false positives from irrelevant checks. This per-agent embedding pattern means that as agent teams grow, hook configurations scale with them rather than becoming a noisy global catch-all.

The Migration: 40 Features, 10 Sessions, One First-Try Success#

The framework generated a 40-item feature list ordered by dependency, which the agent team executed over 10 sessions:

| Phase | Work | Sessions | Key Output |
|---|---|---|---|
| Phase 1 | Build Framework Knowledge Base | 3 | 8 skills, 6 commands, 808-line methodology |
| Phase 2 | Analyze Source + Generate Infrastructure | 2 | project_analysis.json, 4 agents, 7 skills, 6 hooks |
| Phase 3 | Port & Improve All Modules | 5 | 53 Python modules, 21 test files, 223 tests |
| Phase 4 | Lessons Learned + Framework Refinement | 2 | Updated 14+ framework files, 3 new templates |

The porting order followed the dependency graph: config first (zero dependencies), then LLM clients and Metabase client (depend on config), then SQL pipeline and dashboard pipeline (depend on everything above), then the FastAPI application (imports all modules), then frontend (independent, ported in parallel).
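A dependency-ordered plan like this is just a topological sort. The module graph below is a simplified stand-in for the real one, using the dependencies described above:

```python
# Sketch of the dependency-ordered porting plan as a topological sort.
# The graph is a simplified stand-in for the real module inventory.
from graphlib import TopologicalSorter

deps = {
    "config": set(),                       # zero dependencies, ported first
    "llm_clients": {"config"},
    "metabase_client": {"config"},
    "sql_pipeline": {"config", "llm_clients"},
    "dashboard_pipeline": {"config", "metabase_client"},
    "fastapi_app": {"sql_pipeline", "dashboard_pipeline",
                    "llm_clients", "metabase_client"},
}

order = list(TopologicalSorter(deps).static_order())
print(order)  # config comes first, fastapi_app last
```

Because each module lands only after its dependencies, every module is testable the moment it is ported.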

Figure 6 - Dependency-order porting graph from config through shared libraries to application routes with parallel frontend track

Figure 6 - Dependency-Order Porting: Every module was testable the moment it landed because its dependencies were already ported. The frontend ran in parallel with backend porting since it has no Python dependencies.

Anti-Patterns Fixed During Migration#

The framework’s “improve during migration, not after” principle meant every module got straightforward fixes as it was ported:

| Anti-Pattern | Old Project | New Project |
|---|---|---|
| Debug output | 168 print() statements | 0 (proper logging module) |
| Package management | sys.path hacks | pyproject.toml + standard imports |
| Configuration | Hardcoded values (User ID, tenant_id) | Centralized pydantic-settings |
| Type safety | No type hints | Full type annotations + mypy |
| Linting | None | ruff enforced on every file write |
| Testing | 0 test files | 21 test files, 223 unit tests |
| Response types | Mixed dicts + Pydantic | Consistent Pydantic models |
| Error handling | Bare exceptions | Classified error recovery |

KEY INSIGHT: Migration is the best time to fix anti-patterns because you’re already reading every line of code. Applying fixes during porting costs almost nothing extra. Scheduling a separate “cleanup sprint” after migration almost never happens.

The transformation is easier to appreciate visually. Click to zoom into the full architectural diagrams:

Infographic - BEFORE Migration: metabase-server — showing the monolithic 1,620-line app.py, spaghetti dependencies, CORS wildcard, 168 print statements, zero tests, and sys.path hacks

BEFORE Migration: metabase-server — The red flags are everywhere: a 1,620-line monolithic app.py, sys.path.append() hacks, zero test files, 168 print statements, hardcoded credentials, and a synchronous blocking client. Click the image for the full-resolution view.

Infographic - AFTER Migration: textToSql-metabase — showing modular architecture with 7 agent directories, proper package structure, 223 tests, 7 skills, and Claude Code infrastructure

AFTER Migration: textToSql-metabase — The monolith becomes a modular Python package with Hatchling build, proper module structure, 223 unit tests, 7 domain skills, 6 hook configurations, and a 7-step pipeline. Every anti-pattern from the BEFORE diagram is resolved.

The Gaps: What We Got Wrong#

No framework survives first contact with reality unscathed. Five gaps emerged across two migrations:

1. .gitignore carryover. The scaffold step created a fresh .gitignore but didn’t carry over source-project-specific patterns. Data files that should have been ignored were committed. Fix: now an explicit migration step.

2. Dev startup scripts. The pipeline had no step for porting development orchestration — the npm run dev scripts that start backend, frontend, and Metabase simultaneously with concurrently. The migrated project couldn’t run in dev mode until we added this. Fix: new templates and migration step added.

3. .env files. The framework scaffolded .env.example with placeholders but never prompted the developer to copy actual credentials from the source project. All 223 unit tests pass (they mock external services), so this gap was invisible until the first live test. Fix: explicit manual step documented.

4. Processing pipeline completeness. In Migration 2, the YouTube processing pipeline chained transcript extraction, markdown generation, YAML frontmatter generation, and tag resolution into a final Obsidian note. Every module ported correctly, but the route handler that orchestrated them only wired the first step of the 4-step chain. The result: markdown output with no YAML frontmatter and no tags, and three steps sitting as dead code, never called. The root cause: the project analysis identified modules and endpoints but not processing pipelines. Modules are nouns. Pipelines are verbs. Without an explicit pipeline inventory, the porting agent treated each module as independent. Fix: processing pipelines are now first-class entities in project_analysis.json, with mandatory detection during analysis and mandatory verification during cutover.

5. Dev startup scripts — missed twice. Despite being documented as Gap #2 above, the second migration hit the exact same failure. The documentation existed. The checklist mentioned it. But the verification step never tested whether the project could actually start. Documenting a lesson is not the same as enforcing it. Fix: dev orchestration is now a MANDATORY blocking check in the verification checklist, not an advisory recommendation. The gap had to recur before we learned to make it impossible rather than merely documented.
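The pipeline verification from Gap #4 amounts to a set difference. This is a hypothetical sketch — the `pipelines` shape inside project_analysis.json and the step names are assumptions based on the description above:

```python
# Hypothetical sketch of pipeline-wiring verification: compare the steps
# a pipeline declares in project_analysis.json against the calls the
# ported route handler actually makes.
analysis = {  # assumed shape of project_analysis.json
    "pipelines": {
        "youtube_to_obsidian": [
            "extract_transcript",
            "generate_markdown",
            "generate_frontmatter",
            "resolve_tags",
        ]
    }
}

wired_calls = {"extract_transcript"}  # what the route handler invoked

def unwired_steps(pipeline: str) -> list[str]:
    """Declared pipeline steps that no route handler ever calls."""
    declared = analysis["pipelines"][pipeline]
    return [step for step in declared if step not in wired_calls]

dead = unwired_steps("youtube_to_obsidian")
print(dead)  # the dead-code steps a mandatory cutover check would flag
```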

Figure 7 - Ten-session migration timeline with framework building, infrastructure generation, module porting, and gap discovery feedback loops

Figure 7 - Discovery Timeline: Every gap was discovered during active use and immediately fed back into the framework. This is the core feedback loop: the framework improves through every project it touches.

Migration 2: The Framework Proves Itself#

The metabase-server migration proved the concept. The second migration proved the framework compounds.

We pointed the framework at obsidian_notes — a YouTube-to-Obsidian AI pipeline built with FastAPI, React 19, PostgreSQL, Qdrant vector search, Anthropic Claude, and OpenAI embeddings. 67 Python source files, a 2,800-line monolithic API, 8 independent database connections, and an Anthropic Batch API processing system that took 4+ hours to handle a single set of videos.

The framework analyzed it and generated 5 specialized agents, 7 domain skills, 5 hooks, and a 167-line CLAUDE.md. Phase 1 — building the framework knowledge base — took zero sessions. The 3-session investment from Migration 1 paid for itself immediately.

| Dimension | Migration 1: metabase-server | Migration 2: obsidian-notes |
|---|---|---|
| Total sessions | 10 | 8 |
| Framework build (Phase 1) | 3 sessions | 0 (reused) |
| Source complexity | 45 Python files | 67 Python files + AI/ML |
| Architectural redesign | None (port + improve) | Batch API replaced entirely |
| Generated agents | 4 | 5 |
| Generated domain skills | 7 (1,486 lines) | 7 (domain-specific) |
| Anti-patterns fixed | 7 categories | 7+ categories |
| Tests created | 223 | 118 |
| New framework gaps | 6 | 2 |

The most dramatic change was not a port. It was a complete architectural replacement. The source project used Anthropic’s Batch API for processing YouTube videos into Obsidian notes. The Batch API was unreliable: 4+ hour waits with no per-item progress, no cancellation support, and opaque failures that required resubmitting entire batches. Rather than porting this broken architecture, we replaced it with asyncio.TaskGroup parallel processing. The result: seconds per video instead of hours per batch, per-item WebSocket progress updates, individual cancellation, and configurable concurrency.

KEY INSIGHT: Migration is not just an opportunity to fix code. It is an opportunity to fix architecture. When a component is demonstrably failing (not suboptimal, but failing), redesign it during migration rather than porting the failure and planning a future rewrite that never happens.

The structural improvements went further than Migration 1. The 2,800-line monolithic main.py became 7 focused router modules and a 70-line entry point. 8 independent psycopg2.connect() calls became a centralized connection pool with get_connection() as a context manager. 7 accumulated SQL migration scripts consolidated into 1 idempotent initial schema. Python typing modernized throughout (X | None instead of Optional[X], list[X] instead of List[X]), enum patterns standardized (StrEnum instead of str, Enum), and deprecated stdlib APIs updated (datetime.now(tz=UTC) instead of datetime.utcnow()).

Figure 8 - Side-by-side migration comparison showing Migration 1 at 45 files in 10 sessions versus Migration 2 at 67 files in 8 sessions with Batch API to async redesign

Figure 8 - The Compound Returns: Migration 2 was more complex (67 files vs 45, AI/ML integration, full architectural redesign) but completed in fewer sessions. The 3-session framework investment from Migration 1 dropped to zero on reuse. The framework gets faster with every project it touches.

The same visual treatment shows Migration 2’s transformation. Click to zoom:

Infographic - BEFORE Migration: obsidian_notes — showing the tangled architecture with 67 Python files, 8 independent database connections, Batch API bottleneck, and scattered AI pipeline components

BEFORE Migration: obsidian_notes — 67 Python files, 8 independent database connections, a 2,800-line monolithic API, and the Anthropic Batch API bottleneck that took 4+ hours per processing run.

Infographic - AFTER Migration: obsidian-youtube-agent — showing clean modular architecture with ThreadPoolContext, async processing, centralized connection pool, and Claude Code infrastructure

AFTER Migration: obsidian-youtube-agent — The Batch API replaced with asyncio.TaskGroup parallel processing (seconds instead of hours), 8 database connections consolidated into a centralized pool, and full Claude Code infrastructure: 5 agents, 7 skills, 5 hooks.

The hook pattern from Migration 1 held up without changes. Simple command hooks — ruff after Python writes, tsc after TypeScript writes, pytest before session end — were sufficient again. No script-based hooks needed. Hook simplicity is now confirmed across two independent projects with different tech stacks.

Why This Matters: The Infrastructure Multiplier#

The Bootstrap Framework exists to solve a specific problem: Claude Code’s most powerful features have a high setup cost that prevents adoption. By encoding the knowledge of how to configure these features into reusable skills and automated workflows, the framework turns a full-day expert task into a 20-40 minute generation pipeline.

But the multiplier effect goes beyond setup time:

For the migrated project, every future session benefits from the generated infrastructure. Hooks catch bugs automatically. Skills give agents domain context without token-heavy re-reading. Agent ownership prevents file conflicts. The 157-line CLAUDE.md orients every new session in seconds.

For the framework, every migration makes it smarter. The metabase-server migration produced 3 new templates, updated 14+ framework files, and documented 6 lessons learned. The obsidian-notes migration discovered 2 more gaps and added processing pipeline detection and mandatory verification checks. Each migration makes the next one faster because these patterns are now encoded.

For Claude Code adoption broadly, the framework demonstrates that the answer to “these features are too complex to set up” is not “make the features simpler”, but rather “build tools that generate the configuration.” The features are powerful because they’re detailed. A hook that runs ruff check --fix on every Python file write is powerful precisely because it’s specific. The Bootstrap Framework makes that specificity achievable without the manual cost.

Figure 9 - Before and after migration comparison from 37 files with 168 print statements and 0 tests to 53 modules with 223 tests and full infrastructure

Figure 9 - Before and After: The quantitative improvement across every dimension. The most striking metric: 168 print() statements reduced to 0 while simultaneously adding 223 unit tests. Quality infrastructure compounds.

The Real Payoff: A System That Can Now Improve Itself#

The migration was not the destination. It was the foundation.

The old metabase-server project could not improve itself. When a SQL query failed, the error vanished into a print() statement. When a pattern didn’t work, there was no place to record what went wrong. When a developer, human or AI, started a new session, they had no orientation, no domain context, no safety rails. Every session started from zero.

The migrated project is fundamentally different. It has the infrastructure for continuous self-improvement through feedback loops that the old project could never support.

Feedback Loop 1: Hooks Catch Failures Before They Ship#

Consider what happens now when an agent writes a SQL generator that forgets tenant_id filtering:

  1. The agent writes the code
  2. PostToolUse hook runs ruff check --fix and catches syntax issues immediately
  3. The agent finishes the session
  4. Stop hook runs pytest tests/ -x and the multi-tenant security tests fail because tenant_id is missing
  5. The session cannot complete until the tests pass
  6. The agent reads the test failure, understands the tenant_id requirement, and fixes the code

In the old project, that missing tenant_id would have reached production. There were no tests to catch it, no hooks to enforce it, and no skill to remind the developer it was required.
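The security test that trips the Stop hook in step 4 can be as simple as a substring check. This is a deliberately minimal sketch — the real multi-tenant tests are presumably richer, and `has_tenant_filter` is an invented helper:

```python
# Minimal sketch of the multi-tenant security check that the Stop hook's
# pytest run enforces. The SQL strings and helper are illustrative.
def has_tenant_filter(sql: str) -> bool:
    """Every generated query must filter by tenant_id."""
    return "tenant_id" in sql.lower()

good = "SELECT TOP 10 * FROM v_revenue WHERE tenant_id = :tid"
bad = "SELECT TOP 10 * FROM v_revenue"

# The second case is exactly the failure the Stop hook surfaces:
# the session cannot end until the agent adds the tenant_id filter.
print(has_tenant_filter(good), has_tenant_filter(bad))
```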

Figure 10 - Self-improvement feedback loop with hooks validating code, tests verifying behavior, and skills accumulating knowledge across sessions

Figure 10 - The Self-Improvement Feedback Loop: Hooks create a tight cycle where failures are caught and fixed within the same session. Skills provide the longer-term memory and patterns learned from failures persist across sessions.

Feedback Loop 2: Skills Accumulate Domain Knowledge#

The text-to-sql-pipeline skill already captures the 7-step workflow, the view-centric SQL pattern, and the tenant_id enforcement rule. But skills are not static documents. When a query fails in a new way (for example, when a user asks for “revenue last 3 years” and the generated SQL uses LIMIT instead of SQL Server’s TOP syntax), that pattern gets added to the skill:

T-SQL dialect: All queries use SQL Server syntax. TOP N instead of LIMIT,
GETDATE() and DATEADD() for dates, ISNULL() instead of COALESCE().

That line exists in the skill today because this exact failure happened during development. Every future session, every future agent, loads that knowledge automatically. The mistake cannot repeat.

In the old project, this kind of pattern lived in one developer’s memory. A different developer or AI agent working on the SQL pipeline tomorrow would make the same LIMIT vs TOP mistake and have to rediscover the fix from scratch.
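The LIMIT-vs-TOP fix itself is mechanical once the lesson is encoded. The sketch below is illustrative only — a real implementation would use a SQL parser, and this regex handles just the simple trailing-LIMIT case:

```python
# Illustrative sketch of the T-SQL dialect lesson: rewrite a trailing
# "LIMIT n" into SQL Server's "SELECT TOP n". Simple cases only.
import re

def limit_to_top(sql: str) -> str:
    m = re.search(r"\s+LIMIT\s+(\d+)\s*;?\s*$", sql, flags=re.IGNORECASE)
    if not m:
        return sql  # already dialect-correct, leave untouched
    n = m.group(1)
    stripped = sql[: m.start()]
    return re.sub(r"^(\s*SELECT)\b", rf"\1 TOP {n}", stripped,
                  count=1, flags=re.IGNORECASE)

print(limit_to_top(
    "SELECT name, revenue FROM v_customers ORDER BY revenue DESC LIMIT 10"
))
```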

Feedback Loop 3: Agent Specialization Deepens Over Time#

The sql-pipeline-dev agent definition includes constraints extracted from reading the source codebase: “target views only, not raw tables”, “use BaseLLMClient”, “mock all LLM calls in tests.” These constraints grow over time. When the team discovers that certain views require a specific JOIN order for performance, that constraint gets added to the agent definition. The agent becomes more specialized with every session.

The test-quality-reviewer enables a powerful cross-cutting pattern. It has visibility across all modules. It can see test failures across the entire codebase, identify recurring patterns, and propose skill updates or constraint additions. It functions as the system’s quality feedback channel: the agent that watches other agents’ work and captures lessons.

Feedback Loop 4: The Framework Itself Improves#

This is the meta-level loop. The Bootstrap Framework that generated this project’s infrastructure has its own feedback cycle. Every gap discovered during migration feeds back into the framework’s skills and templates. The .gitignore carryover gap? Now documented in the migration-strategy skill. The .env oversight? Now an explicit pipeline step. The dev startup scripts? Now a reusable template.

The next project migrated with this framework starts with all of those lessons pre-loaded. The framework gets smarter with every project it touches.

KEY INSIGHT: The migration’s true value was not the cleaner code — it was creating the infrastructure for compound improvement. Hooks enforce quality in the current session. Skills accumulate knowledge across sessions. Agent definitions deepen specialization over time. Each layer operates at a different time horizon, and together they create a system that gets better the more it is used.

Remember that 30-second dashboard from the opening? It only works because the infrastructure behind it was generated in 40 minutes by agents that understood the domain context from skills, enforced quality through hooks, and operated within clear ownership boundaries. The dashboard is the visible output. The infrastructure is the invisible multiplier.

What This Unlocks#

With all this infrastructure in place, the project can now support capabilities that were impossible before:

| Capability | Required Infrastructure | Old Project | New Project |
| --- | --- | --- | --- |
| Auto-fix linting on every edit | PostToolUse hooks | Impossible | Working |
| Block test regressions | Stop hooks + test suite | Impossible (no tests) | Working |
| Protect read-only data files | PreToolUse hooks | Impossible (no hooks) | Working |
| Domain-aware SQL generation | Skills with view catalog | Partial (in developer’s head) | Encoded in skill |
| Multi-agent parallel work | Agent ownership boundaries | Impossible (no agents) | 4 agents, clear ownership |
| New session onboarding | CLAUDE.md + skills | Read 37 files manually | Automatic, seconds |
| Learn from SQL failures | Skills + test patterns | Print and forget | Record, test, never repeat |
| Quality without prompting | Hooks (deterministic) | “Remember to run ruff” | Runs automatically, 100% |

Figure 11 - Three time horizons of compound improvement with hooks operating in minutes, skills across sessions, and agent definitions over months

Figure 11 - Three Time Horizons of Improvement: The old project had no memory between sessions. The new project improves at three timescales simultaneously: hooks fix issues in minutes, skills retain knowledge across sessions, and agent definitions evolve over months.

The bottom row of that capabilities table underscores the core architectural principle introduced in the hooks section above: deterministic enforcement beats probabilistic compliance. A PostToolUse hook works 100% of the time. A prompt instruction does not. That gap compounds across every session, every agent, every file write [4].

What Comes Next#

The framework is battle-tested on two real projects — a text-to-SQL dashboard and a YouTube-to-Obsidian AI pipeline. The second migration was more complex but completed faster, validating the compound returns. Two paths remain:

Greenfield mode is completely untested. The framework can theoretically generate Claude Code infrastructure from a plain-English project description. The skills and methodology support it, but no one has tried it yet. This is the biggest blind spot.

Different archetypes would validate generality. Both migrations have been Python FastAPI + React. The framework includes patterns for Node.js Express, CLI tools, and pure frontends, but these are theoretical until proven on real projects. A third migration on a different stack would confirm the framework is truly general, not just well-tuned for one archetype.

The framework’s readiness assessment rates 6 of 8 core skills as production-ready, with project-type-patterns and validation-checklist gaining real-world validation through the second migration. The “redesign during migration” pattern is documented but needs testing on a different type of redesign (database swap, auth system replacement) to confirm it generalizes beyond the Batch API case.

KEY INSIGHT: Two successful migrations — the second faster on a harder project — prove the compound returns. But both were Python FastAPI + React. Three migrations across different archetypes (Node.js, CLI, pure frontend) would prove the framework is truly general, not just well-tuned for one technology stack.


The Series#

This is Part 1 of a 4-part series on Building the Bootstrap Framework:

  1. An Agent Swarm That Builds Agent Swarms (this article) — Case study migrating two production apps with generated Claude Code infrastructure
  2. From Prototype to Platform — How the framework learned from every migration and improved itself
  3. Securing Agentic AI — Building security-conscious agent systems with Claude Code
  4. WordPress to Astro — Migrating a production site with AI-assisted infrastructure

The companion articles from the Claude Code series, covering agents [11], skills [12], hooks [13], and agent teams [14], provide deep dives into the primitives this framework builds on.


References#

Anthropic Engineering Blog:

[1] E. Schluntz and B. Zhang, “Building effective agents,” Anthropic Engineering Blog, Dec 2024. https://www.anthropic.com/engineering/building-effective-agents

[2] J. Young et al., “Effective harnesses for long-running agents,” Anthropic Engineering Blog, Nov 2025. https://www.anthropic.com/engineering/effective-harnesses-for-long-running-agents

[3] N. Carlini, “Building a C compiler with a team of parallel Claudes,” Anthropic Engineering Blog, Feb 2025. https://www.anthropic.com/engineering/building-c-compiler

Community implementations:

[4] Disler, “Claude Code Hooks Mastery,” GitHub Repository, 2025. https://github.com/disler/claude-code-hooks-mastery

[5] Disler, “Agentic Finance Review,” GitHub Repository, 2025. https://github.com/disler/agentic-finance-review

[6] Disler, “Claude Code Hooks Multi-Agent Observability,” GitHub Repository, 2025. https://github.com/disler/claude-code-hooks-multi-agent-observability

[7] A. Osmani, “Claude Code Swarms,” AddyOsmani.com, Feb 2026. https://addyosmani.com/blog/claude-code-agent-teams/

Claude Code Documentation:

[8] Anthropic, “Automate workflows with hooks,” Claude Code Documentation, 2025. https://code.claude.com/docs/en/hooks-guide

[9] Anthropic, “Orchestrate teams of Claude Code sessions,” Claude Code Documentation, 2025. https://code.claude.com/docs/en/agent-teams

[10] Anthropic, “Skill authoring best practices,” Claude Platform Documentation, 2025. https://platform.claude.com/docs/en/agents-and-tools/agent-skills/best-practices

Companion articles:

[11] G. Dotzlaw, K. Dotzlaw, and R. Dotzlaw, “Building Effective Claude Code Agents: From Definition to Production,” 2026. /insights/claude-agents/

[12] G. Dotzlaw, K. Dotzlaw and R. Dotzlaw, “Claude Code Skills: Building Reusable Knowledge Packages for AI Agents,” 2026. /insights/claude-skills/

[13] G. Dotzlaw, K. Dotzlaw and R. Dotzlaw, “Claude Code Hooks: The Deterministic Control Layer for AI Agents,” 2026. /insights/claude-hooks/

[14] G. Dotzlaw, K. Dotzlaw and R. Dotzlaw, “Claude Code Agent Teams: Building Coordinated Swarms of AI Developers,” 2026. /insights/claude-teams/

An Agent Swarm That Builds Agent Swarms: How We Used Claude Code to Generate Claude Code Infrastructure
https://dotzlaw.com/insights/bootstrap-framework-01/
Authors: Gary Dotzlaw, Katrina Dotzlaw, Ryan Dotzlaw
Published: 2026-02-11
License: CC BY-NC-SA 4.0